
Interaction-based learning and prediction in Big Data
【2015.1.21 9:30am, N210】


 2015-1-20 

  Colloquia & Seminars 

  Speaker

Prof. Shaw-Hwa Lo, Columbia University

  Title

Interaction-based learning and prediction in Big Data  

  Time

2015.1.21 9:30-10:30am

  Venue

N210

  Abstract

We consider a computer-intensive approach, Partition Retention (PR, 09), based on an earlier method (Lo and Zheng, 2002), for detecting which of many potential explanatory variables influence a dependent variable Y. This approach is suited to detecting influential variables in groups, where causal effects depend on the confluence of values of several variables. It has the advantage of avoiding a difficult direct analysis involving possibly thousands of variables, and is guided by a measure of influence I. We next apply PR to more challenging real-data applications, typically involving complex and extremely high-dimensional data. The quality of the selected variables is evaluated in two ways: first by classification error rates, then by functional relevance using external biological knowledge. We demonstrate that (1) classification error rates can be significantly reduced by considering interactions; and (2) incorporating interaction information into data analysis can be very rewarding in generating novel scientific findings. Heuristic explanations follow of why and when the proposed methods may lead to such a dramatic (classification/predictive) gain. If time permits, we point out a puzzle: highly predictive variables do not necessarily appear as highly significant, and thus evade researchers who use significance-based methods. If prediction is the goal, we must lay aside significance as the only selection standard.
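The abstract does not define the influence measure I. As a rough illustration only, the sketch below assumes the unnormalized form I = Σ_j n_j² (Ȳ_j − Ȳ)², summed over the cells j of the partition induced by a subset of discrete variables, together with a simple greedy backward-dropping search. The function names `influence_score` and `backward_drop` and the toy data are hypothetical and are not taken from the talk.

```python
from collections import defaultdict


def influence_score(rows, y, subset):
    """Assumed unnormalized score: sum over cells of n_j^2 * (mean_j - overall mean)^2.

    rows   : list of tuples of discrete explanatory values
    y      : list of numeric responses, same length as rows
    subset : tuple of column indices defining the partition
    """
    y_bar = sum(y) / len(y)
    cells = defaultdict(list)
    for row, yi in zip(rows, y):
        # Observations sharing the same values on `subset` fall in one cell.
        cells[tuple(row[k] for k in subset)].append(yi)
    return sum(len(v) ** 2 * (sum(v) / len(v) - y_bar) ** 2 for v in cells.values())


def backward_drop(rows, y, subset):
    """Greedy sketch: drop one variable at a time, remember the best-scoring subset."""
    current = tuple(subset)
    best, best_score = current, influence_score(rows, y, current)
    while len(current) > 1:
        # Try removing each variable in turn; keep the removal with the largest score.
        candidates = [tuple(v for v in current if v != d) for d in current]
        current = max(candidates, key=lambda s: influence_score(rows, y, s))
        score = influence_score(rows, y, current)
        if score > best_score:
            best, best_score = current, score
    return best, best_score


# Toy data (hypothetical): Y depends only on the interaction of columns 0 and 1;
# column 2 is irrelevant. Neither column 0 nor column 1 has a marginal effect.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 5
y = [a ^ b for a, b, _ in rows]

print(influence_score(rows, y, (0,)))    # 0.0: no marginal signal
print(influence_score(rows, y, (0, 1)))  # 100.0: the pair is influential jointly
print(backward_drop(rows, y, (0, 1, 2)))  # ((0, 1), 100.0)
```

On this toy example each variable scores zero on its own, while the pair scores highest, which is the kind of group-level influence the abstract describes.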


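Welcome to the National Center for Mathematics and Interdisciplinary Sciences
Address: 55 Zhongguancun East Road, Haidian District, Beijing  Postal code: 100190  Tel: 86-10-62613242  Fax: 86-10-62616840  Email: ncmis@amss.ac.cn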