Build an end-to-end scalable and interpretable data science ecosystem by integrating statistics, ML, and genomics and health sciences [2024.07.08 14:00, N219]
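2024-7-1 Colloquia Seminars
Speaker | Professor Xihong Lin (林希虹), Member of the U.S. National Academy of Sciences, Harvard University, USA
Title | Build an end-to-end scalable and interpretable data science ecosystem by integrating statistics, ML, and genomics and health sciences
Time | July 8, 14:00-15:30
Venue | N219
Abstract | The data science ecosystem encompasses data fairness, statistical and ML methods and tools, interpretable data analysis, and trustworthy decision-making. Rapid advancements in ML have revolutionized data utilization and enabled machines to learn from data more effectively. Statistics, as the science of learning from data while accounting for uncertainty, plays a pivotal role in addressing complex real-world problems and facilitating trustworthy decision-making. In this talk, I will discuss the challenges and opportunities involved in building an end-to-end scalable and interpretable data science ecosystem that integrates statistics, ML, and genomics and health sciences. I will illustrate key points using the analysis of large-scale whole genome sequencing (WGS) data, electronic health records and biobanks, discussing a few scalable and interpretable statistical and ML methods, tools and data science resources for gene mapping and genetic risk prediction. Examples of large WGS studies and biobanks include the Trans-Omics Precision Medicine Program (TOPMed), UK Biobank and All of Us. These studies have collectively sequenced over a million genomes and cover thousands of diseases and traits drawn from electronic health records.
Affiliation | Professor Xihong Lin is a member of the U.S. National Academy of Sciences and of the U.S. National Academy of Medicine. At Harvard University she is a tenured professor and former chair of the Department of Biostatistics in the School of Public Health, director of the quantitative genomics program, and a tenured professor in the Department of Statistics. She received the 2002 Spiegelman Award of the American Public Health Association for outstanding health statisticians, the 2006 COPSS Presidents' Award (regarded as the highest honor in statistics), the Outstanding Investigator Award of the U.S. National Cancer Institute in 2015 and 2022, the 2022 Sacks Award for cross-disciplinary research of the National Institute of Statistical Sciences, and the 2022 Zelen Leadership Award in Statistical Science from Harvard University. Her research focuses on developing and applying statistical and machine learning methods for massive genomic and health data and for epidemiological data. She has served as chair of COPSS (the Committee of Presidents of Statistical Societies) and as editor of the journals Biometrics and Statistics in Biosciences.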