Abstract |
Based on two EHR Big Data sets with sample sizes of 10 million and 50 million, respectively, we derived different types of disease-disease networks using the longitudinal information. We established both short-term and long-term directed networks, as well as an undirected network of simultaneously occurring diseases, among 1660 PheWAS disease groups. Among 2,753,940 possible directed disease pairs, we identified 646,969 significant pairs for the long-term network and 10,587 for the short-term network; each was observed in at least five patients and had a relative risk (RR) > 1 that was significant at the 0.05 level after Bonferroni correction. Among 1,376,970 possible pairs of simultaneously occurring diseases, we identified 18,137 that were observed in at least five patients and had RR > 1 with significance at the 0.05 level after Bonferroni correction. In the short-term network, the diseases with the highest out-degree tend to be pregnancy- and kidney-related; in the long-term network, they tend to be chronic diseases. Further clinical implications of these findings will be discussed. This project requires multidisciplinary technologies, including medical record databases, ontology, high-performance computing, computational modeling, large-scale optimization, machine learning, and statistics. I will also discuss how to form a multidisciplinary team to collaborate on a Big Data project, which has the potential for high impact in many scientific fields and in people's daily lives. |
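The pair counts and selection criteria quoted in the abstract can be checked with a short sketch. It assumes the Bonferroni correction divides 0.05 by the number of possible pairs in each network; the abstract does not state the exact denominator, and the helper names `bonferroni_threshold` and `keep_pair` are hypothetical, introduced only for illustration.

```python
from math import comb

N_GROUPS = 1660   # PheWAS disease groups, as stated in the abstract
ALPHA = 0.05      # family-wise significance level
MIN_PATIENTS = 5  # minimum number of patients in which a pair is observed

# Directed networks: ordered pairs (A -> B), A != B.
directed_pairs = N_GROUPS * (N_GROUPS - 1)
# Undirected (simultaneous) network: unordered pairs {A, B}.
undirected_pairs = comb(N_GROUPS, 2)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff (assumption: one test per possible pair)."""
    return alpha / n_tests


def keep_pair(n_patients: int, rr: float, p_value: float, n_tests: int) -> bool:
    """Apply the three criteria from the abstract to one candidate pair."""
    return (
        n_patients >= MIN_PATIENTS
        and rr > 1.0
        and p_value < bonferroni_threshold(ALPHA, n_tests)
    )


print(directed_pairs)    # number of possible directed pairs
print(undirected_pairs)  # number of possible undirected pairs
print(bonferroni_threshold(ALPHA, directed_pairs))
```

Under this assumption the per-pair cutoff for the directed networks is on the order of 1e-8, which shows how strict the 0.05 level becomes once millions of pairs are tested.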