Abstract |
We investigate variable selection at both the group level and the within-group level simultaneously, in the sense that at most one variable is selected in each group. More specifically, we consider this problem in the context of logistic regression under a case-control design, which is one of the most common statistical frameworks for genetic association studies. A key aim of such studies is to detect which mode of inheritance for each genetic variant (mainly single nucleotide polymorphisms, SNPs) confers a higher risk of complex human diseases, when multiple SNPs have been verified to be deleterious through experiments or biological analysis tools. Moreover, determining the genetic inheritance mode of each SNP can help investigators further understand the mechanisms underlying the occurrence and development of human diseases. Existing procedures such as group bridge, group MCP and sparse group LASSO are not designed for this setting. We propose a new method, named the cross product MCP. We derive the oracle properties of the estimators of the regression coefficients and design a specific coordinate descent algorithm. Simulation studies and an application to the HLA-DRB1 gene for rheumatoid arthritis show that the proposed approach works fairly well. |