Abstract
For the prognosis of complex diseases, beyond the main effects of genetic (G) and environmental (E) factors, gene-environment (G-E) interactions also play an important role. Many approaches have been developed for detecting important G-E interactions, most of which assume that measurements are complete. In practical data analysis, missingness in E measurements is not uncommon, and failing to properly accommodate such missingness leads to biased estimation and false marker identification. In this study, we conduct G-E interaction analysis with prognosis data under an accelerated failure time (AFT) model. To accommodate missingness in E measurements, we adopt a nonparametric kernel-based data augmentation approach. With a well-designed weighting scheme, a nice “byproduct” is that the proposed approach enjoys a certain robustness property. A penalization approach, which respects the “main effects, interactions” hierarchy, is adopted for selection (of important interactions and main effects) and regularized estimation. The proposed approach has sound interpretations and a solid statistical basis. It outperforms multiple alternatives in simulation. The analysis of TCGA data on lung cancer and melanoma leads to interesting findings and models with superior prediction.
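As a reading aid, the model class named in the abstract can be written out in a generic form. This is a minimal sketch of a standard AFT model with joint G-E interaction terms, not necessarily the exact formulation used in the paper. All symbols here ($T_i$, $E_{ik}$, $G_{ij}$, $\alpha$, $\beta$, $\gamma$, $\varepsilon_i$, $p$, $q$) are assumed notation and do not appear in the abstract.

```latex
% Assumed notation: T_i is the survival time of subject i,
% E_{ik} the k-th environmental factor, G_{ij} the j-th genetic factor.
\log T_i = \alpha_0
  + \sum_{k=1}^{q} \alpha_k E_{ik}
  + \sum_{j=1}^{p} \beta_j G_{ij}
  + \sum_{j=1}^{p} \sum_{k=1}^{q} \gamma_{jk} G_{ij} E_{ik}
  + \varepsilon_i

% One common reading of the "main effects, interactions" hierarchy:
% an interaction can be selected only if its main G effect is selected.
\gamma_{jk} \neq 0 \;\Longrightarrow\; \beta_j \neq 0
```

Under this reading, penalized estimation shrinks $\beta_j$ and $\gamma_{jk}$ jointly so that the selected model never keeps an interaction whose main effect has been dropped.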
Original language | English |
---|---|
Pages (from-to) | 523-554 |
Number of pages | 32 |
Journal | Genetic Epidemiology |
Volume | 41 |
Issue number | 6 |
Publication status | Published - 1 Sept 2017 |
Externally published | Yes |
Keywords
- data augmentation
- G-E interaction
- missing data
- penalized estimation
- prognosis
ASJC Scopus subject areas
- Epidemiology
- Genetics (clinical)