Accommodating missingness in environmental measurements in gene-environment interaction analysis

Mengyun Wu, Yangguang Zang, Sanguo Zhang, Jian Huang, Shuangge Ma

Research output: Journal article publication › Journal article › Academic research › peer-review

6 Citations (Scopus)

Abstract

For the prognosis of complex diseases, gene-environment (G-E) interactions play an important role beyond the main effects of genetic (G) and environmental (E) factors. Many approaches have been developed for detecting important G-E interactions, most of which assume that measurements are complete. In practical data analysis, missingness in E measurements is not uncommon, and failing to properly accommodate it can lead to biased estimation and false marker identification. In this study, we conduct G-E interaction analysis of prognosis data under an accelerated failure time (AFT) model. To accommodate missingness in E measurements, we adopt a nonparametric kernel-based data augmentation approach. A well-designed weighting scheme also gives the proposed approach, as a “byproduct”, a certain robustness property. For the selection of important interactions and main effects and for regularized estimation, we adopt a penalization approach that respects the “main effects, interactions” hierarchy. The proposed approach has sound interpretations and a solid statistical basis, and it outperforms multiple alternatives in simulation. The analysis of The Cancer Genome Atlas (TCGA) data on lung cancer and melanoma leads to interesting findings and to models with superior prediction performance.
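To make the idea of kernel-based data augmentation concrete, the sketch below shows one generic, textbook-style form of it. It is not the authors' estimator or weighting scheme. It assumes a hypothetical, always-observed auxiliary covariate `z` that is informative about the environmental factor `e`, a Gaussian kernel, and a user-chosen `bandwidth`. Each subject with missing E is replaced by pseudo-records that borrow the observed E values of complete cases, and each pseudo-record is weighted by Nadaraya-Watson kernel similarity in `z`. The function names `kernel_augment` and `imputed_mean` are illustrative only.

```python
import math
from typing import List, Optional, Tuple


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_augment(
    z: List[float],
    e: List[Optional[float]],
    bandwidth: float,
) -> List[Tuple[int, float, float]]:
    """Return augmented records as (subject index, E value, weight).

    Assumption: z is a fully observed auxiliary covariate (hypothetical).
    Subjects with observed E keep a single record with weight 1.
    Each subject with missing E (None) gets one pseudo-record per complete
    case j, carrying e[j] and a normalized kernel weight in z.
    """
    if len(z) != len(e):
        raise ValueError("z and e must have the same length")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    complete = [j for j, ej in enumerate(e) if ej is not None]
    if not complete:
        raise ValueError("at least one subject must have observed E")

    records: List[Tuple[int, float, float]] = []
    for i, ei in enumerate(e):
        if ei is not None:
            records.append((i, ei, 1.0))
            continue
        k = [gaussian_kernel((z[i] - z[j]) / bandwidth) for j in complete]
        total = sum(k)
        if total == 0.0:
            # All kernel values underflowed; fall back to uniform weights.
            k = [1.0] * len(complete)
            total = float(len(complete))
        for j, kij in zip(complete, k):
            records.append((i, e[j], kij / total))
    return records


def imputed_mean(records: List[Tuple[int, float, float]], i: int) -> float:
    """Kernel-weighted conditional mean of E for subject i."""
    return sum(w * v for idx, v, w in records if idx == i)


if __name__ == "__main__":
    z = [0.0, 1.0, 2.0, 0.1]
    e = [1.0, None, 3.0, None]
    for rec in kernel_augment(z, e, bandwidth=0.5):
        print(rec)
```

In a full analysis, the augmented records (with their weights) would enter a weighted estimating equation or weighted loss for the AFT model, so that each subject with missing E still contributes a total weight of one. How the paper combines these weights with its own robustness-oriented weighting scheme and hierarchical penalty is not reproduced here.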

Original language: English
Pages (from-to): 523-554
Number of pages: 32
Journal: Genetic Epidemiology
Volume: 41
Issue number: 6
Publication status: Published - 1 Sep 2017
Externally published: Yes

Keywords

  • data augmentation
  • G-E interaction
  • missing data
  • penalized estimation
  • prognosis

ASJC Scopus subject areas

  • Epidemiology
  • Genetics (clinical)
