Integrative analysis of multiple cancer genomic datasets under the heterogeneity model

Jin Liu, Jian Huang, Shuangge Ma

Research output: Journal article publication › Journal article › Academic research › peer-review

12 Citations (Scopus)

Abstract

In the analysis of cancer studies with high-dimensional genomic measurements, integrative analysis provides an effective way of pooling information across multiple heterogeneous datasets. The genomic basis of multiple independent datasets, which can be characterized by the sets of genomic markers, can be described using the homogeneity model or the heterogeneity model. Under the homogeneity model, all datasets share the same set of markers associated with the responses. In contrast, under the heterogeneity model, different studies have overlapping but possibly different sets of markers. The heterogeneity model contains the homogeneity model as a special case and can be much more flexible. Marker selection under the heterogeneity model calls for bi-level selection: determining whether a covariate is associated with the response in any study at all and, if so, in which studies. In this study, we consider two minimax concave penalty-based penalization approaches for marker selection under the heterogeneity model. For each approach, we describe its rationale and an effective computational algorithm. We conduct simulations to investigate their performance and compare them with existing alternatives. We also apply the proposed approaches to the analysis of gene expression data on multiple cancers.
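
As a rough illustration of the ingredients named in the abstract, the sketch below implements the standard minimax concave penalty (MCP) and its univariate thresholding rule, plus a small helper that reads a bi-level selection pattern off a coefficient matrix. It is not the authors' algorithm: the function names, the markers-by-studies layout of `beta`, and the example values are all assumptions made for illustration, and the two penalization approaches proposed in the article are not reproduced here.

```python
import numpy as np

def mcp_penalty(t, lam, gamma):
    """Standard MCP: lam*|t| - t^2/(2*gamma) if |t| <= gamma*lam, else gamma*lam^2/2."""
    a = np.abs(np.asarray(t, dtype=float))
    return np.where(a <= gamma * lam,
                    lam * a - a ** 2 / (2.0 * gamma),
                    0.5 * gamma * lam ** 2)

def mcp_threshold(z, lam, gamma):
    """Minimizer over b of 0.5*(z - b)^2 + MCP(b; lam, gamma), valid for gamma > 1."""
    z = np.asarray(z, dtype=float)
    soft = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)  # soft-thresholding step
    return np.where(np.abs(z) <= gamma * lam, soft / (1.0 - 1.0 / gamma), z)

def selection_pattern(beta, tol=0.0):
    """Hypothetical helper: beta has one row per marker and one column per study.

    Returns (selected_anywhere, selected_per_study), i.e. the two levels of
    selection described under the heterogeneity model.
    """
    per_study = np.abs(np.asarray(beta, dtype=float)) > tol
    return per_study.any(axis=1), per_study

# Illustrative coefficients only: 3 markers, 2 studies.
beta = np.array([[0.8, 0.0],   # marker associated with the response in study 1 only
                 [0.0, 0.0],   # marker not associated in any study
                 [0.5, 0.3]])  # marker associated in both studies
anywhere, per_study = selection_pattern(beta)
```

Under the homogeneity model every row of `per_study` would be all true or all false; the heterogeneity model also allows mixed rows such as the first one.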

Original language: English
Pages (from-to): 3509-3521
Number of pages: 13
Journal: Statistics in Medicine
Volume: 32
Issue number: 20
Publication status: Published - 10 Sep 2013
Externally published: Yes

Keywords

  • Heterogeneity model
  • Integrative analysis
  • Marker selection

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability
