TY - GEN
T1 - Integrative Analysis of Multiple Cancer Prognosis Datasets Under the Heterogeneity Model
AU - Liu, Jin
AU - Huang, Jian
AU - Ma, Shuangge
PY - 2013/10/28
Y1 - 2013/10/28
N2 - In cancer research, genomic studies have been extensively conducted, searching for markers associated with prognosis. Because of the "large d, small n" characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Integrative analysis simultaneously analyzes multiple datasets and can be more effective than both single-dataset analysis and classic meta-analysis. Many existing integrative analyses assume the homogeneity model, which postulates that different datasets share the same set of markers. In practice, datasets may have been generated in studies that differ in patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restrictive. Here we explore the heterogeneity model, which assumes that different datasets may have different sets of markers. With multiple cancer prognosis datasets, we adopt accelerated failure time (AFT) models to describe survival. A weighted least squares approach is adopted for estimation. For marker selection, penalization-based methods are examined. These methods have intuitive formulations and can be computed using effective group coordinate descent algorithms. Analysis of three lung cancer prognosis datasets with gene expression measurements demonstrates the merits of the heterogeneity model and the proposed methods.
UR - http://www.scopus.com/inward/record.url?scp=84886037909&partnerID=8YFLogxK
U2 - 10.1007/978-1-4614-7846-1_21
DO - 10.1007/978-1-4614-7846-1_21
M3 - Conference article published in proceeding or book
AN - SCOPUS:84886037909
SN - 9781461478454
T3 - Springer Proceedings in Mathematics and Statistics
SP - 257
EP - 269
BT - Topics in Applied Statistics - 2012 Symposium of the International Chinese Statistical Association
T2 - 21st Symposium of the International Chinese Statistical Association, ICSA 2012
Y2 - 23 June 2012 through 26 June 2012
ER -