Abstract
In breast cancer research, it is of great interest to identify genomic markers associated with prognosis, and multiple gene profiling studies have been conducted for this purpose. Genomic markers identified from the analysis of single datasets often do not have satisfactory reproducibility. Among the multiple possible reasons, the most important one is the small sample sizes of individual studies. A cost-effective solution is to pool data from multiple comparable studies and conduct integrative analysis. In this study, we collect four breast cancer prognosis studies with gene expression measurements. We describe the relationship between prognosis and gene expression using accelerated failure time (AFT) models. We adopt a 2-norm group bridge penalization approach for marker identification. This integrative analysis approach can effectively identify markers with consistent effects across multiple datasets and naturally accommodates the heterogeneity among studies. Statistical and simulation studies demonstrate satisfactory performance of this approach. Breast cancer prognosis markers identified using this approach have sound biological implications and satisfactory prediction performance.
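
The abstract does not spell out the penalty, so the following is only a minimal sketch under an assumed form: each gene's coefficients across the datasets are treated as one group, and the penalty is `lam` times the sum over genes of the group's Euclidean norm raised to a power `gamma` between 0 and 1. The function name, the parameter names `lam` and `gamma`, and the toy coefficients are all hypothetical and are not taken from the paper.

```python
import math


def group_bridge_penalty(beta, lam=1.0, gamma=0.5):
    """Assumed 2-norm group bridge penalty (illustrative only).

    beta[m][j] is the coefficient of gene j in dataset m, so each gene
    forms one group made of its coefficients across all datasets.
    """
    n_genes = len(beta[0])
    total = 0.0
    for j in range(n_genes):
        # Euclidean (2-)norm of gene j's coefficients across datasets.
        group_norm = math.sqrt(sum(row[j] ** 2 for row in beta))
        # Bridge power with 0 < gamma < 1 pushes whole groups to zero.
        total += group_norm ** gamma
    return lam * total


# Toy example: 2 datasets, 3 genes; only gene 0 has nonzero effects.
beta = [
    [3.0, 0.0, 0.0],
    [4.0, 0.0, 0.0],
]
penalty = group_bridge_penalty(beta, lam=2.0, gamma=0.5)
```

Under this assumed form, a gene is either selected in all datasets or in none, because the penalty acts on the group norm, while the individual coefficients within a selected group remain free to differ between studies.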
Original language | English |
---|---|
Pages (from-to) | 2718-2728 |
Number of pages | 11 |
Journal | Computational Statistics and Data Analysis |
Volume | 56 |
Issue number | 9 |
Publication status | Published - 1 Sept 2012 |
Externally published | Yes |
Keywords
- 2-norm group bridge
- Breast cancer prognosis
- Gene expression
- Integrative analysis
- Marker identification
ASJC Scopus subject areas
- Statistics and Probability
- Computational Theory and Mathematics
- Computational Mathematics
- Applied Mathematics