High-throughput gene profiling studies have been extensively conducted, searching for markers associated with cancer development and progression. In this study, we analyse cancer prognosis studies with right censored survival responses. With gene expression data, we adopt the weighted gene co-expression network analysis (WGCNA) to describe the interplay among genes. In network analysis, nodes represent genes. There are subsets of nodes, called modules, which are tightly connected to each other. Genes within the same modules tend to have co-regulated biological functions. For cancer prognosis data with gene expression measurements, our goal is to identify cancer markers, while properly accounting for the network module structure. A two-step sparse boosting approach, called Network Sparse Boosting (NSBoost), is proposed for marker selection. In the first step, for each module separately, we use a sparse boosting approach for within-module marker selection and construct module-level Asuper markers. In the second step, we use the super markers to represent the effects of all genes within the same modules and conduct module-level selection using a sparse boosting approach. Simulation study shows that NSBoost can more accurately identify cancer-associated genes and modules than alternatives. In the analysis of breast cancer and lymphoma prognosis studies, NSBoost identifies genes with important biological implications. It outperforms alternatives including the boosting and penalization approaches by identifying a smaller number of genes/modules and/or having better prediction performance.
ASJC Scopus subject areas