Abstract
Big data present new theoretical and computational challenges as well as tremendous opportunities in many fields. Motivated by health care research, we develop a novel divide-and-conquer (DAC) approach for massive right-censored data under the accelerated failure time model, where the sample size is extraordinarily large and the number of predictors is large but smaller than the sample size. Specifically, we construct a penalized loss function in which the weighted least squares loss is approximated by combining the unpenalized estimation results from all subsets. The resulting adaptive LASSO penalized DAC estimator enjoys the oracle property. Simulation studies demonstrate that the proposed DAC procedure performs well and reduces computation time, with satisfactory performance compared with estimation using the full data. The proposed DAC approach is applied to a massive dataset from the Chinese Longitudinal Healthy Longevity Survey.
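The combination step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions: the response `y` is the log survival time, the censoring weights `w` (for example Kaplan-Meier type weights) are computed beforehand, the full weighted least squares loss is replaced by a quadratic form built from the subset estimates, and the penalized problem is solved by plain coordinate descent. The function names `subset_summary` and `dac_adaptive_lasso` are hypothetical.

```python
import numpy as np

def subset_summary(X, y, w):
    """Unpenalized weighted least squares on one subset.

    Returns (A_k, beta_k), where A_k = X' W X and beta_k solves A_k b = X' W y.
    The weights w are assumed to be given (hypothetical input).
    """
    Xw = X * w[:, None]
    A = X.T @ Xw
    beta = np.linalg.solve(A, Xw.T @ y)
    return A, beta

def dac_adaptive_lasso(subsets, lam, n_iter=200, tol=1e-10):
    """Combine subset estimates, then apply an adaptive LASSO penalty.

    Minimizes (b - beta_tilde)' A (b - beta_tilde) + lam * sum_j |b_j| / |beta_tilde_j|
    by coordinate descent, where A is the sum of the subset matrices A_k and
    beta_tilde is the A_k-weighted combination of the subset estimates.
    """
    summaries = [subset_summary(X, y, w) for X, y, w in subsets]
    A = sum(A_k for A_k, _ in summaries)
    beta_tilde = np.linalg.solve(A, sum(A_k @ b_k for A_k, b_k in summaries))

    # Adaptive weights: coefficients with small initial estimates are penalized more.
    pen = lam / np.maximum(np.abs(beta_tilde), 1e-12)

    beta = beta_tilde.copy()
    for _ in range(n_iter):
        beta_old = beta.copy()
        for j in range(len(beta)):
            # Partial residual for coordinate j of the quadratic approximation.
            r = A[j, j] * beta[j] - A[j] @ (beta - beta_tilde)
            # Soft-thresholding update.
            beta[j] = np.sign(r) * max(abs(r) - pen[j] / 2.0, 0.0) / A[j, j]
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta, beta_tilde
```

Each subset contributes only a small matrix and a coefficient vector, so the subsets can be processed independently and the penalized step never touches the raw data. With `lam=0` the sketch returns the combined unpenalized estimate unchanged.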
Original language | English |
---|---|
Pages (from-to) | 400-419 |
Number of pages | 20 |
Journal | Canadian Journal of Statistics |
Volume | 51 |
Issue number | 2 |
Publication status | Published - Jun 2023 |
Keywords
- Accelerated failure time model
- adaptive LASSO
- divide and conquer
- oracle property
- survival data
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty