Abstract
We propose a Random Splitting Model Averaging procedure, RSMA, to achieve stable predictions in high-dimensional linear models. The idea is to use the training part of the split data to construct and estimate candidate models, and to use the test part to form a second-level data set. The second-level data set is then used to estimate optimal weights for the candidate models by quadratic optimization under non-negativity constraints. The procedure has three appealing features: (1) RSMA avoids overfitting and, as a result, gives improved prediction accuracy. (2) By adaptively choosing optimal weights, it yields more stable predictions through averaging over several candidate models. (3) Based on RSMA, a weighted importance index is proposed to rank the predictors and thereby discriminate relevant predictors from irrelevant ones. Simulation studies and a real data analysis demonstrate that the RSMA procedure has excellent predictive performance and that the associated weighted importance index ranks the predictors well.
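The weight-estimation step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how candidate models are built or how many splits are used, so the sketch assumes nested least-squares models on predictors ranked by absolute marginal correlation, several random half-splits whose test predictions are stacked into the second-level data, and non-negative least squares (`scipy.optimize.nnls`) as the quadratic program. The function name `rsma_weights` and its parameters `model_sizes`, `n_splits`, and `train_frac` are hypothetical, and the weighted importance index is not covered.

```python
import numpy as np
from scipy.optimize import nnls


def rsma_weights(X, y, model_sizes=(1, 2, 5), n_splits=10, train_frac=0.5, seed=0):
    """Estimate non-negative model-averaging weights from random splits.

    Candidate models (an assumption of this sketch) are least-squares fits
    on the top-k predictors ranked by absolute marginal correlation.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_train = int(train_frac * n)
    Z_blocks, y_blocks = [], []

    for _ in range(n_splits):
        perm = rng.permutation(n)
        tr, te = perm[:n_train], perm[n_train:]
        Xtr, ytr = X[tr], y[tr]

        # Screening on the training part only: rank predictors by |correlation|.
        xc = Xtr - Xtr.mean(axis=0)
        yc = ytr - ytr.mean()
        score = np.abs(xc.T @ yc) / (np.linalg.norm(xc, axis=0) * np.linalg.norm(yc))
        order = np.argsort(-score)

        preds = []
        for k in model_sizes:
            cols = order[:k]
            # Fit the candidate model (with intercept) on the training part.
            A = np.column_stack([np.ones(n_train), Xtr[:, cols]])
            beta, *_ = np.linalg.lstsq(A, ytr, rcond=None)
            # Predict on the held-out test part.
            B = np.column_stack([np.ones(len(te)), X[te][:, cols]])
            preds.append(B @ beta)

        # Second-level data: one column of test predictions per candidate model.
        Z_blocks.append(np.column_stack(preds))
        y_blocks.append(y[te])

    Z = np.vstack(Z_blocks)
    y2 = np.concatenate(y_blocks)
    # Quadratic optimization under non-negativity: min ||y2 - Z w||^2, w >= 0.
    w, _ = nnls(Z, y2)
    return w


# Synthetic high-dimensional example: 100 observations, 200 predictors,
# only the first three predictors are relevant.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 200))
y = X[:, :3] @ np.array([3.0, 2.0, 1.5]) + rng.standard_normal(100)
w = rsma_weights(X, y)
print(w)
```

Because the weights are fitted on held-out predictions rather than on in-sample fits, a candidate model that overfits its training part tends to receive little weight, which is one way to read the abstract's claim about avoiding overfitting.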
Original language | English |
---|---|
Pages (from-to) | 1401-1412 |
Number of pages | 12 |
Journal | Statistics and Computing |
Volume | 27 |
Issue number | 5 |
Publication status | Published - 1 Sept 2017 |
Keywords
- Model averaging
- Penalized regression
- Screening
- Variable selection
ASJC Scopus subject areas
- Theoretical Computer Science
- Statistics and Probability
- Statistics, Probability and Uncertainty
- Computational Theory and Mathematics