Abstract
We consider the problem of simultaneous variable selection and estimation in partially linear models with a divergent number of covariates in the linear part, under the assumption that the vector of regression coefficients is sparse. We apply the SCAD penalty to achieve sparsity in the linear part and use polynomial splines to estimate the nonparametric component. Under reasonable conditions, it is shown that consistency in terms of variable selection and estimation can be achieved simultaneously for the linear and nonparametric components. Furthermore, the SCAD-penalized estimators of the nonzero coefficients are shown to have the asymptotic oracle property, in the sense that they are asymptotically normal with the same means and covariances that they would have if the zero coefficients were known in advance. The finite-sample behavior of the SCAD-penalized estimators is evaluated by simulation and illustrated with a data set.
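For readers unfamiliar with SCAD, the following is a minimal sketch of the penalty (in the standard form of Fan and Li, 2001) and of its closed-form thresholding rule, which solves the one-dimensional problem min over θ of ½(z − θ)² + p_λ(|θ|), i.e. the orthonormal-design case. It is an illustration only, not the estimation procedure of this paper: the spline approximation of the nonparametric component and the divergent-dimension setting are not shown, and the default a = 3.7 is the conventional choice, assumed here rather than taken from the abstract.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p_lambda(|theta|); a = 3.7 is an assumed conventional default."""
    t = abs(theta)
    if t <= lam:                      # L1-like region near zero
        return lam * t
    if t <= a * lam:                  # quadratic transition region
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2    # constant: large coefficients are not shrunk further


def scad_derivative(theta: float, lam: float, a: float = 3.7) -> float:
    """Derivative of the penalty with respect to |theta| (for |theta| > 0)."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)


def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    """Minimizer of 0.5 * (z - theta)**2 + scad_penalty(theta), requires a > 2."""
    t = abs(z)
    s = 1.0 if z >= 0 else -1.0
    if t <= 2 * lam:                  # soft thresholding: small z is set exactly to zero
        return s * max(t - lam, 0.0)
    if t <= a * lam:                  # linear interpolation between soft and hard rules
        return ((a - 1) * z - s * a * lam) / (a - 2)
    return z                          # large z is left unbiased


if __name__ == "__main__":
    for z in (0.5, 1.5, 3.0, 5.0):
        print(z, scad_threshold(z, lam=1.0))
```

The three branches show why SCAD yields both sparsity (small inputs are set exactly to zero) and near-unbiasedness for large coefficients (inputs beyond aλ are returned unchanged), which underlies the oracle property described above.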
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 673-696 |
Number of pages | 24 |
Journal | Annals of Statistics |
Volume | 37 |
Issue number | 2 |
DOIs | |
Publication status | Published - 1 Apr 2009 |
Externally published | Yes |
Keywords
- Asymptotic normality
- High-dimensional data
- Oracle property
- Penalized estimation
- Semiparametric models
- Variable selection
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty