Variable screening is of fundamental importance in linear regression models when the number of predictors far exceeds the number of observations. Multicollinearity, in which two or more predictor variables are highly correlated, is common in high-dimensional settings and makes high-dimensional variable screening notoriously difficult. The sure independence screening (SIS) procedure can greatly reduce the dimensionality, but it may break down when the predictors are highly correlated. The profiled independence screening (PIS) approach was proposed to address this by combining factor modelling with SIS. However, under a spiked population model, the profiled predictors cannot be guaranteed to be uncorrelated, and PIS may therefore be misleading. Instead of assuming that the predictors are uncorrelated, as in SIS, or that the profiled predictors are uncorrelated, as in PIS, a more general and challenging scenario is considered in which the predictors can be highly correlated. A preconditioned PIS (PPIS) method is proposed that produces asymptotically uncorrelated profiled predictors and thus leads to consistent model selection under a spiked population model. Compared with PIS, the proposed method can handle complex multicollinearity, such as a spiked population model with slow spectrum decay of the population covariance matrix, while keeping the computation simple. The promising performance of the proposed PPIS method is illustrated via extensive simulation studies and two real examples.
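For readers unfamiliar with the two baseline procedures named above, the following is a minimal Python sketch of marginal-correlation screening in the spirit of SIS, and of a profiling step in the spirit of PIS that removes the leading principal directions before screening. The function names `sis`, `profile`, and `pis`, the number of factors `k`, and the retained model size `d` are illustrative assumptions and are not taken from the paper. The preconditioning step that distinguishes PPIS is not specified in this abstract and is deliberately not implemented here.

```python
import numpy as np


def sis(X, y, d):
    """Keep the d predictors with the largest absolute marginal correlation with y."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each column
    ys = (y - y.mean()) / y.std()               # standardize the response
    omega = np.abs(Xs.T @ ys) / len(y)          # absolute sample correlations
    return np.argsort(omega)[::-1][:d]          # indices, largest first


def profile(X, y, k):
    """Project centered X and y onto the orthogonal complement of the
    top-k left singular vectors of centered X (assumed factor directions)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Uk = U[:, :k]
    return Xc - Uk @ (Uk.T @ Xc), yc - Uk @ (Uk.T @ yc)


def pis(X, y, k, d):
    """Profile out k assumed factors, then screen the profiled data."""
    Xp, yp = profile(X, y, k)
    return sis(Xp, yp, d)


if __name__ == "__main__":
    # Synthetic illustration only: n = 100 observations, p = 500 predictors.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 500))
    y = 5 * X[:, 0] + 5 * X[:, 1] + rng.standard_normal(100)
    print(sorted(sis(X, y, d=10)))
    print(sorted(pis(X, y, k=2, d=10)))
```

In this sketch the profiling is a plain projection based on a singular value decomposition; how many factors to remove and how many predictors to retain are tuning choices that the abstract does not fix.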
Keywords
- high dimensionality
- spiked population model
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty