High-dimensional variable screening under multicollinearity

Naifei Zhao, Qingsong Xu, Man-Lai Tang, Binyan Jiang, Ziqi Chen, Hong Wang

Research output: Journal article publication › Journal article › Academic research › peer-review

Abstract

Variable screening is of fundamental importance in linear regression models when the number of predictors far exceeds the number of observations. Multicollinearity, in which two or more predictor variables are highly correlated, is common in high-dimensional settings and makes variable screening notoriously difficult. The sure independence screening (SIS) procedure can greatly reduce the dimensionality, but it may break down when the predictors are highly correlated. The profiled independence screening (PIS) approach was proposed by combining factor modelling with SIS. However, under a spiked population model, the profiled predictors cannot be guaranteed to be uncorrelated, and PIS may therefore be misleading. Instead of assuming that either the predictors are uncorrelated, as in SIS, or the profiled predictors are uncorrelated, as in PIS, a more general and challenging scenario is considered in which the predictors can be highly correlated. A preconditioned PIS (PPIS) method is proposed that produces asymptotically uncorrelated profiled predictors and thus leads to consistent model selection results under a spiked population model. Compared with PIS, the proposed method can handle complex multicollinearity cases, such as a spiked population model with slow spectrum decay of the population covariance matrix, while keeping the computation simple. The promising performance of the proposed PPIS method is illustrated via extensive simulation studies and two real examples.
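
The marginal ranking step behind SIS can be illustrated with a minimal sketch. It covers only the basic SIS idea of ranking predictors by absolute marginal correlation with the response and keeping the top d; it does not implement the profiling step of PIS or the preconditioning step of PPIS, whose details are not given in this abstract. The function names `marginal_correlations` and `sis`, the cutoff `d`, and the toy data are illustrative assumptions, not taken from the article.

```python
import math


def marginal_correlations(X, y):
    """Absolute Pearson correlation between each column of X and y.

    X is a list of n rows, each a list of p predictor values;
    y is a list of n responses. (Hypothetical helper for illustration.)
    """
    n = len(y)
    p = len(X[0])
    y_mean = sum(y) / n
    y_c = [v - y_mean for v in y]
    y_norm = math.sqrt(sum(v * v for v in y_c))
    scores = []
    for j in range(p):
        col = [row[j] for row in X]
        col_mean = sum(col) / n
        col_c = [v - col_mean for v in col]
        col_norm = math.sqrt(sum(v * v for v in col_c))
        if col_norm == 0.0 or y_norm == 0.0:
            # A constant column carries no marginal signal.
            scores.append(0.0)
        else:
            dot = sum(a * b for a, b in zip(col_c, y_c))
            scores.append(abs(dot) / (col_norm * y_norm))
    return scores


def sis(X, y, d):
    """Return the sorted indices of the d predictors with the largest
    absolute marginal correlation with y (assumed cutoff d)."""
    scores = marginal_correlations(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


# Toy data (made up for illustration): y depends only on column 2.
X = [
    [1, 1, 2, 1],
    [2, -1, 0, 1],
    [3, 1, 1, 2],
    [4, -1, 5, 2],
    [5, 1, 3, 3],
    [6, -1, 7, 3],
]
y = [2 * row[2] + 1 for row in X]

print(sis(X, y, 2))  # prints [0, 2]
```

In the toy data, column 0 is retained alongside the true predictor only because it happens to be correlated with column 2. This is the kind of spurious selection that grows worse under multicollinearity and that motivates profiling the predictors before screening.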
Original language: English
Article number: e272
Pages (from-to): 1-11
Number of pages: 11
Journal: Stat
Volume: 9
Issue number: 1
Publication status: Published - 26 Jan 2020