Cluster feature selection in high-dimensional linear models

Bingqing Lin, Zhen Pang, Qihua Wang

Research output: Journal article publication › Journal article › Academic research › peer-review

Abstract

This paper is concerned with variable screening in high-dimensional linear models when highly correlated variables are present. We propose a novel cluster feature selection (CFS) procedure that combines the elastic net with linear correlation variable screening to gain the benefits of both methods. In contrast to the usual linear correlation variable screening, when calculating the correlation between the predictors and the response we consider highly correlated groups of predictors instead of individual ones. Within each correlated group, we apply the elastic net to select variables and estimate their parameters. This avoids a drawback of LASSO [R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B 58 (1996) 267-288], which can mistakenly eliminate truly relevant variables when they are highly correlated. After the CFS procedure is applied, the maximum absolute correlation coefficient between clusters becomes smaller, and common model selection methods such as sure independence screening (SIS) [J. Fan and J. Lv, Sure independence screening for ultrahigh dimensional feature space, J. R. Stat. Soc. Ser. B 70 (2008) 849-911] or LASSO can then be applied to improve the results. Extensive numerical studies, including purely simulated and semi-real examples, demonstrate the good performance of our procedure.
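The abstract describes the CFS idea only at a high level. The standard-library Python sketch below illustrates one possible reading of it. It is not the authors' algorithm. It groups predictors by single linkage on absolute sample correlation, fits an elastic net by coordinate descent inside each group, and ranks groups by the absolute correlation between each group's fitted values and the response. This ranking is an SIS-style screen applied to clusters rather than to single predictors. Everything specific here is an illustrative assumption: the clustering rule, the threshold 0.9, the penalty values `lam` and `alpha`, the ranking statistic, and the hypothetical helper names (`cluster_by_correlation`, `elastic_net`, `cfs_screen`, `make_demo_data`).

```python
import math
import random


def center(v):
    """Return a copy of v with its mean subtracted."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def corr(a, b):
    """Sample Pearson correlation; 0.0 if either vector is (numerically) constant."""
    a, b = center(a), center(b)
    saa = sum(x * x for x in a)
    sbb = sum(x * x for x in b)
    if saa < 1e-12 or sbb < 1e-12:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / math.sqrt(saa * sbb)


def cluster_by_correlation(cols, threshold):
    """ASSUMED grouping rule: single linkage on |corr| >= threshold (union-find)."""
    parent = list(range(len(cols)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for j in range(len(cols)):
        for k in range(j + 1, len(cols)):
            if abs(corr(cols[j], cols[k])) >= threshold:
                parent[find(j)] = find(k)
    groups = {}
    for j in range(len(cols)):
        groups.setdefault(find(j), []).append(j)
    return sorted(groups.values())


def elastic_net(cols, y, lam, alpha, sweeps=1000, tol=1e-10):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2)
    on centred columns and response (so no intercept is fitted)."""
    n = len(y)
    xs = [center(c) for c in cols]
    r = center(y)  # residual y - Xb, starting from b = 0
    b = [0.0] * len(xs)
    z = [sum(v * v for v in x) / n for x in xs]
    for _ in range(sweeps):
        max_step = 0.0
        for j, x in enumerate(xs):
            # Correlation of column j with the partial residual (b_j added back).
            rho = sum(xi * (ri + xi * b[j]) for xi, ri in zip(x, r)) / n
            shrunk = max(abs(rho) - lam * alpha, 0.0)  # L1 soft-threshold
            new = math.copysign(shrunk, rho) / (z[j] + lam * (1 - alpha))  # ridge scaling
            step = new - b[j]
            if step != 0.0:
                r = [ri - xi * step for xi, ri in zip(x, r)]
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b, xs


def cfs_screen(cols, y, threshold=0.9, lam=0.1, alpha=0.5, keep=1):
    """Hypothetical CFS-style screen: cluster, fit an elastic net per cluster,
    rank clusters by |corr(cluster fitted values, y)|, keep the top `keep`."""
    clusters = cluster_by_correlation(cols, threshold)
    scored = []
    for members in clusters:
        b, xs = elastic_net([cols[j] for j in members], y, lam, alpha)
        fitted = [sum(bj * x[i] for bj, x in zip(b, xs)) for i in range(len(y))]
        scored.append((abs(corr(fitted, y)), members, b))
    scored.sort(key=lambda t: t[0], reverse=True)
    return clusters, scored[:keep]


def make_demo_data(seed=0, n=200):
    """Synthetic data: x1 nearly duplicates x0, x4 nearly duplicates x3,
    and only x0 and x1 enter the response."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0, 1) for _ in range(n)]
    x1 = [v + rng.gauss(0, 0.1) for v in x0]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    x3 = [rng.gauss(0, 1) for _ in range(n)]
    x4 = [v + rng.gauss(0, 0.1) for v in x3]
    y = [a + c + rng.gauss(0, 0.2) for a, c in zip(x0, x1)]
    return [x0, x1, x2, x3, x4], y
```

For example, `clusters, top = cfs_screen(*make_demo_data())` should group the five synthetic predictors as `[[0, 1], [2], [3, 4]]` and rank cluster `[0, 1]` first. Both of that cluster's near-duplicate members should receive nonzero elastic-net coefficients. This is the grouping behaviour the abstract contrasts with LASSO, which tends to keep only one of a highly correlated pair. A reduced set of clusters like this could then be passed to SIS or LASSO, as the abstract suggests.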
Original language: English
Article number: 1750015
Journal: Random Matrices: Theory and Applications
Volume: 7
Issue number: 1
Publication status: Published - 1 Jan 2018

Keywords

  • elastic net
  • SIS
  • variable screening
  • variable selection

ASJC Scopus subject areas

  • Statistics, Probability and Uncertainty
  • Algebra and Number Theory
  • Discrete Mathematics and Combinatorics
  • Statistics and Probability
