By removing irrelevant and redundant features, feature selection aims to find a compact representation of the original features with good generalization ability. With the prevalence of unlabeled data, unsupervised feature selection has been shown to be effective in alleviating the curse of dimensionality, and it is essential for the comprehensive analysis and understanding of the vast amounts of unlabeled high-dimensional data. Motivated by the success of low-rank representation in subspace clustering, we propose a regularized self-representation (RSR) model for unsupervised feature selection, in which each feature is represented as a linear combination of its relevant features. By using the L2,1-norm to characterize both the representation coefficient matrix and the representation residual matrix, RSR selects representative features effectively and is robust to outliers. If a feature is important, it participates in the representation of most other features, producing a row of representation coefficients with a large norm; an unimportant feature produces a row close to zero. Experimental analysis on synthetic and real-world data demonstrates that the proposed method effectively identifies the representative features and outperforms many state-of-the-art unsupervised feature selection methods in terms of clustering accuracy, redundancy reduction, and classification accuracy.
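The self-representation idea in the abstract can be illustrated with a minimal sketch. It assumes a data matrix X with samples as rows and features as columns, and an objective of the form ||X - XW||_{2,1} + lam * ||W||_{2,1}, which is one reading of the abstract rather than a formulation it states explicitly. The iteratively reweighted least-squares solver used here is a standard technique for L2,1 problems and is not confirmed to be the authors' algorithm. The names `rsr_feature_scores`, `select_features`, `lam`, `n_iter`, and `eps` are hypothetical.

```python
import numpy as np


def rsr_feature_scores(X, lam=1.0, n_iter=50, eps=1e-8):
    """Score features by the row norms of a self-representation matrix W.

    Sketch only: approximately minimizes ||X - XW||_{2,1} + lam * ||W||_{2,1}
    with iteratively reweighted least squares (an assumed solver).
    X has shape (n_samples, n_features).
    """
    d = X.shape[1]
    XtX = X.T @ X
    # Start from the ridge-regularized solution so that no row norm is zero.
    W = np.linalg.solve(XtX + lam * np.eye(d), XtX)

    for _ in range(n_iter):
        # Per-sample residual norms and per-feature coefficient row norms.
        res_norms = np.linalg.norm(X - X @ W, axis=1)
        row_norms = np.linalg.norm(W, axis=1)

        # Reweighting factors; eps guards against division by zero.
        g_left = 1.0 / (2.0 * np.maximum(res_norms, eps))
        g_right = 1.0 / (2.0 * np.maximum(row_norms, eps))

        # Weighted normal equations: (X^T G_L X + lam * G_R) W = X^T G_L X.
        A = X.T @ (g_left[:, None] * X)
        W = np.linalg.solve(A + lam * np.diag(g_right), A)

    # A feature used to represent many other features has a large row norm.
    return np.linalg.norm(W, axis=1)


def select_features(X, k, **kwargs):
    """Return the indices of the k highest-scoring features."""
    scores = rsr_feature_scores(X, **kwargs)
    return np.argsort(scores)[::-1][:k]
```

For example, `select_features(X, k=10, lam=5.0)` would return ten column indices of X. The weight `lam` has to be tuned: if it is too small, the fit can collapse toward the trivial identity representation, in which every feature represents only itself and the scores no longer discriminate between features.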
Keywords
- Group sparsity
- Sparse representation
- Unsupervised feature selection
ASJC Scopus subject areas
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence