Fast dimension reduction for document classification based on imprecise spectrum analysis

Hu Guan, Jingyu Zhou, Bin Xiao, Minyi Guo, Tao Yang

Research output: Journal article publication › Journal article › Academic research › peer-review

18 Citations (Scopus)

Abstract

Latent Semantic Indexing (LSI) with Singular Value Decomposition (SVD) is an effective dimension reduction method for document classification and other information analysis tasks. The computational overhead of SVD is known to be a bottleneck in dealing with large data sets, and faster dimension reduction with competitive accuracy is desired in such a setting. This paper presents Imprecise Spectrum Analysis (ISA) to carry out fast dimension reduction for document classification. ISA follows the one-sided Jacobi method for computing SVD and simplifies its intensive orthogonality computation. It uses a representative matrix composed of top-k column vectors derived from the original feature vector space and reduces the dimension of a feature vector by computing its product with this representative matrix. The paper provides an analysis to show the approximation error and the rationale behind such a dimension reduction method. To further improve classification accuracy, this paper also presents a feature selection method in building the initial feature matrix and augments the representative matrix by including centroid vectors. Our extensive experimental results show that ISA is fast in handling large term-document feature matrices while delivering better or competitive classification accuracy for the tested benchmarks compared to LSI with SVD.
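
The projection step described in the abstract (multiplying a feature vector by a representative matrix built from top-k column vectors) can be illustrated with a minimal sketch. The abstract does not say how the top-k columns are ranked or how the one-sided Jacobi computation is simplified, so the ranking rule below (largest Euclidean norm) is purely an assumption for illustration, as are the function names `representative_matrix` and `reduce_vector` and the toy data. It is not the paper's ISA procedure.

```python
import math

def column_norms(matrix):
    """Euclidean norm of each column of a row-major list-of-lists matrix."""
    n_cols = len(matrix[0])
    return [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]

def representative_matrix(matrix, k):
    """Return k unit-normalised columns, ranked by norm.

    Ranking by norm is an assumed stand-in; the abstract does not
    specify how ISA derives its top-k column vectors.
    """
    norms = column_norms(matrix)
    top = sorted(range(len(norms)), key=lambda j: norms[j], reverse=True)[:k]
    return [[row[j] / norms[j] for row in matrix] for j in top if norms[j] > 0]

def reduce_vector(rep_columns, x):
    """Map an m-dimensional feature vector to k dimensions via dot products."""
    return [sum(c_i * x_i for c_i, x_i in zip(col, x)) for col in rep_columns]

# Toy 4-term x 3-document matrix (rows = terms, columns = documents).
A = [
    [3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]
R = representative_matrix(A, k=2)
reduced = reduce_vector(R, [3.0, 4.0, 0.0, 0.0])  # 4 dimensions -> 2
```

The reduction itself costs one dot product per representative column, which is where a method of this shape would save time relative to computing a full SVD; the approximation quality depends entirely on how the representative columns are chosen.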
Original language: English
Pages (from-to): 147-162
Number of pages: 16
Journal: Information Sciences
Volume: 222
Publication status: Published - 10 Feb 2013

Keywords

  • Dimension reduction
  • Feature selection
  • Imprecise spectrum analysis
  • Latent Semantic Indexing
  • Singular Value Decomposition

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence
