A Framework of Feature Selection Methods for Text Categorization

Shoushan Li, Rui Xia, Chengqing Zong, Chu Ren Huang

Research output: Chapter in book / Conference proceedingConference article published in proceeding or bookAcademic researchpeer-review

89 Citations (Scopus)

Abstract

In text categorization, feature selection (FS) is a strategy that aims at making text classifiers more efficient and accurate. However, when dealing with a new task, it is still difficult to quickly select a suitable one from various FS methods provided by many previous studies. In this paper, we propose a theoretic framework of FS methods based on two basic measurements: frequency measurement and ratio measurement. Then six popular FS methods are in detail discussed under this framework. Moreover, with the guidance of our theoretical analysis, we propose a novel method called weighed frequency and odds (WFO) that combines the two measurements with trained weights. The experimental results on data sets from both topic-based and sentiment classification tasks show that this new method is robust across different tasks and numbers of selected features.

Original languageEnglish
Title of host publicationACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf.
EditorsKeh-Yih Su, Jian Su, Janyce Wiebe, Haizhou Li
PublisherAssociation for Computational Linguistics (ACL)
Pages692-700
Number of pages9
ISBN (Print)9781617382581
Publication statusPublished - 2009
EventJoint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL-IJCNLP 2009 - Suntec, Singapore
Duration: 2 Aug 20097 Aug 2009

Publication series

NameACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf.

Conference

ConferenceJoint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL-IJCNLP 2009
Country/TerritorySingapore
CitySuntec
Period2/08/097/08/09

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'A Framework of Feature Selection Methods for Text Categorization'. Together they form a unique fingerprint.

Cite this