A weighted word embedding model for text classification

Haopeng Ren, Ze Quan Zeng, Yi Cai, Qing Du, Qing Li, Haoran Xie

Research output: Chapter in book / Conference proceedingConference article published in proceeding or bookAcademic researchpeer-review

7 Citations (Scopus)

Abstract

Neural bag-of-words models (NBOW) have achieved great success in text classification. They compute a sentence or document representation by mathematical operations such as simply adding and averaging over the word embedding of each sequence element. Thus, NBOW models have few parameters and require low computation cost. Intuitively, considering the important degree of each word and the word-order information for text classification are beneficial to obtain informative sentence or document representation. However, NBOW models hardly consider the above two factors when generating a sentence or document representation. Meanwhile, term weighting schemes assigning relatively high weight values to important words have exhibited successful performance in traditional bag-of-words models. However, it is still seldom used in neural models. In addition, n-grams capture word-order information in short context. In this paper, we propose a model called weighted word embedding model (WWEM). It is a variant of NBOW model introducing term weighting schemes and n-grams. Our model generates informative sentence or document representation considering the important degree of words and the word-order information. We compare our proposed model with other popular neural models on five datasets in text classification. The experimental results show that our proposed model exhibits comparable or even superior performance.

Original languageEnglish
Title of host publicationDatabase Systems for Advanced Applications - 24th International Conference, DASFAA 2019, Proceedings
EditorsGuoliang Li, Jun Yang, Joao Gama, Juggapong Natwichai, Yongxin Tong
PublisherSpringer-Verlag
Pages419-434
Number of pages16
ISBN (Print)9783030185756
DOIs
Publication statusPublished - 1 Jan 2019
Event24th International Conference on Database Systems for Advanced Applications, DASFAA 2019 - Chiang Mai, Thailand
Duration: 22 Apr 201925 Apr 2019

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume11446 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference24th International Conference on Database Systems for Advanced Applications, DASFAA 2019
Country/TerritoryThailand
CityChiang Mai
Period22/04/1925/04/19

Keywords

  • N-grams
  • Neural bag-of-words models
  • Term weighting schemes
  • Text classification

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this