ITWF: A framework to apply term weighting schemes in topic model

Kai Yang, Yi Cai, Ho fung Leung, Raymond Y.K. Lau, Qing Li

Research output: Journal article › Academic research › peer-review

1 Citation (Scopus)

Abstract

Topic models such as Latent Dirichlet Allocation (LDA) and its variants are statistical models for discovering latent topics. However, as revealed by previous research, some topics generated by LDA may be uninterpretable and semantically incoherent due to the occurrence of irrelevant words in those topics. To improve the semantic quality of automatically discovered topics, we explore the distributional characteristics of words across topics to identify topic-indiscriminate words, which are responsible for low-quality topics. The main contribution of the research reported in this paper is a novel framework, named the Iterative Term Weighting Framework (ITWF), which can effectively identify and filter out topic-indiscriminate words from uncovered topics. In particular, the proposed framework first applies an entropy-based term weighting scheme and then adopts a novel iterative method to identify topic-indiscriminate words. To the best of our knowledge, our research is among the very few successful works that aim to enhance both the semantic coherence and the interpretability of LDA-based topic modeling methods. The experimental results show that the proposed framework improves the effectiveness of LDA as well as its variants.
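The core idea can be sketched in a few lines: a word whose probability mass is spread almost uniformly across topics has high entropy and is a candidate topic-indiscriminate word. The function names and the threshold below are illustrative assumptions, not the paper's actual weighting scheme or iteration procedure, which the abstract does not spell out.

```python
import math

def topic_entropy(word_topic_counts):
    """Entropy of one word's distribution across topics.

    High entropy means the word occurs nearly uniformly across topics,
    which suggests it is topic-indiscriminate.
    """
    total = sum(word_topic_counts)
    probs = [c / total for c in word_topic_counts if c > 0]
    return -sum(p * math.log(p) for p in probs)

def flag_indiscriminate(word_counts, threshold):
    """Return the set of words whose across-topic entropy exceeds threshold.

    word_counts: dict mapping word -> list of per-topic occurrence counts.
    threshold: illustrative cutoff; ITWF instead chooses words iteratively.
    """
    return {w for w, counts in word_counts.items()
            if topic_entropy(counts) > threshold}

# A stop-word-like term spreads evenly; a topical term concentrates.
counts = {
    "the":    [10, 11, 9, 10],   # near-uniform -> entropy close to log(4)
    "neuron": [40, 1, 0, 2],     # concentrated -> low entropy
}
print(flag_indiscriminate(counts, threshold=1.0))  # -> {'the'}
```

In the full framework this filtering would be interleaved with re-running the topic model, so that entropies are recomputed on the cleaned vocabulary at each iteration.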

Original language: English
Pages (from-to): 248-260
Number of pages: 13
Journal: Neurocomputing
Volume: 350
DOIs
Publication status: Published - 20 Jul 2019

Keywords

  • Knowledge acquisition
  • Latent Dirichlet Allocation
  • Term weighting scheme
  • Topic model

ASJC Scopus subject areas

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence
