Abstract
Topic models such as Latent Dirichlet Allocation (LDA) and its variants are a class of statistical models for discovering latent topics. However, as revealed by previous research, some topics generated by LDA may be uninterpretable and semantically incoherent due to the occurrence of irrelevant words in these topics. To improve the semantic quality of automatically discovered topics, we explore the distributional characteristics of words across topics to identify topic-indiscriminate words, which are responsible for low-quality topics. The main contribution of the research reported in this paper is a novel framework named the Iterative Term Weighting Framework (ITWF), which can effectively identify and filter out topic-indiscriminate words from uncovered topics. In particular, the proposed framework first applies an entropy-based term weighting scheme and then adopts a novel iterative method to identify topic-indiscriminate words. To the best of our knowledge, our research is among the very few successful works that aim to enhance both the semantic coherence and the interpretability of LDA-based topic modeling methods. The experimental results show that the proposed framework improves the effectiveness of LDA as well as its variants.
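The abstract does not give the exact weighting formula, but the core idea of an entropy-based score for topic-indiscriminate words can be sketched as follows: a word whose probability mass is spread evenly across all topics has high entropy and is therefore a candidate for filtering, while a word concentrated in a single topic has low entropy. This is a minimal illustration, not the paper's actual ITWF scheme; the matrix `phi` and the function name are assumptions introduced for this example.

```python
import numpy as np

def topic_entropy_weights(phi, eps=1e-12):
    """Score each word by the entropy of its distribution across topics.

    phi: (K, V) topic-word matrix; rows are topic distributions over the
    vocabulary. A word spread evenly across topics scores near 1
    (topic-indiscriminate); a word concentrated in one topic scores near 0.
    NOTE: illustrative sketch only, not the scheme from the paper.
    """
    # Normalize each word's column into a distribution over topics.
    p = phi / (phi.sum(axis=0, keepdims=True) + eps)   # shape (K, V)
    entropy = -(p * np.log(p + eps)).sum(axis=0)       # shape (V,)
    # Normalized entropy in [0, 1]; higher = less topic-discriminative.
    return entropy / np.log(phi.shape[0])

# Toy example: 3 topics, 4 words.
phi = np.array([
    [0.70, 0.10, 0.10, 0.10],  # word 0 concentrated in topic 0
    [0.10, 0.70, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],  # word 3 spread evenly across topics
])
w = topic_entropy_weights(phi)
```

Words with a score above some threshold would then be down-weighted or removed before re-running topic inference; the "iterative" part of ITWF presumably repeats this scoring and filtering until the topics stabilize.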
| Original language | English |
|---|---|
| Pages (from-to) | 248-260 |
| Number of pages | 13 |
| Journal | Neurocomputing |
| Volume | 350 |
| DOIs | |
| Publication status | Published - 20 Jul 2019 |
Keywords
- Knowledge acquisition
- Latent Dirichlet Allocation
- Term weighting scheme
- Topic model
ASJC Scopus subject areas
- Computer Science Applications
- Cognitive Neuroscience
- Artificial Intelligence