Incorporating concept information into term weighting schemes for topic models

Huakui Zhang, Yi Cai, Bingshan Zhu, Changmeng Zheng, Kai Yang, Raymond Chi Wing Wong, Qing Li

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

3 Citations (Scopus)

Abstract

Topic models demonstrate outstanding ability in discovering latent topics in text corpora. A coherent topic consists of words or entities related to similar concepts, i.e., abstract ideas of categories of things. To generate more coherent topics, term weighting schemes have been proposed for topic models; these assign weights to terms in the text, for example by promoting informative entities. However, current term weighting schemes do not discriminate entities by their concepts, which can yield incoherent topics containing entities from unrelated concepts. To solve this problem, we propose two term weighting schemes for topic models, the CEP scheme and the DCEP scheme, which improve topic coherence by incorporating the concept information of entities. More specifically, the CEP scheme gives more weight to entities from the concepts that reveal the topics of the document. The DCEP scheme further reduces the co-occurrence of entities from unrelated concepts by separating them into different duplicates of a document. We develop the CEP-LDA and DCEP-LDA term weighting topic models by applying the two proposed schemes to LDA. Experimental results on two public datasets show that CEP-LDA and DCEP-LDA can produce more coherent topics.

Original language: English
Title of host publication: Database Systems for Advanced Applications - 25th International Conference, DASFAA 2020, Proceedings
Editors: Yunmook Nah, Bin Cui, Sang-Won Lee, Jeffrey Xu Yu, Yang-Sae Moon, Steven Euijong Whang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 227-244
Number of pages: 18
ISBN (Print): 9783030594152
Publication status: Published - Apr 2020
Event: 25th International Conference on Database Systems for Advanced Applications, DASFAA 2020 - Jeju, Korea, Republic of
Duration: 24 Sept 2020 to 27 Sept 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12113 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 25th International Conference on Database Systems for Advanced Applications, DASFAA 2020
Country/Territory: Korea, Republic of
City: Jeju
Period: 24/09/20 to 27/09/20

Keywords

  • Latent Dirichlet Allocation
  • Term weighting scheme
  • Topic model

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
