A novel topic model for automatic term extraction

Sujian Li, Jiwei Li, Tao Song, Wenjie Li, Baobao Chang

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

11 Citations (Scopus)

Abstract

Automatic term extraction (ATE) aims at extracting domain-specific terms from a corpus of a certain domain. Termhood is one essential measure for judging whether a phrase is a term. Previous research on termhood mainly depends on word frequency information. In this paper, we propose to compute termhood based on the semantic representation of words. A novel topic model, namely i-SWB, is developed to map the domain corpus into a latent semantic space, which is composed of some general topics, a background topic and a document-specific topic. Experiments on four domains demonstrate that our approach outperforms the state-of-the-art ATE approaches.
Original language: English
Title of host publication: SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages: 885-888
Number of pages: 4
Publication status: Published - 2 Sep 2013
Event: 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013 - Dublin, Ireland
Duration: 28 Jul 2013 → 1 Aug 2013

Conference

Conference: 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013
Country: Ireland
City: Dublin
Period: 28/07/13 → 1/08/13

Keywords

  • Term extraction
  • Termhood
  • Topic model

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems
