Abstract
Automatic term extraction (ATE) aims to extract domain-specific terms from a corpus in a given domain. Termhood is an essential measure for judging whether a phrase is a term. Previous research on termhood depends mainly on word frequency information. In this paper, we propose to compute termhood based on the semantic representation of words. A novel topic model, namely i-SWB, is developed to map the domain corpus into a latent semantic space, which is composed of several general topics, a background topic and a document-specific topic. Experiments on four domains demonstrate that our approach outperforms state-of-the-art ATE approaches.
Original language | English |
---|---|
Title of host publication | SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval |
Pages | 885-888 |
Number of pages | 4 |
DOIs | |
Publication status | Published - 2 Sept 2013 |
Event | 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013 - Dublin, Ireland; Duration: 28 Jul 2013 → 1 Aug 2013 |
Conference
Conference | 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013 |
---|---|
Country/Territory | Ireland |
City | Dublin |
Period | 28/07/13 → 1/08/13 |
Keywords
- Term extraction
- Termhood
- Topic model
ASJC Scopus subject areas
- Computer Graphics and Computer-Aided Design
- Information Systems