A novel topic model for automatic term extraction
Abstract
Automatic term extraction (ATE) aims to extract domain-specific terms from a domain corpus. Termhood is an essential measure for judging whether a phrase is a term. Previous research on termhood has relied mainly on word frequency information. In this paper, we propose computing termhood from the semantic representation of words. A novel topic model, i-SWB, is developed to map the domain corpus into a latent semantic space composed of several general topics, a background topic, and a document-specific topic. Experiments on four domains demonstrate that our approach outperforms state-of-the-art ATE approaches.
| Original language | English |
|---|---|
| Title of host publication | SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval |
| Pages | 885-888 |
| Number of pages | 4 |
| Publication status | Published - 2 Sept 2013 |
| Event | 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013 - Dublin, Ireland. Duration: 28 Jul 2013 → 1 Aug 2013 |
Conference
| Conference | 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013 |
|---|---|
| Country/Territory | Ireland |
| City | Dublin |
| Period | 28/07/13 → 1/08/13 |
Keywords
- Term extraction
- Termhood
- Topic model
ASJC Scopus subject areas
- Computer Graphics and Computer-Aided Design
- Information Systems