Measuring termhood in automatic terminology extraction

Qinlong Zhang, Qin Lu, Zhifang Sui

Research output: Conference article published in proceeding or book (peer-reviewed)

5 Citations (Scopus)


Automatic terminology extraction (ATE) can be divided into two tasks. The first task measures unithood, which identifies a string as a lexical unit. The second task measures so-called termhood, which identifies a lexical unit as a domain-specific term. This paper proposes a method to measure termhood in Chinese ATE. It considers the domain specificity of the components of a candidate term, as well as statistical and other contextual information across different domains, and applies these features to a support vector machine model for terminology extraction. The experiments are based on a Chinese corpus in the IT domain, with cross validation against data from outside the IT domain. Results show that the precision of the open tests can reach over 80% for the top 2,000 candidates and around 50% for the top 20,000 candidates. Furthermore, experiments with different lexicon sizes show that the algorithm does not require a large, comprehensive domain lexicon; a few thousand basic domain terms are sufficient to achieve the above performance.
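The abstract's idea of scoring termhood from the domain specificity of a candidate's components plus cross-domain statistics can be illustrated with a minimal sketch. This is not the paper's actual feature set or SVM model; the seed lexicon, the equal feature weighting, and the frequency-contrast measure are all assumptions chosen for illustration.

```python
# Hypothetical termhood sketch (not the paper's exact method): a candidate
# scores higher when its component words appear in a small seed lexicon of
# domain terms, and when it is relatively more frequent in the domain corpus
# than in a general corpus.

def termhood(candidate, seed_lexicon, domain_freq, general_freq):
    """Combine component domain specificity with a cross-domain frequency ratio."""
    components = candidate.split()
    # Component specificity: fraction of components found in the seed lexicon.
    specificity = sum(c in seed_lexicon for c in components) / len(components)
    # Cross-domain contrast: share of occurrences that fall in the domain corpus.
    d = domain_freq.get(candidate, 0)
    g = general_freq.get(candidate, 0)
    contrast = d / (d + g) if d + g else 0.0
    return 0.5 * specificity + 0.5 * contrast  # illustrative equal weighting

# Toy data: a tiny seed lexicon and per-corpus candidate frequencies.
seed = {"neural", "network", "protocol"}
domain = {"neural network": 40, "red apple": 1}
general = {"neural network": 5, "red apple": 30}

ranked = sorted(["neural network", "red apple"],
                key=lambda t: termhood(t, seed, domain, general),
                reverse=True)
print(ranked[0])  # the domain term ranks above the non-term
```

In the paper's setting these features would instead be fed to a trained SVM classifier rather than combined with fixed weights; the point of the abstract's lexicon-size experiment is that even a seed lexicon of only a few thousand terms suffices for the component-specificity signal.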
Original language: English
Title of host publication: IEEE NLP-KE 2007 - Proceedings of International Conference on Natural Language Processing and Knowledge Engineering
Number of pages: 8
Publication status: Published - 1 Dec 2007
Event: International Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE 2007 - Beijing, China
Duration: 30 Aug 2007 - 1 Sept 2007


Conference: International Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE 2007

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Information Systems and Management

