Detecting, categorizing and clustering entity mentions in Chinese text

Wenjie Li, Donglei Qian, Qin Lu, Chunfa Yuan

Research output: Chapter in book / Conference proceeding > Conference article published in proceeding or book > Academic research > peer-review

10 Citations (Scopus)

Abstract

The work presented in this paper is motivated by the practical need for content extraction and by the data source and evaluation benchmark available from the ACE program. The Chinese Entity Detection and Recognition (EDR) task is of particular interest to us. This task presents several language-independent and language-dependent challenges, arising, for example, from the complexity of the extraction targets and from the problem of word segmentation. In this paper, we propose a novel solution to alleviate the problems specific to the task. Mention detection takes advantage of machine learning approaches and character-based models. It handles the different types of entities being mentioned and the different constituent units (i.e. extents and heads) separately. Mentions referring to the same entity are linked together by integrating most-specific-first and closest-first rule-based pairwise clustering algorithms. Types of mentions and entities are determined by head-driven classification approaches. The implemented system achieves an ACE value of 66.1 when evaluated on the EDR 2005 Chinese corpus, placing it among the top-tier results. Alternative approaches to mention detection and clustering are also discussed and analyzed.
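As an illustration only, the closest-first pairwise linking idea mentioned in the abstract can be sketched as follows. This is not the authors' implementation: the `Mention` record, the `compatible` test (same type, one head string contained in the other) and the function name `closest_first_clustering` are all assumptions made for this sketch, and the most-specific-first rule is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    position: int      # offset of the mention head in the text (assumed field)
    head: str          # head string of the mention
    entity_type: str   # e.g. "PER", "ORG", "GPE"


def compatible(a: Mention, b: Mention) -> bool:
    """Hypothetical pairwise test: same type and one head contains the other."""
    if a.entity_type != b.entity_type:
        return False
    return a.head in b.head or b.head in a.head


def closest_first_clustering(mentions):
    """Link each mention to its nearest compatible preceding mention."""
    ordered = sorted(mentions, key=lambda m: m.position)
    cluster_of = {}   # index in `ordered` -> cluster id
    clusters = []     # cluster id -> list of mentions in that entity
    for i, mention in enumerate(ordered):
        # Scan antecedents from the nearest one back to the start of the text.
        for j in range(i - 1, -1, -1):
            if compatible(ordered[j], mention):
                cid = cluster_of[j]
                break
        else:
            # No compatible antecedent found: start a new entity.
            cid = len(clusters)
            clusters.append([])
        cluster_of[i] = cid
        clusters[cid].append(mention)
    return clusters
```

In this sketch each cluster stands for one entity. A real system would replace `compatible` with a richer pairwise decision, which the abstract does not specify.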
Original language: English
Title of host publication: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07
Pages: 647-654
Number of pages: 8
Publication status: Published - 30 Nov 2007
Event: 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07 - Amsterdam, Netherlands
Duration: 23 Jul 2007 to 27 Jul 2007

Conference

Conference: 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07
Country/Territory: Netherlands
City: Amsterdam
Period: 23/07/07 to 27/07/07

Keywords

  • Entity mentions in Chinese
  • Mention categorization and mention clustering
  • Mention detection

ASJC Scopus subject areas

  • Information Systems
  • Software
  • Applied Mathematics
