Abstract
The work presented in this paper is motivated by the practical need for content extraction and by the data sources and evaluation benchmarks available from the ACE program. The Chinese Entity Detection and Recognition (EDR) task is of particular interest to us. This task presents several language-independent and language-dependent challenges, arising, for example, from the complexity of the extraction targets and from the problem of word segmentation. In this paper, we propose a novel solution to alleviate the problems specific to this task. Mention detection takes advantage of machine learning approaches and character-based models. It handles the different types of entities being mentioned and the different constituent units (i.e. extents and heads) separately. Mentions referring to the same entity are linked together by integrating most-specific-first and closest-first rule-based pairwise clustering algorithms. Types of mentions and entities are determined by head-driven classification approaches. The implemented system achieves an ACE value of 66.1 when evaluated on the EDR 2005 Chinese corpus, which is one of the top-tier results. Alternative approaches to mention detection and clustering are also discussed and analyzed.
Original language | English |
---|---|
Title of host publication | Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07 |
Pages | 647-654 |
Number of pages | 8 |
Publication status | Published - 30 Nov 2007 |
Event | 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07 - Amsterdam, Netherlands. Duration: 23 Jul 2007 → 27 Jul 2007 |
Conference
Conference | 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07 |
---|---|
Country/Territory | Netherlands |
City | Amsterdam |
Period | 23/07/07 → 27/07/07 |
Keywords
- Entity mentions in Chinese
- Mention categorization and mention clustering
- Mention detection
ASJC Scopus subject areas
- Information Systems
- Software
- Applied Mathematics