Abstract
Identifying transliterated names is a particularly difficult task in Named Entity Recognition (NER), especially in Chinese. Of all the ways a transliterated named entity can vary, the divergence between PRC and Taiwan renderings is the most prevalent and the most challenging. In this paper, we introduce a novel approach to the automatic extraction of diverging transliterations of foreign named entities by bootstrapping co-occurrence statistics from a tagged and segmented Chinese corpus. A preliminary experiment yields promising results and shows the approach's potential for NLP applications.
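The abstract does not spell out the algorithm. The sketch below is only a generic illustration, not the authors' method, of how co-occurrence statistics can be bootstrapped to pair variant transliterations. It makes several assumptions. The corpus is a list of sentences of `(word, tag)` tuples. `NR` marks proper nouns, following the Penn Chinese Treebank convention. All text has already been converted to one character set. Two names are paired when their context-word vectors have a high cosine similarity. Each round feeds the pairs found so far back in by merging the variants in the context counts, which lets the next round see that "布什 访问 悉尼" and "布希 访问 雪梨" share their contexts. All function names, parameters and the toy corpus are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical sketch: a sentence is a list of (word, tag) tuples from a segmented,
# POS-tagged corpus. Assumes "NR" marks proper nouns (Penn Chinese Treebank style)
# and that all text uses a single character set.


def context_vectors(sentences, canon, tag="NR", window=2):
    """Count the words within `window` positions of every token tagged `tag`.

    Context words found in `canon` are replaced by their canonical variant, so
    transliterations already paired count as the same context word.
    """
    vecs = {}
    for sent in sentences:
        for i, (word, pos) in enumerate(sent):
            if pos != tag:
                continue
            vec = vecs.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    ctx = sent[j][0]
                    vec[canon.get(ctx, ctx)] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[k] for k, count in a.items() if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def bootstrap_pairs(sentences, seeds, rounds=3, threshold=0.6):
    """Grow a list of variant pairs; each round reuses the pairs found so far."""
    pairs = list(seeds)
    for _ in range(rounds):
        # Merge known variants in the context counts (second member -> first).
        canon = {b: a for a, b in pairs}
        vecs = context_vectors(sentences, canon)
        paired = {w for pair in pairs for w in pair}
        cands = [w for w in vecs if w not in paired]
        scored = sorted(
            ((cosine(vecs[a], vecs[b]), a, b) for a, b in combinations(cands, 2)),
            reverse=True,
        )
        used, new = set(), []
        for score, a, b in scored:
            if score < threshold:
                break  # list is sorted, so no later pair can qualify
            if a in used or b in used:
                continue  # keep the matching one-to-one within a round
            new.append((a, b))
            used.update((a, b))
        if not new:
            break  # nothing new above threshold: stop early
        pairs.extend(new)
    return pairs


# Toy corpus: 布什/布希 (Bush) and 悉尼/雪梨 (Sydney) are PRC/Taiwan renderings.
corpus = [
    [("布什", "NR"), ("访问", "VV"), ("悉尼", "NR")],
    [("布希", "NR"), ("访问", "VV"), ("雪梨", "NR")],
    [("总统", "NN"), ("布什", "NR")],
    [("总统", "NN"), ("布希", "NR")],
    [("北京", "NR"), ("举办", "VV"), ("奥运", "NN")],
]
print(bootstrap_pairs(corpus, seeds=[]))  # → [('布什', '布希'), ('悉尼', '雪梨')]
```

In this toy run, the Sydney pair scores only 0.5 in the first round because its contexts mention different spellings of Bush, so it falls below the threshold. Once 布什/布希 have been paired and merged, the Sydney pair's contexts coincide and it is accepted in the second round. The sketch does not record which member of a pair is the PRC form and which is the Taiwan form. A real system would need that information from the source of each sentence.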
Original language | English |
---|---|
Pages (from-to) | 153-156 |
Number of pages | 4 |
Journal | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
Publication status | Published - Jun 2007 |
Externally published | Yes |
Event | 45th Annual Meeting of the Association for Computational Linguistics, ACL 2007, Prague, Czech Republic (25 Jun 2007 to 27 Jun 2007) |
ASJC Scopus subject areas
- Computer Science Applications
- Linguistics and Language
- Language and Linguistics