TY - GEN
T1 - Applying machine learning to Chinese entity detection and tracking
AU - Qian, Donglei
AU - Li, Wenjie
AU - Yuan, Chunfa
AU - Lu, Qin
AU - Wu, Mingli
PY - 2007/12/20
Y1 - 2007/12/20
N2 - This paper presents a Chinese entity detection and tracking system that takes advantage of character-based models and machine learning approaches. An entity here is defined as a link of all its mentions in text together with the associated attributes. Entity mentions of different types normally exhibit quite different linguistic patterns. Six separate Conditional Random Fields (CRF) models that incorporate character N-gram and word knowledge features are built to detect the extent and the head of three types of mentions, namely named, nominal and pronominal mentions. For each type of mention, attributes are identified by Support Vector Machine (SVM) classifiers that take mention heads and their context as classification features. Mentions can then be merged into a unified entity representation by examining their attributes and connections in a rule-based coreference resolution process. The system is evaluated on the ACE 2005 corpus and achieves competitive results.
AB - This paper presents a Chinese entity detection and tracking system that takes advantage of character-based models and machine learning approaches. An entity here is defined as a link of all its mentions in text together with the associated attributes. Entity mentions of different types normally exhibit quite different linguistic patterns. Six separate Conditional Random Fields (CRF) models that incorporate character N-gram and word knowledge features are built to detect the extent and the head of three types of mentions, namely named, nominal and pronominal mentions. For each type of mention, attributes are identified by Support Vector Machine (SVM) classifiers that take mention heads and their context as classification features. Mentions can then be merged into a unified entity representation by examining their attributes and connections in a rule-based coreference resolution process. The system is evaluated on the ACE 2005 corpus and achieves competitive results.
UR - http://www.scopus.com/inward/record.url?scp=37149033430&partnerID=8YFLogxK
M3 - Conference article published in proceeding or book
SN - 354070938X
SN - 9783540709381
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 154
EP - 165
BT - Computational Linguistics and Intelligent Text Processing - 8th International Conference, CICLing 2007, Proceedings
T2 - 8th Annual Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2007
Y2 - 18 February 2007 through 24 February 2007
ER -