Abstract
This paper proposes an integrated framework for extracting personal information from the web, such as biographical information and occupation. Such information is needed to construct a social network (a kind of semantic web) for a person. Because web data is heterogeneous in nature, most information extraction (IE) systems, whether named entity recognition (NER) or relation detection and recognition (RDR) systems, fail to produce reliably robust results. We propose a flexible framework that complements state-of-the-art statistical IE systems with rule-based IE systems for web data and achieves substantial improvement over other existing systems. In particular, in our current experiment, both the rule-based IE system, which is designed around web-specific expression patterns, and the statistical IE systems, which were developed for homogeneous corpora, are sensitive only to specific information types. Hence we argue that system performance can be incrementally improved as new and effective IE systems are added to the framework.
M. Lee, and Chu-Ren Huang.
Original language | English |
---|---|
Title of host publication | PACLIC 23 - Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation |
Pages | 82-91 |
Number of pages | 10 |
Volume | 1 |
Publication status | Published - 1 Dec 2009 |
Event | 23rd Pacific Asia Conference on Language, Information and Computation, PACLIC 23 - Hong Kong, Hong Kong. Duration: 3 Dec 2009 → 5 Dec 2009 |
Conference
Conference | 23rd Pacific Asia Conference on Language, Information and Computation, PACLIC 23 |
---|---|
Country/Territory | Hong Kong |
City | Hong Kong |
Period | 3/12/09 → 5/12/09 |
Keywords
- Information extraction
- Relation extraction
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science (miscellaneous)