Abstract
Structural ambiguity in modifying clauses complicates noun phrase (NP) extraction from running Chinese text. Previous experiments show that nearly 33% of the errors made by an NP extractor were caused by clause modifiers. Consider, for example, the sequence "V + NP1 + (of) + NP0", which admits two readings: a verb phrase, [V [NP1 + NP0]NP]VP, or a noun phrase, [[V NP1]VP + NP0]NP. To resolve this ambiguity, this article investigates syntactic, contextual, and semantics-based approaches and concludes that the problem can be overcome only when semantic knowledge about words is used. A structural disambiguation algorithm based on lexical association is therefore proposed. The algorithm uses the semantic class relation between a word pair, derived from a standard Chinese thesaurus, to determine whether the noun phrase or the verb phrase reading has the stronger lexical association within the collocation, which in turn determines the intended phrase structure. With the proposed algorithm, the best accuracy and coverage are 79% and 100%, respectively. The experiments also show that the backed-off model is more effective for this purpose. With this disambiguation algorithm, parsing performance can be significantly improved.
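The idea of choosing a reading by lexical association with back-off can be illustrated with a minimal sketch. This is not the article's algorithm: the thesaurus entries, co-occurrence counts, scoring formula (a relative frequency), tie-breaking rule, and the names `association` and `disambiguate` are all hypothetical, chosen only to show how a word-level score might back off to a semantic-class score when a word pair is unseen.

```python
from collections import Counter

# Hypothetical toy thesaurus: word -> semantic class (illustrative only).
SEMANTIC_CLASS = {
    "repair": "ACT_FIX",
    "car": "VEHICLE",
    "truck": "VEHICLE",
    "worker": "PERSON",
}

# Hypothetical verb-object co-occurrence counts at the word level.
WORD_PAIR_COUNTS = Counter({
    ("repair", "car"): 5,
    ("praise", "teacher"): 6,
    ("praise", "school"): 1,
})

# Hypothetical counts at the semantic-class level, used when backing off.
CLASS_PAIR_COUNTS = Counter({
    ("ACT_FIX", "VEHICLE"): 12,
    ("ACT_FIX", "PERSON"): 1,
})


def _relative_frequency(counts, left, right):
    """Return count(left, right) / count(left, *), or 0.0 if left is unseen."""
    total = sum(c for (l, _), c in counts.items() if l == left)
    return counts[(left, right)] / total if total else 0.0


def association(verb, noun):
    """Score a verb-object pair, backing off from words to semantic classes."""
    if WORD_PAIR_COUNTS[(verb, noun)] > 0:
        return _relative_frequency(WORD_PAIR_COUNTS, verb, noun)
    verb_class = SEMANTIC_CLASS.get(verb)
    noun_class = SEMANTIC_CLASS.get(noun)
    if verb_class is None or noun_class is None:
        return 0.0  # no evidence at either level
    return _relative_frequency(CLASS_PAIR_COUNTS, verb_class, noun_class)


def disambiguate(verb, np1_head, np0_head):
    """Pick a reading for "V + NP1 + (of) + NP0".

    Assumption: the NP reading [[V NP1]VP + NP0]NP is supported when V
    associates more strongly with NP1, and the VP reading
    [V [NP1 + NP0]NP]VP when V associates more strongly with NP0.
    Ties default to "NP" (an arbitrary choice) so every input gets a label.
    """
    np_score = association(verb, np1_head)
    vp_score = association(verb, np0_head)
    return "NP" if np_score >= vp_score else "VP"


if __name__ == "__main__":
    print(disambiguate("repair", "truck", "worker"))    # word pair unseen, class-level back-off
    print(disambiguate("praise", "school", "teacher"))  # word-level counts decide
```

Because the function always returns a label, even when no counts are available, a scheme of this shape trades some accuracy for full coverage; how the article's backed-off model actually combines the two levels is not stated in the abstract.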
Original language | English |
---|---|
Pages (from-to) | 64-85 |
Number of pages | 22 |
Journal | Computational Intelligence |
Volume | 19 |
Issue number | 1 |
Publication status | Published - 1 Jan 2003 |
Externally published | Yes |
Keywords
- Chinese language processing
- Noun phrase extraction
- Structural disambiguation
ASJC Scopus subject areas
- Computational Mathematics
- Artificial Intelligence