The design of a statistical algorithm for resolving structural ambiguity in "V NP1 usde NP0"

Wenjie Li, Kam Fai Wong

Research output: Journal article publication › Journal article › Academic research › peer-review

1 Citation (Scopus)

Abstract

Structural ambiguity in modifying clauses complicates noun phrase (NP) extraction from running Chinese text. Previous experiments showed that nearly 33% of the errors in an NP extractor were caused by clause modifiers. For example, the sequence "V + NP1 + (of) + NP0" can be interpreted in two ways: as a verb phrase (i.e., [V [NP1 + NP0]NP]VP) or as a noun phrase (i.e., [[V NP1]VP + NP0]NP). To resolve this ambiguity, this article investigates syntactic, contextual, and semantics-based approaches. The conclusion is that the problem can be overcome only when semantic knowledge about words is used. A structural disambiguation algorithm based on lexical association is therefore proposed. The algorithm uses the semantic class relation between a word pair, derived from a standard Chinese thesaurus, to work out whether the noun phrase or the verb phrase reading has the stronger lexical association within the collocation. This, in turn, determines the intended phrase structure. With the proposed algorithm, the best accuracy and coverage are 79% and 100%, respectively. The experiment also shows that the backed-off model is more effective for this purpose. With this disambiguation algorithm, parsing performance can be significantly improved.
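The abstract does not give the scoring formula, so the following is only a minimal sketch of one way a backed-off lexical association decision could work, not the authors' implementation. It assumes that the noun phrase reading is supported by a verb-object association between V and NP1, that the verb phrase reading is supported by one between V and NP0, and that raw pair counts stand in for the association measure. The toy thesaurus, the English stand-in words, the function names, and the default reading are all hypothetical.

```python
from collections import Counter

# Hypothetical toy thesaurus (word -> semantic class); the article uses a
# standard Chinese thesaurus, which is not reproduced here.
THESAURUS = {
    "eat": "Consume", "drink": "Consume",
    "apple": "Food", "rice": "Food",
    "child": "Person", "worker": "Person",
}

def build_counts(verb_object_pairs):
    """Count observed verb-object pairs at the word level and the class level."""
    word_counts, class_counts = Counter(), Counter()
    for verb, noun in verb_object_pairs:
        word_counts[(verb, noun)] += 1
        if verb in THESAURUS and noun in THESAURUS:
            class_counts[(THESAURUS[verb], THESAURUS[noun])] += 1
    return word_counts, class_counts

def disambiguate(verb, np1, np0, word_counts, class_counts, default="NP"):
    """Return "NP" for [[V NP1] NP0] or "VP" for [V [NP1 NP0]].

    Assumption: the reading whose verb-object pair has more evidence wins.
    Word-level counts are tried first; if they do not decide, the model
    backs off to semantic-class counts, then to a fixed default.
    """
    # Level 1: exact word pairs.
    c1, c0 = word_counts[(verb, np1)], word_counts[(verb, np0)]
    if c1 != c0:
        return "NP" if c1 > c0 else "VP"
    # Level 2: back off to semantic classes from the thesaurus.
    v_cls = THESAURUS.get(verb)
    c1 = class_counts[(v_cls, THESAURUS.get(np1))]
    c0 = class_counts[(v_cls, THESAURUS.get(np0))]
    if c1 != c0:
        return "NP" if c1 > c0 else "VP"
    # Level 3: no evidence either way, so fall back to the default reading.
    return default

# Toy training data: verb-object pairs assumed to come from a parsed corpus.
word_counts, class_counts = build_counts([("eat", "apple"), ("eat", "rice")])
print(disambiguate("eat", "apple", "child", word_counts, class_counts))
print(disambiguate("drink", "rice", "worker", word_counts, class_counts))
```

In this sketch the second call has no word-level evidence for either pair, so the decision comes from the class-level counts. Always returning a default at the last level is what lets such a model answer every case, at some cost in accuracy.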
Original language: English
Pages (from-to): 64-85
Number of pages: 22
Journal: Computational Intelligence
Volume: 19
Issue number: 1
Publication status: Published - 1 Jan 2003
Externally published: Yes

Keywords

  • Chinese language processing
  • Noun phrase extraction
  • Structural disambiguation

ASJC Scopus subject areas

  • Computational Mathematics
  • Artificial Intelligence
