Are statistics-based approaches good enough for NLP? A case study of maximal-length NP extraction in Mandarin Chinese

Wenjie Li, Haihua Pan, Ming Zhou, Kam Fai Wong, Vincent Lum

Research output: Unpublished conference presentation (presented paper, abstract, poster); Conference presentation (not published in journal/proceeding/book); Academic research; peer-review

Abstract

Statistics-based approaches have become very popular in recent NLP research because of their apparent advantages over linguistics-based or rule-based approaches. Some have even claimed that the latter are not needed at all. It therefore seems necessary to evaluate such claims and the applicability of statistics-based approaches to NLP in general. Because noun phrases (NPs) are useful in many applications, in this paper we present a simple statistics-based partial parser that detects the boundaries of maximal-length NPs in part-of-speech tagged Chinese texts. On the basis of our experimental results, we show that statistics-based approaches that rely purely on part-of-speech tags are not adequate for NP extraction in Chinese; they fail to handle cases of structural ambiguity. Our experiments suggest that syntactic and semantic checking is necessary to correctly mark the boundaries of maximal-length NPs in Chinese. We conclude with possible solutions to the cases that are problematic for statistics-based approaches.

Original language: English
Pages: 137-153
Number of pages: 17
Publication status: Published - 1995
Event: 8th Computational Linguistics Conference, ROCLING 1995 - Taoyuan, Taiwan
Duration: 17 Aug 1995 to 19 Aug 1995

Conference

Conference: 8th Computational Linguistics Conference, ROCLING 1995
Country/Territory: Taiwan
City: Taoyuan
Period: 17/08/95 to 19/08/95

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
