TY - GEN
T1 - Annotation and Classification of Light Verbs and Light Verb Variations in Mandarin Chinese
AU - Lin, Jingxia
AU - Xu, Hongzhi
AU - Jiang, Menghan
AU - Huang, Chu-Ren
N1 - Publisher Copyright: © COLING 2014. All rights reserved.
PY - 2014/8
Y1 - 2014/8
AB - Light verbs pose a challenge in linguistics because of their syntactic and semantic versatility and their unique distribution, which differs from that of regular verbs with higher semantic content and selectional restrictions. Due to their light grammatical content, earlier natural language processing studies typically put light verbs in a stop word list and ignored them. Recently, however, classification and identification of light verbs and light verb constructions have become a focus of study in computational linguistics, especially in the context of multi-word expressions, information retrieval, disambiguation, and parsing. Past linguistic and computational studies on light verbs had very different foci. Linguistic studies tend to focus on the status of light verbs and their various selectional constraints, while NLP studies have focused on light verbs in the context of either a multi-word expression (MWE) or a construction to be identified, classified, or translated, trying to overcome the apparent poverty of semantic content of light verbs. There has been nearly no work attempting to bridge these two lines of research. This paper takes up this challenge by proposing a corpus-based study which classifies and captures syntactic-semantic differences among all light verbs. In this study, we first incorporate results from past linguistic studies to create annotated light verb corpora with syntactic-semantic features. We next adopt a statistical method for automatic identification of light verbs based on these annotated corpora. Our results show that a language-resource-based methodology optimally incorporating linguistic information can resolve challenges posed by light verbs in NLP.
UR - http://www.scopus.com/inward/record.url?scp=84973475327&partnerID=8YFLogxK
U2 - 10.3115/v1/W14-5810
DO - 10.3115/v1/W14-5810
M3 - Conference article published in proceeding or book
AN - SCOPUS:84973475327
T3 - Proceedings of the Workshop on Lexical and Grammatical Resources for Language Processing, LG-LP 2014 - in conjunction with 25th International Conference on Computational Linguistics, COLING 2014
SP - 75
EP - 82
BT - Proceedings of the Workshop on Lexical and Grammatical Resources for Language Processing, LG-LP 2014 - in conjunction with 25th International Conference on Computational Linguistics, COLING 2014
A2 - Baptista, Jorge
A2 - Bhattacharyya, Pushpak
A2 - Fellbaum, Christiane
A2 - Forcada, Mikel
A2 - Huang, Chu-Ren
A2 - Koeva, Svetla
A2 - Krstev, Cvetana
A2 - Laporte, Eric
PB - Association for Computational Linguistics (ACL)
T2 - 2014 Workshop on Lexical and Grammatical Resources for Language Processing, LG-LP 2014
Y2 - 24 August 2014
ER -