TY - GEN
T1 - Corpus-based Study and Identification of Mandarin Chinese Light Verb Variations
AU - Huang, Chu Ren
AU - Lin, Jingxia
AU - Jiang, Menghan
AU - Xu, Hongzhi
N1 - Publisher Copyright:
© 2014 1st Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects, VarDial 2014 at the 25th International Conference on Computational Linguistics: System Demonstrations, COLING 2014 - Proceedings. All Rights Reserved.
PY - 2014/8
Y1 - 2014/8
N2 - When the PRC was founded on mainland China and the KMT retreated to Taiwan in 1949, the relation between mainland China and Taiwan became a classic instance of the Cold War. Neither travel, visits, nor correspondence was allowed between the people until 1987, when the governments on both sides started to allow a small number of Taiwan people with relatives in China to return to visit through a third location. Although the thawing eventually led to frequent exchanges, direct travel links, and close commercial ties between Taiwan and mainland China today, 38 years of total isolation from each other did allow language use to develop into different varieties, which have become a popular topic for mainly lexical studies (e.g., Xu, 1995; Zeng, 1995; Wang & Li, 1996). Grammatical differences between these two variants, however, have not been well studied beyond anecdotal observation, partly because of the near identity of their grammatical systems. This paper focuses on light verb variations in Mainland and Taiwan variants and finds that the light verbs of these two variants indeed show distributional tendencies. Light verbs are chosen for two reasons. First, they are semantically bleached and hence more susceptible to changes and variations. Second, the classification of light verbs is a challenging topic in NLP. We hope our study will contribute to the study of light verbs in Chinese in general. The data adopted for this study was a comparable corpus extracted from the Chinese Gigaword Corpus and manually annotated with contextual features that may contribute to light verb variations. A multivariate analysis was conducted to show that for each light verb there is at least one context where the two variants show differences in tendencies (usually the presence/absence of a tendency rather than contrasting tendencies) and can be differentiated.
In addition, we carried out a K-Means clustering analysis for the variations, and the results are consistent with the multivariate analysis, i.e., the light verbs in Mainland and Taiwan indeed have variations and the variations can be successfully differentiated.
UR - http://www.scopus.com/inward/record.url?scp=85015947096&partnerID=8YFLogxK
U2 - 10.3115/v1/W14-5301
DO - 10.3115/v1/W14-5301
M3 - Conference article published in proceeding or book
AN - SCOPUS:85015947096
T3 - 1st Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects, VarDial 2014 at the 25th International Conference on Computational Linguistics: System Demonstrations, COLING 2014 - Proceedings
SP - 1
EP - 10
BT - 1st Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects, VarDial 2014 at the 25th International Conference on Computational Linguistics
A2 - Zampieri, Marcos
A2 - Tan, Liling
A2 - Ljubesic, Nikola
A2 - Tiedemann, Jorg
PB - Association for Computational Linguistics (ACL)
T2 - 1st Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects, VarDial 2014 at the 25th International Conference on Computational Linguistics: System Demonstrations, COLING 2014
Y2 - 23 August 2014
ER -