Abstract
This study investigates the impact of classifiers on language comprehension using eye-tracking data and a transformer language model. Recent research suggests that classifiers can facilitate the understanding of subsequent nouns, but quantitative studies exploring the role of classifiers in language comprehension are scarce. By analyzing eye-tracking data from 1.33 million gaze points, we examine differences in fixation time for nouns with and without classifiers. Our findings reveal that words with classifiers have significantly shorter average fixation durations (p < 0.05) than words without classifiers, with an average reduction in fixation time of 20.632%. Additionally, we use the transformer language model BERT to predict masked words based on language distributions and sentence context. Through word prediction experiments on a dataset of 100,000 segmented and classifier-tagged sentences, we demonstrate that retaining classifiers significantly facilitates prediction by the transformer language model. Notably, classifiers improve prediction accuracy not only for subsequent nouns (2.56 times higher) but also for preceding verbs (1.25 times higher), a finding not reported in previous research. Moreover, measure words exhibit an unexpected and noteworthy capacity to contribute to prediction, while event classifiers and approximation classifiers offer greater advantages than general individual classifiers in predicting verb semantics. These observations suggest that the Chinese classifier system operates as a lexical-semantic system motivated by ontology.
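To make the fixation-time comparison concrete, here is a minimal sketch of the kind of analysis the abstract describes, assuming per-word gaze records in a tabular file. The file name and column names (`fixation_ms`, `has_classifier`) and the choice of Welch's t-test are illustrative assumptions, not the paper's exact pipeline.

```python
# A hypothetical sketch of comparing fixation durations for words
# with vs. without a preceding classifier; schema is assumed.
import pandas as pd
from scipy import stats

gaze = pd.read_csv("gaze_points.csv")  # hypothetical per-word fixation records

with_cl = gaze.loc[gaze["has_classifier"], "fixation_ms"]
without_cl = gaze.loc[~gaze["has_classifier"], "fixation_ms"]

# Welch's t-test (does not assume equal variances) on mean fixation time.
t, p = stats.ttest_ind(with_cl, without_cl, equal_var=False)

# Relative reduction in mean fixation time when a classifier is present.
reduction = 1 - with_cl.mean() / without_cl.mean()
print(f"t = {t:.3f}, p = {p:.4f}, mean reduction = {reduction:.1%}")
```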
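The masked-word prediction experiment can likewise be sketched with the Hugging Face `transformers` fill-mask pipeline and the `bert-base-chinese` checkpoint; the checkpoint and example sentences are assumptions for illustration, since the abstract does not name the exact model or materials.

```python
# A hypothetical sketch of masked-word prediction with vs. without a
# classifier, using the standard fill-mask pipeline (assumed setup).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-chinese")

# Mask the noun after the classifier 本 (for books), then drop the
# classifier and mask the same position, to compare prediction quality.
with_classifier = "他买了一本[MASK]。"   # classifier retained
without_classifier = "他买了一[MASK]。"  # classifier removed

for sent in (with_classifier, without_classifier):
    preds = fill_mask(sent, top_k=3)
    print(sent, [(p["token_str"], round(p["score"], 3)) for p in preds])
```

With the classifier present, the model should concentrate probability mass on classifier-compatible nouns such as 书 ("book"), which is the effect the accuracy comparison in the abstract quantifies.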
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation |
| Editors | Chu-Ren Huang, Yasunari Harada, Jong-Bok Kim, Si Chen, Yu-Yin Hsu, Emmanuele Chersoni, Pranav A, Winnie Huiheng Zeng, Bo Peng, Yuxi Li, Junlin Li |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 912-921 |
| Publication status | Published - Dec 2023 |
| Event | Pacific Asia Conference on Language, Information and Computation (PACLIC 37), 2 Dec 2023 → 5 Dec 2023, https://paclic2023.github.io/ |
Conference
| Conference | Pacific Asia Conference on Language, Information and Computation (PACLIC 37) |
|---|---|
| Period | 2/12/23 → 5/12/23 |
| Internet address | https://paclic2023.github.io/ |