Abstract
We propose to differentiate registers based on the distribution of parts of speech. In a type-theoretical approach, grammatical categories are defined by their combinatory and mapping functions. With nouns as the basic category representing entities, verbs are functions that take nouns as arguments, and adverbs are functions that take verbs as arguments. Based on these functional mapping relations, we hypothesize that the ratios between such categories, like unit-constituency ratios, can differentiate types of texts, and registers in particular. We calculated the ratios between grammatical categories linked by a function mapping relation, for example the ratio between verbs and nouns and the ratio between adverbs and verbs. Boxplots were used to show the distribution of these ratios in each register, and linear regression was used to verify that the ratios differ across registers. The text clustering results showed that these ratios can distinguish conversational from written registers.
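As an illustration of the ratios described in the abstract, the following minimal sketch counts coarse part-of-speech tags in one tagged text and divides each function category by the category it takes as its argument. It is not the authors' implementation: the function name `pos_ratios`, the tag labels `NOUN`, `VERB` and `ADV`, the direction of each ratio (verbs over nouns, adverbs over verbs), and the toy sample are all assumptions made for this example.

```python
from collections import Counter


def pos_ratios(tagged_tokens):
    """Return verb/noun and adverb/verb ratios for one tagged text.

    tagged_tokens: iterable of (word, tag) pairs. The tag labels
    'NOUN', 'VERB' and 'ADV' are assumed here, not taken from the paper.
    """
    counts = Counter(tag for _, tag in tagged_tokens)

    def ratio(numerator, denominator):
        # A ratio is undefined when the argument category never occurs.
        if counts[denominator] == 0:
            return float("nan")
        return counts[numerator] / counts[denominator]

    return {
        "verb_noun": ratio("VERB", "NOUN"),  # verbs take nouns as arguments
        "adv_verb": ratio("ADV", "VERB"),    # adverbs take verbs as arguments
    }


# Toy input (hypothetical): 4 nouns, 2 verbs, 1 adverb.
sample = [
    ("dogs", "NOUN"), ("bark", "VERB"), ("loudly", "ADV"),
    ("cats", "NOUN"), ("sleep", "VERB"),
    ("mice", "NOUN"), ("houses", "NOUN"),
]
ratios = pos_ratios(sample)
print(ratios)
```

Computed per text, such ratios could then serve as the features passed to the boxplot, regression, and clustering steps the abstract mentions.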
Original language | English |
---|---|
Title of host publication | Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation |
Editors | Ryo Otoguro, Mamoru Komachi, Tomoko Ohkuma |
Pages | 57-67 |
Number of pages | 11 |
Publication status | Published - 2019 |
Event | 33rd Pacific Asia Conference on Language, Information and Computation, PACLIC 2019 - Hakodate, Japan; Duration: 13 Sept 2019 → 15 Sept 2019 |
Conference
Conference | 33rd Pacific Asia Conference on Language, Information and Computation, PACLIC 2019 |
---|---|
Country/Territory | Japan |
City | Hakodate |
Period | 13/09/19 → 15/09/19 |
Keywords
- Chinese register
- Linear regression
- Parts of speeches
- Text clustering
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science (miscellaneous)