Abstract
When designing controlled experiments with language stimuli, researchers in psycholinguistics, neurolinguistics, and related fields require language resources that isolate variables known to affect language processing. This article describes a freely available database that provides word-level statistics for words and nonwords of Mandarin Chinese. The featured lexical statistics include subtitle corpus frequency, phonological neighborhood density, neighborhood frequency, and homophone density. The accompanying word descriptors include pinyin, ASCII phonetic transcription (SAMPA), lexical tone, syllable structure, dominant part of speech (PoS), and syllable, segment, and pinyin lengths for each phonological word. The database is designed for researchers concerned with the processing of isolated words, and it accommodates multiple existing hypotheses concerning the structure of the Mandarin syllable. It is divided into multiple files according to the desired search criteria: 1) the syllable segmentation schema used to calculate density measures, and 2) whether the search is for words or nonwords. The database is open to the research community at https://github.com/karlneergaard/Mandarin-Neighborhood-Statistics.
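As a rough illustration of two of the statistics named above, the Python sketch below computes phonological neighborhood density and homophone density over a toy lexicon. It rests on assumptions that the abstract does not state: a neighbor is taken to be any distinct phonological form reachable by one segment substitution, addition, or deletion; lexical tone is treated as a final segment; and homophone density counts the other entries sharing the same form. The lexicon entries, labels, and function names are hypothetical and are not drawn from the database files.

```python
from collections import Counter

# Hypothetical toy lexicon: (label, segments). Tone is treated as the last
# segment, which is only one of several possible segmentation schemas.
LEXICON = [
    ("ma1_mother", ("m", "a", "1")),
    ("ma3_horse", ("m", "a", "3")),
    ("ma3_code", ("m", "a", "3")),
    ("ma4_scold", ("m", "a", "4")),
    ("na2_take", ("n", "a", "2")),
    ("a1_prefix", ("a", "1")),
]


def is_neighbor(a, b):
    """True if a and b differ by exactly one segment substitution,
    addition, or deletion (assumed neighbor definition)."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position must differ.
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: removing one segment from the longer
    # form must yield the shorter form.
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def neighborhood_density(form, lexicon):
    """Count distinct phonological forms in the lexicon that neighbor form."""
    forms = {segments for _, segments in lexicon}
    return sum(is_neighbor(form, other) for other in forms)


def homophone_density(lexicon):
    """Map each label to the number of other entries with the same form."""
    counts = Counter(segments for _, segments in lexicon)
    return {label: counts[segments] - 1 for label, segments in lexicon}


if __name__ == "__main__":
    for label, segments in LEXICON:
        print(label, neighborhood_density(segments, LEXICON),
              homophone_density(LEXICON)[label])
```

Changing how a syllable is split into segments (for example, keeping tone separate from the segment string) changes which forms count as neighbors, which is why the segmentation schema matters for the density measures.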
Original language | English |
---|---|
Title of host publication | Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016 |
Publisher | European Language Resources Association (ELRA) |
Pages | 4032-4036 |
Number of pages | 5 |
ISBN (Electronic) | 9782951740891 |
Publication status | Published - 1 Jan 2016 |
Event | 10th International Conference on Language Resources and Evaluation, LREC 2016 - Grand Hotel Bernardin Conference Center, Portoroz, Slovenia. Duration: 23 May 2016 → 28 May 2016 |
Conference
Conference | 10th International Conference on Language Resources and Evaluation, LREC 2016 |
---|---|
Country/Territory | Slovenia |
City | Portoroz |
Period | 23/05/16 → 28/05/16 |
Keywords
- Chinese
- Lexical statistics
- Mandarin
- Phonological neighborhood density
ASJC Scopus subject areas
- Linguistics and Language
- Library and Information Sciences
- Language and Linguistics
- Education