Database of word-level statistics for Mandarin Chinese (DoWLS-MAN)

Karl David Neergaard, Hongzhi Xu, James S. German, Chu-ren Huang

Research output: Journal article publication (journal article, academic research, peer-reviewed)


In this article we present the Database of Word-Level Statistics for Mandarin Chinese (DoWLS-MAN). The database addresses the lack of agreement in phonological syllable segmentation specific to Mandarin by offering phonological features for each lexical item according to 16 schematic representations of the syllable (8 with tone and 8 without tone). Those lexical statistics that differ per phonological word and nonword due to changes in syllable segmentation are of the variant category and include subtitle lexical frequency, phonological neighborhood density measures, homophone density, and network science measures. The invariant characteristics consist of each item's lexical tone, phonological transcription, and syllable structure, among others. The goal of DoWLS-MAN is to provide researchers with both the ability to choose stimuli that are derived from a segmentation schema that supports an existing model of Mandarin speech processing, and the ability to choose stimuli that allow for the testing of hypotheses on phonological segmentation according to multiple schemas. In an exploratory analysis we illustrate how multiple schematic representations of the phonological mental lexicon can aid in hypothesis generation, specifically in terms of phonological processing when reading Chinese orthography. Users of the database can search among over 92,000 words, over 1600 out-of-vocabulary Chinese characters, and 4300 phonological nonwords according to either Chinese orthography, pinyin, or ASCII phonetic script. Users can also generate a list of phonological words and nonwords according to user-defined ranges and categories of lexical characteristics. DoWLS-MAN is available to the public for search or download at
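
To illustrate why statistics in the variant category depend on the segmentation schema, the following minimal Python sketch counts phonological neighbors for four toy syllables under two hypothetical schemas. The neighbor definition used here (a single-unit substitution, addition, or deletion), the two schemas, the toy segmentations, and all function and variable names are assumptions made for illustration. They are not taken from DoWLS-MAN and do not reproduce any of its 16 schemas.

```python
from itertools import combinations


def is_neighbor(a, b):
    """Return True if unit sequences a and b differ by exactly one
    substitution, addition, or deletion (an assumed neighbor definition)."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: deleting one unit from the longer
    # sequence must yield the shorter one (addition or deletion).
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def neighborhood_density(lexicon):
    """Count neighbors for every item in a {label: unit tuple} lexicon."""
    density = {word: 0 for word in lexicon}
    for w1, w2 in combinations(lexicon, 2):
        if is_neighbor(lexicon[w1], lexicon[w2]):
            density[w1] += 1
            density[w2] += 1
    return density


# Hypothetical schema 1: every segment is its own unit (tone ignored).
segmental_lexicon = {
    "ma": ("m", "a"),
    "man": ("m", "a", "n"),
    "ming": ("m", "i", "ng"),
    "ban": ("b", "a", "n"),
}

# Hypothetical schema 2: each syllable is split into onset and rime.
onset_rime_lexicon = {
    "ma": ("m", "a"),
    "man": ("m", "an"),
    "ming": ("m", "ing"),
    "ban": ("b", "an"),
}

segmental = neighborhood_density(segmental_lexicon)
onset_rime = neighborhood_density(onset_rime_lexicon)

for word in segmental_lexicon:
    print(word, segmental[word], onset_rime[word])
```

In this toy lexicon, "ming" has no neighbors when every segment is a separate unit, but gains neighbors once the rime is treated as a single unit, so the same item receives a different density value depending on the schema chosen.
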
Original language: English
Pages (from-to): 987–1009
Journal: Behavior Research Methods
Early online date: 17 Aug 2021
Publication status: Published - Apr 2022


  • Lexical database
  • Phonological neighborhood density
  • Mandarin Chinese
  • Syllable segmentation
  • Network phonology

