Building a highly accurate Mandarin speech recognizer with language-independent technologies and language-dependent modules

Mei Yuh Hwang, Gang Peng, Mari Ostendorf, Wen Wang, Arlo Faria, Aaron Heidel

Research output: Journal article publication, Journal article, Academic research, peer-review

10 Citations (Scopus)


We describe a system for highly accurate large-vocabulary Mandarin speech recognition. The prevailing hidden Markov model-based technologies are essentially language independent and constitute the backbone of our system. These include minimum-phone-error discriminative training and maximum-likelihood linear regression adaptation, among others. Additionally, we give careful consideration to Mandarin-specific issues, including lexical word segmentation, tone modeling, phone set design, and automatic acoustic segmentation. Our system comprises two sets of acoustic models for the purpose of cross adaptation. The two subsystems use different phone sets and different combinations of discriminative learning, so that their errors are complementary while their overall accuracy is similar. The outputs of the two subsystems are then rescored by an adapted n-gram language model. Final confusion network combination yielded a 9.1% character error rate on the DARPA GALE 2007 official evaluation, making it the best Mandarin recognition system that year.
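
For readers unfamiliar with the metric quoted in the abstract, character error rate (CER) is conventionally the character-level edit distance between the recognizer output and the reference transcript, divided by the reference length. The sketch below illustrates that standard definition only; it is not the scoring tool used in the GALE evaluation, and the function name and example strings are hypothetical.

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level Levenshtein distance divided by reference length."""
    ref = list(reference)
    hyp = list(hypothesis)
    if not ref:
        raise ValueError("reference must be non-empty")

    # prev[j] is the edit distance between the first i-1 reference
    # characters and the first j hypothesis characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference char missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis char)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted character in a 7-character reference.
cer = character_error_rate("我们今天去北京", "我们明天去北京")
print(f"CER = {cer:.3f}")
```

Because Mandarin text has no whitespace between words, scoring at the character level avoids making the error rate depend on a particular word segmentation.
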
Original language: English
Article number: 5165110
Pages (from-to): 1253-1262
Number of pages: 10
Journal: IEEE Transactions on Audio, Speech and Language Processing
Issue number: 7
Publication status: Published - 1 Sep 2009
Externally published: Yes

Keywords


  • Confusion network combination
  • Cross adaptation
  • Discriminative training
  • GALE
  • Hidden activation temporal patterns (HATs)
  • Mandarin automatic speech recognition (ASR)
  • Mandarin pronunciations
  • Multilayer perceptron (MLP)
  • Tandem MLP

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering
