Abstract
We describe a highly accurate large-vocabulary continuous Mandarin speech recognizer, a collaborative effort among four research organizations. In particular, we build two acoustic models (AMs) that differ significantly but achieve similar accuracy, for the purposes of cross adaptation and system combination. This paper elaborates on the main differences between the two systems: one recognizer incorporates a discriminatively trained feature, while the other uses a discriminative feature transformation. Additionally, we present an improved acoustic segmentation algorithm and topic-based language model (LM) adaptation. Coupled with increased acoustic training data, we reduced the character error rate (CER) on the DARPA GALE 2006 evaluation set from 18.4% to 15.3%.
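For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate is commonly computed (character-level edit distance divided by the reference length) and of the relative reduction implied by the two figures quoted above. It is not the scoring tool used in the GALE evaluation; the function names `edit_distance`, `cer`, and `relative_reduction` are hypothetical and introduced here only for illustration.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Relative error reduction when going from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    # Toy example (assumed data): one substituted character out of five.
    print(cer("今天天气好", "今天天汽好"))
    # Relative reduction implied by the abstract's figures (18.4% -> 15.3%).
    print(f"{relative_reduction(18.4, 15.3):.1%}")
```

Under this definition, the drop from 18.4% to 15.3% is an absolute reduction of 3.1 points, or roughly 16.8% relative.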
Original language | English |
---|---|
Title of host publication | 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007, Proceedings |
Pages | 490-495 |
Number of pages | 6 |
Publication status | Published - 1 Dec 2007 |
Externally published | Yes |
Event | 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007 - Kyoto, Japan. Duration: 9 Dec 2007 → 13 Dec 2007 |
Conference
Conference | 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007 |
---|---|
Country/Territory | Japan |
City | Kyoto |
Period | 9/12/07 → 13/12/07 |
Keywords
- Acoustic segmentation
- Character error rates
- Discriminative features
- LM adaptation
- Mandarin
- Multi-layer perceptrons
- Out-of-vocabulary
ASJC Scopus subject areas
- Computer Vision and Pattern Recognition
- Software
- Artificial Intelligence