Abstract
This paper examines the impact of recognition errors on the retrieval effectiveness of spoken Cantonese queries. One of the largest test collections provided by NTCIR for evaluating Chinese information retrieval is used. The retrieval system uses one of the best retrieval models (2-Poisson) and the robust bigram indexing strategy. If there are no syllable recognition errors, the errors in converting the spelling (called pinyin) to characters degrade performance by 3.9 percentage points, which is not statistically significant. With syllable recognition errors, performance drops by 10.2 percentage points, which is statistically significant. We improved our system by merging the /n/ and /l/ phone labels and retraining the syllable-to-text conversion routines. With the improved system, retrieval performance drops by only 6.4 percentage points.
Original language | English |
---|---|
Title of host publication | 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, ISIMP 2004 |
Pages | 210-213 |
Number of pages | 4 |
Publication status | Published - 1 Dec 2004 |
Event | 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, ISIMP 2004 - Hong Kong, China. Duration: 20 Oct 2004 → 22 Oct 2004 |
Conference
Conference | 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, ISIMP 2004 |
---|---|
Country/Territory | Hong Kong |
City | Hong Kong, China |
Period | 20/10/04 → 22/10/04 |
ASJC Scopus subject areas
- General Engineering