Abstract
The introduction of interview speech in recent NIST Speaker Recognition Evaluations (SREs) has necessitated the development of robust voice activity detectors (VADs) that can operate at very low signal-to-noise ratios (SNRs). This paper highlights the characteristics of interview speech files in NIST SREs and discusses the difficulties of detecting speech and non-speech segments in these files. To alleviate these difficulties, it proposes a VAD that uses noise reduction as a pre-processing step, together with a strategy for avoiding the undesirable effects of impulsive signals and sinusoidal background signals on the VAD. The proposed VAD is compared with the VAD in the ETSI-AMR speech coder for removing silence regions from interview speech files. The results show that the proposed VAD is more robust in detecting speech segments under very low SNR, leading to a significant performance gain in Common Conditions 1-4 of the NIST 2008 SRE.
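The general idea of running noise reduction before a speech/non-speech decision can be sketched as follows. This is only an illustrative sketch, not the paper's algorithm: it assumes magnitude-squared spectral subtraction with a noise estimate taken from the first few frames (assumed to be speech-free), followed by a simple frame-energy threshold. The function names and the parameters `noise_frames`, `floor`, and `energy_ratio` are hypothetical, and the sketch does not include the paper's handling of impulsive or sinusoidal background signals.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def spectral_subtraction_vad(x, frame_len=256, hop=128, noise_frames=10,
                             floor=0.01, energy_ratio=1.5):
    """Return one boolean speech/non-speech decision per frame.

    Assumption (hypothetical, not from the paper): the first
    `noise_frames` frames contain background noise only.
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    frames = frames * np.hanning(frame_len)

    # Power spectrum of every frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Noise power spectrum estimated from the leading frames.
    noise_power = power[:noise_frames].mean(axis=0)

    # Spectral subtraction with a spectral floor to avoid negative power.
    cleaned = np.maximum(power - noise_power, floor * noise_power)

    # Energy-based decision on the noise-reduced spectrum.
    return cleaned.sum(axis=1) > energy_ratio * noise_power.sum()


# Synthetic stand-in for a noisy recording: white noise throughout,
# with a harmonic burst between 0.8 s and 1.2 s playing the role of speech.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(2 * fs) / fs
signal = rng.normal(0.0, 1.0, t.size)
burst = (t >= 0.8) & (t < 1.2)
signal[burst] += (3 * np.sin(2 * np.pi * 150 * t[burst])
                  + 2 * np.sin(2 * np.pi * 300 * t[burst])
                  + np.sin(2 * np.pi * 450 * t[burst]))

decisions = spectral_subtraction_vad(signal)
print(decisions.astype(int))
```

Because the threshold is applied after subtraction, stationary background noise contributes little to the frame energy, which is the motivation for placing noise reduction ahead of the detector. A steady tone would still trigger this simple detector, which illustrates why sinusoidal background signals need separate treatment.
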
Original language | English |
---|---|
Title of host publication | APSIPA ASC 2010 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference |
Pages | 64-71 |
Number of pages | 8 |
Publication status | Published - 1 Dec 2010 |
Event | 2nd Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2010 - Biopolis, Singapore Duration: 14 Dec 2010 → 17 Dec 2010 |
Conference
Conference | 2nd Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2010 |
---|---|
Country/Territory | Singapore |
City | Biopolis |
Period | 14/12/10 → 17/12/10 |
Keywords
- Far-field microphone
- NIST speaker recognition evaluations
- Noise reduction
- Speaker verification
- Spectral subtraction
- Voice activity detection
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems