Abstract
This paper presents a methodology that uses surface electromyogram (SEMG) signals recorded from the cheek and chin to synthesize speech. A neural network is trained to map the SEMG features (short-time Fourier transform coefficients) into vector-quantized codebook indices of speech features (linear prediction coefficients, pitch, and energy). To synthesize a word, the SEMG signals recorded while the word is pronounced are blocked into frames; SEMG features are then extracted from each SEMG frame and presented to the neural network to obtain a sequence of speech feature indices. The waveform of the word is then constructed by concatenating the pre-recorded speech segments corresponding to the feature indices. Experimental evaluations based on the synthesis of eight words show that on average over 70% of the words can be synthesized correctly and that the neural network can classify SEMG frames into seven phonemes and silence at a rate of 77.8%. The rate can be further improved to 88.3% by assuming medium-time stationarity of the speech signals. The experimental results demonstrate the feasibility of synthesizing words based on SEMG signals only.
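The pipeline described in the abstract (frame blocking, STFT feature extraction, mapping to codebook indices, segment concatenation, and medium-time smoothing) can be sketched as follows. This is a minimal illustrative sketch only: the frame length, hop size, windowing, nearest-codebook classifier (standing in for the trained neural network), and majority-vote smoothing window are all assumptions, not the authors' actual parameters or model.

```python
# Hypothetical sketch of the SEMG-to-speech pipeline from the abstract.
# The frame size, hop, codebook, and the nearest-neighbour "classifier"
# (a stand-in for the trained neural network) are illustrative assumptions.
import numpy as np

FRAME_LEN = 256   # assumed SEMG frame length in samples
HOP = 128         # assumed hop between successive frames

def block_frames(signal, frame_len=FRAME_LEN, hop=HOP):
    """Block the SEMG signal into overlapping frames."""
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    return [signal[i * hop : i * hop + frame_len] for i in range(n)]

def semg_features(frame):
    """Short-time Fourier transform magnitudes as the SEMG feature vector."""
    return np.abs(np.fft.rfft(frame * np.hanning(len(frame))))

def classify(feat, codebook):
    """Stand-in for the neural network: index of the nearest codebook entry."""
    dists = np.linalg.norm(codebook - feat, axis=1)
    return int(np.argmin(dists))

def smooth_indices(idx, win=3):
    """Majority vote over a sliding window, reflecting the medium-time
    stationarity assumption used to improve the frame classification rate."""
    out = []
    for i in range(len(idx)):
        lo, hi = max(0, i - win // 2), min(len(idx), i + win // 2 + 1)
        window = idx[lo:hi]
        out.append(max(set(window), key=window.count))
    return out

def synthesize(semg, codebook, segments):
    """Map each SEMG frame to a speech-feature index, smooth the index
    sequence, then concatenate the pre-recorded speech segments."""
    idx = [classify(semg_features(f), codebook) for f in block_frames(semg)]
    return np.concatenate([segments[i] for i in smooth_indices(idx)])
```

In this sketch each codebook row would correspond to one quantized speech-feature vector (LPC coefficients, pitch, energy), and `segments[i]` to the pre-recorded speech segment for that codebook entry.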
Original language | English |
---|---|
Title of host publication | Proceedings of the 2006 49th Midwest Symposium on Circuits and Systems, MWSCAS'06 |
Pages | 240-244 |
Number of pages | 5 |
Volume | 1 |
DOIs | |
Publication status | Published - 1 Dec 2006 |
Event | 2006 49th Midwest Symposium on Circuits and Systems, MWSCAS'06 - San Juan, Puerto Rico Duration: 6 Aug 2006 → 9 Aug 2006 |
Conference
Conference | 2006 49th Midwest Symposium on Circuits and Systems, MWSCAS'06 |
---|---|
Country/Territory | Puerto Rico |
City | San Juan |
Period | 6/08/06 → 9/08/06 |
ASJC Scopus subject areas
- Electronic, Optical and Magnetic Materials
- Electrical and Electronic Engineering