Frame-based SEMG-to-speech conversion

Yuet Ming Lam, Philip Heng Wai Leong, Man Wai Mak

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

4 Citations (Scopus)

Abstract

This paper presents a methodology that uses surface electromyogram (SEMG) signals recorded from the cheek and chin to synthesize speech. A neural network is trained to map the SEMG features (short-time Fourier transform coefficients) into vector-quantized codebook indices of speech features (linear prediction coefficients, pitch, and energy). To synthesize a word, SEMG signals recorded while the word is pronounced are blocked into frames; SEMG features are then extracted from each SEMG frame and presented to the neural network to obtain a sequence of speech feature indices. The waveform of the word is then constructed by concatenating the pre-recorded speech segments corresponding to the feature indices. Experimental evaluations based on the synthesis of eight words show that, on average, over 70% of the words can be synthesized correctly and that the neural network can classify SEMG frames into seven phonemes and silence at a rate of 77.8%. The rate can be further improved to 88.3% by assuming medium-time stationarity of the speech signals. The experimental results demonstrate the feasibility of synthesizing words based on SEMG signals only.
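
The frame-based steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, frame length, hop size, and window width are hypothetical, and the majority-vote smoothing is only one possible reading of "assuming medium-time stationarity". The neural network and the speech-segment concatenation are left out; the per-frame labels are assumed to come from some classifier.

```python
import cmath
from collections import Counter


def block_into_frames(signal, frame_len, hop):
    """Split a 1-D sequence into fixed-length frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N//2 of one frame (naive O(N^2) transform).

    Applied to every frame, this gives short-time Fourier features.
    """
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def smooth_labels(labels, half_width):
    """Replace each frame label by the most common label in a window around it.

    Assumption: a majority vote stands in for the stationarity constraint.
    """
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - half_width): i + half_width + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed


# Hypothetical usage: 10 samples, frames of 4 samples with a hop of 2.
frames = block_into_frames(list(range(10)), frame_len=4, hop=2)
features = [dft_magnitudes(f) for f in frames]

# Hypothetical per-frame labels from a classifier, with two isolated errors.
raw_labels = [0, 0, 1, 0, 0, 2, 2, 2, 0, 2, 2]
clean_labels = smooth_labels(raw_labels, half_width=1)
```

In this sketch, an isolated label that disagrees with both neighbours is overwritten, which is how a stationarity assumption could raise the frame classification rate.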
Original language: English
Title of host publication: Proceedings of the 2006 49th Midwest Symposium on Circuits and Systems, MWSCAS'06
Pages: 240-244
Number of pages: 5
Volume: 1
DOIs
Publication status: Published - 1 Dec 2006
Event: 2006 49th Midwest Symposium on Circuits and Systems, MWSCAS'06 - San Juan, Puerto Rico
Duration: 6 Aug 2006 → 9 Aug 2007

Conference

Conference: 2006 49th Midwest Symposium on Circuits and Systems, MWSCAS'06
Country/Territory: Puerto Rico
City: San Juan
Period: 6/08/06 → 9/08/07

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Electrical and Electronic Engineering
