Lip-motion analysis for speech segmentation in noise

Man Wai Mak, W. G. Allen

Research output: Journal article (academic research, peer-reviewed)

11 Citations (Scopus)

Abstract

This paper explains how visual information from the lips can be combined with acoustic signals for speech segmentation. The psychological aspects of lip-reading and current automatic lip-reading systems are reviewed. The paper describes an image processing system that extracts the velocity of the lips from image sequences. Lip velocity is estimated by a combination of morphological image processing and block matching techniques, and the resulting velocity is used to locate syllable boundaries. This information is particularly useful when the speech signal is corrupted by noise. The paper also demonstrates the correlation between speech signals and lip information. Data fusion techniques are used to combine the acoustic and visual information for speech segmentation. The principal results show that combining visual and acoustic signals can reduce segmentation errors by at least 10.4% when the signal-to-noise ratio is below 15 dB.
Original language: English
Pages (from-to): 279-296
Number of pages: 18
Journal: Speech Communication
Volume: 14
Issue number: 3
DOIs
Publication status: Published - 1 Jan 1994
Externally published: Yes

Keywords

  • Articulatory dynamics
  • Block matching
  • Data fusion
  • Lip-reading
  • Lip-tracking
  • Speech segmentation

ASJC Scopus subject areas

  • Software
  • Modelling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
