Audio-Visual Speech Communication

Louis D. Braida, Massachusetts Institute of Technology

March 24, 1998


Abstract

Although the intelligibility of the acoustic speech signal is usually very high, in many situations speech reception improves when cues derived from the visible actions of the talker's face are integrated with cues derived from the acoustic signal. Such integration aids listeners with normal hearing under difficult communication conditions, and listeners with hearing impairments under nearly all listening conditions.

This talk will describe models of audiovisual integration that have been successful in predicting how well listeners combine visual speech cues with auditory cues. It will also describe how such models can be adapted to predict the magnitude of the McGurk Effect, in which illusory perceptions are elicited when the auditory and visual components of speech are mismatched. Finally, the talk will discuss recent research aimed at developing supplements to speechreading based on automatic speech recognition.