Ken Grant (Walter Reed National Military Medical Center) “Speech Perception with Minimal Acoustic Cues Informs Novel Approaches to Automatic Speech Recognition”
3400 N Charles St, Baltimore, MD 21218
When confronted with the daunting task of transmitting speech information to deaf individuals, one comes quickly to the conclusion that the solution to this problem requires a full-blown theory of speech perception. Because the bandwidth and dynamic range of speech far exceed the capacity of the deaf ear, radical recoding of important speech information and sensory substitution schemes have been proposed. Within this framework, at least four major questions must be addressed: 1) What are the essential elements of the signal that must be transmitted? 2) What is the information capacity of the receiving sensory system? 3) Does the information capacity of the receiving system match (or exceed) the demands of the signal(s) being transmitted, and if it doesn’t, how should the signal information be recoded to be better matched to the receiving system’s capabilities? 4) What methods will be used to evaluate the success (or failure) of the enterprise? The advantage of dissecting the problem into these four crucial questions is that one can develop a systematic approach to understanding speech recognition that applies equally to sensory substitution such as tactile speech aids, advanced bionics such as cochlear implants, or hearing aids. In this talk, I will present several examples of bimodal and unimodal speech recognition where high levels of intelligibility are achieved with minimal auditory information or by incorporating visual speech information gleaned from lipreading (i.e., speechreading). In the bimodal examples, the amount of transmitted auditory speech information is insufficient to support word or sentence intelligibility (zero percent correct), and the average speechreading performance, even for the very best speechreader (who is usually a deaf individual), might be 10-30% word or sentence intelligibility.
Similar findings have been shown for auditory-only speech inputs for signals composed of disjoint and non-overlapping spectral bands where over 90% of the spectral information has been discarded. The very fact that high levels of speech intelligibility (>80%) can be achieved with multimodal inputs where auditory and visual modalities individually fail to transmit enough information to support speech perception, and with unimodal inputs composed of combinations of spectral bands where individual bands provide minimal acoustic information, may suggest novel approaches to automatic speech recognition.
Ken W. Grant is the Deputy Director of the Audiology and Speech Center (ASC), Chief of the Scientific and Clinical Studies Section (SCSS), Audiology and Speech Center, and the Director of the Auditory-Visual Speech Perception Laboratory (AVSPL) at Walter Reed National Military Medical Center. As the Deputy Director of the ASC, Dr. Grant has direct supervisory and mission planning responsibilities for the largest Audiology and Speech-Language Pathology clinic in the DoD. As Chief of the SCSS, he is responsible for the direct supervision of 14 full- and part-time PhD and AuD staff members and helps shape and direct the SCSS research mission with over $2.5 million annually in extramural research funding. His own research has been concerned primarily with the integration of eye and ear for speech perception in both normal and hearing-impaired populations using behavioral and neurophysiological measures. One of the unique features of the studies conducted in the AVSPL (http://www.avspeechlab.com/) is the focus on individual differences in speech recognition capabilities. Each individual subject is characterized along a number of different dimensions, from basic auditory and visual acuity for the simplest speech elements to the cognitive processes engaged in interpreting complex sentence structures. In addition to his research on auditory-visual speech processing, Dr. Grant and colleagues at Walter Reed and in the Electrical Engineering and the Neuroscience and Cognitive Science Departments at the University of Maryland, College Park, have been applying models of auditory processing to hearing-aid algorithm selection. Applications of biologically inspired models of auditory processing to issues of hearing rehabilitation are being explored by the Walter Reed team of auditory scientists and engineers in order to address one of the central problems in communication sciences: the limited success of hearing aids in improving speech communication in noise and reverberation.
Dr. Grant’s most recent work has focused on clinical measures and real-world validation studies related to hearing fitness-for-duty, as well as a multi-site effort to determine the prevalence of Central Auditory Processing and Cognitive Speech Communication deficits in blast-exposed service men and women who have normal to near-normal hearing thresholds. In collaboration with laboratories and researchers around the world, studies are being conducted to identify the hearing skills necessary for specific mission-related tasks and to develop rapid screening tools for populations at risk for undetected speech communication difficulties. These involve tests of speech in different background noises, segregation of multiple sound sources, integration of multimodal inputs, and localization.