Rapid Speech Recognizer Adaptation for New Speakers

Humans have little difficulty recognizing speech in noisy environments, speech distorted by an unknown channel, or speech from nonnative speakers. We adapt to the characteristics of the new speech, often after hearing only a few seconds of it. Adaptation techniques have been developed for automatic speech recognizers that attempt to compensate similarly for differences between the speech on which the system was trained and the speech it has to recognize. However, several minutes of speech from the new speaker or environment have to be provided to the system to obtain any significant improvement in recognition performance.

An automatic speech recognition system employs a number of models for small segments of speech sounds such as phonemes. Simply put, transforming each of these models requires that a sufficient number of samples of each segment be seen from the new speaker. When only a small amount of new speech is heard, humans are able to exploit relationships between various sounds, so that having heard a few of them in the distorted environment is adequate to adjust for the unheard ones as well. In automatic systems, therefore, if sufficient speech is not available to adapt all the models individually, some method must be devised to transform the models of the unheard or insufficiently heard segments based on the heard ones.

The participants in this project plan to relax the commonly used remedy of tying, that is, forcing the transformations of the models of related speech units to be identical. They instead plan to study the dependencies between the speech units, so that the model transformation for one unit influences, but is not necessarily identical to, the transformation for another unit. They plan to use this knowledge to transform each model individually without requiring a large sample of each speech segment for adaptation. Modelling techniques they plan to employ include covariance models such as Markov random fields and dependency trees.
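
The contrast between tying and dependency-based adaptation can be illustrated with a toy sketch. This is not the project's method: it assumes each speech unit's transformation is a single scalar shift, and it replaces the Markov random fields and dependency trees named above with a plain zero-mean joint Gaussian prior over the shifts. The function names, the prior covariance, and the noise variance are all hypothetical choices made for illustration.

```python
import numpy as np

def tied_shift(obs_means, obs_counts):
    """Tying baseline: every unit gets one shared shift, the count-weighted
    mean of the shifts observed for the heard units."""
    counts = np.asarray(obs_counts, dtype=float)
    return float(np.dot(counts, obs_means) / counts.sum())

def dependent_shifts(prior_cov, obs_idx, obs_means, obs_counts, noise_var=1.0):
    """Posterior mean shift for ALL units, heard or not, under an assumed
    zero-mean Gaussian prior with covariance prior_cov (hypothetical)."""
    prior_cov = np.asarray(prior_cov, dtype=float)
    obs_idx = np.asarray(obs_idx)
    y = np.asarray(obs_means, dtype=float)
    # Noise on each observed average shrinks as more samples are heard.
    noise = np.diag(noise_var / np.asarray(obs_counts, dtype=float))
    cov_obs = prior_cov[np.ix_(obs_idx, obs_idx)]  # heard vs. heard
    cov_all_obs = prior_cov[:, obs_idx]            # every unit vs. heard
    # Standard Gaussian conditioning: E[b | y] = S_ao (S_oo + R)^-1 y
    return cov_all_obs @ np.linalg.solve(cov_obs + noise, y)

# Toy setup: units 0 and 1 are strongly related, unit 2 only weakly.
prior_cov = np.array([[1.0, 0.8, 0.1],
                      [0.8, 1.0, 0.1],
                      [0.1, 0.1, 1.0]])

# Only unit 0 is heard: 4 samples with an average shift of 2.0.
shifts = dependent_shifts(prior_cov, obs_idx=[0], obs_means=[2.0],
                          obs_counts=[4], noise_var=1.0)
shared = tied_shift([2.0], [4])
```

In this toy setup, tying would apply the same shift to all three units, whereas the dependency-based estimate moves the strongly related unit 1 most of the way and the weakly related unit 2 only slightly. In a real system the prior would have to be learned from training speakers and the transformations would be matrices rather than scalars.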

 

Team Members 
Senior Members
Sanjeev Khudanpur, CLSP
Sid Berkowitz, DoD
Enrico Bocchieri, AT&T
William Byrne, CLSP/JHU
Vassilis Digalakis, TUC
Ashvin Kannan, Nuance
Ananth Sankar, SRI
Graduate Students
John McDonough, CLSP
Costas Boulis, TUC
Undergraduate Students
Heather Collier, WVU
Adrian Corduneanu, Toronto

Center for Language and Speech Processing