Dynamic Segmental Models of Speech Coarticulation

Automatic speech recognition has achieved significant success by using powerful and complex models for representing and interpreting the speech (acoustic) signal. However, these models require unreasonably large amounts of training data. Some researchers think that the nature and fundamental philosophy of the current acoustic-phonetic modelling methods, such as hidden Markov models, are inappropriate. Participants in this project plan to explore a different way of thinking about the nature of speech patterns. Their proposed model has a long history in speech science, but it has yet to be applied successfully to automatic speech recognition. The speech signal can be thought of as being generated by a relatively low-dimensional system, namely our articulatory organs, which moves slowly relative to the variations of the signal picked up by a microphone. The proposed computational model consists of a linear dynamical process describing the smooth movement of the vocal tract resonance, which flows from one phonetic unit to another, with the observed features of the acoustic signal being a nonlinear function of this process. Vocal tract resonance is a characteristic of the vocal tract that is related to the familiar notion of formants: it corresponds roughly to the formants for vocalic sounds, and although it may not correspond to spectral peaks for consonants, it changes smoothly through them as the configuration of the articulators changes. Because the model is compact, the participating researchers expect it to be robust even with modest amounts of training data. Computational techniques they plan to use in this project include nonlinear regression, multilayer perceptrons and Kalman filtering.
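To make the structure of such a model concrete, the sketch below simulates a one-dimensional hidden state that moves smoothly toward a per-segment target and is then passed through a nonlinear observation function. It is only an illustration under stated assumptions, not the project's implementation: the target-directed first-order update, the smoothing coefficient `phi`, the target values in Hz, and the single tanh unit standing in for a multilayer perceptron are all hypothetical choices. State estimation (for example by Kalman filtering) is not shown.

```python
import math
import random


def simulate_vtr(segments, phi=0.9, noise_std=0.0, seed=0):
    """Simulate a smooth hidden trajectory across phonetic segments.

    segments: list of (target, n_frames) pairs, one per phonetic unit
              (hypothetical resonance targets, here in Hz).
    phi:      assumed smoothing coefficient in (0, 1); values closer
              to 1 give slower movement toward the target.
    """
    rng = random.Random(seed)
    z = segments[0][0]  # assumption: start at the first segment's target
    trajectory = []
    for target, n_frames in segments:
        for _ in range(n_frames):
            # Linear state update: move a fraction (1 - phi) of the
            # remaining distance to the current target, plus noise.
            z = phi * z + (1.0 - phi) * target + rng.gauss(0.0, noise_std)
            trajectory.append(z)
    return trajectory


def observe(z, weight=0.002, bias=-1.5, out_scale=3.0):
    """Hypothetical nonlinear observation: one tanh unit standing in
    for a multilayer perceptron that maps state to acoustic features."""
    return out_scale * math.tanh(weight * z + bias)


# Two phonetic units with different targets, 30 frames each.
segments = [(500.0, 30), (1500.0, 30)]
states = simulate_vtr(segments)
features = [observe(z) for z in states]
```

With the noise switched off, the state stays at the first target and then glides toward the second without jumping to it, which is the kind of continuity across phonetic units that the abstract describes.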


Team Members
Senior Members
John Bridle, Dragon UK
Li Deng, Waterloo
Joe Picone, Miss. State
Hywel Richards, Dragon UK
Mike Schuster, Nara, Japan
Graduate Students
Terri Kamm, CLSP
Jeff Ma, Waterloo
Undergraduate Students
Sandi Pike, Brown
Roland Reagan, CMU

Center for Language and Speech Processing