Prevailing approaches to automatic speech recognition (hidden Markov models, finite-state transducers) are typically based on the assumption that a word can be represented as a single sequence of phonetic states. However, the production of a word involves the simultaneous motion of several articulators, such as the lips and tongue, which may move asynchronously and may not always reach their target positions. This may be more naturally and parsimoniously modeled using multiple streams of hidden states, each corresponding to an articulatory feature (AF). Recent theories of phonology support this idea, representing words using multiple streams of sub-phonetic features, which may be either directly related to the articulators or more abstract (e.g. manner and place). In addition, factoring the observation model of a recognizer into multiple factors, each corresponding to a different AF, may allow for savings in training data. Finally, such an approach can be naturally applied to audio-visual speech recognition, in which the asynchrony between articulators is particularly striking; and multilingual speech recognition, which may leverage the universality of some AFs across languages.
This project will explore the large space of possible AF-based models for automatic speech recognition, on both audio-only and audio-visual tasks. While a good deal of previous work has investigated various components of such a recognizer, such as AF classifiers and AF-based pronunciation models, little effort has gone into building complete, fully AF-based recognizers. Our models will be represented as dynamic Bayesian networks. This is a natural framework for modeling processes with inherent factorization of the state space, and allows for investigation of a large variety of models using universal training and decoding algorithms.
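To make the factorization concrete, here is a minimal sketch (our own illustration, not the project's actual model) of a two-stream articulatory-feature DBN in which each stream has its own transition matrix and observation factor. Because both the transition model and the observation model factor across streams, the forward recursion decomposes into one independent pass per stream; all names and parameter values below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 2 AF streams, 3 hidden states per stream, 5 observation frames.
n_streams, n_states, T = 2, 3, 5
A = rng.dirichlet(np.ones(n_states), size=(n_streams, n_states))  # A[s, i, j]: per-stream transitions
pi = rng.dirichlet(np.ones(n_states), size=n_streams)             # per-stream initial distributions
# B[s, i, t]: likelihood of frame t under state i of stream s (synthetic here).
B = rng.random((n_streams, n_states, T)) + 1e-3

def forward_loglik(A, pi, B):
    """Log-likelihood of the frames under the fully factored DBN.

    Since transitions and observation factors are independent across
    streams, the joint forward pass reduces to a sum of per-stream
    forward-algorithm log-likelihoods.
    """
    total = 0.0
    for s in range(len(pi)):
        alpha = pi[s] * B[s, :, 0]          # initialize forward variable
        for t in range(1, B.shape[2]):
            alpha = (alpha @ A[s]) * B[s, :, t]  # propagate and weight by evidence
        total += np.log(alpha.sum())
    return total

print(forward_loglik(A, pi, B))
```

Allowing the streams to desynchronize within a word would add coupling (e.g. soft asynchrony penalties) between the stream variables; toolkits for general DBNs support such structures with the same training and decoding machinery, which is the attraction noted above.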
| Name | Affiliation |
| --- | --- |
| Simon King | University of Edinburgh |
| Chris Bartels | University of Washington |
| Partha Lal | University of Edinburgh |
| Lisa Yung | Johns Hopkins University |