Workshop 2006
Tuesday, May 13, 2008

Articulatory Feature-based Speech Recognition

Prevailing approaches to automatic speech recognition (hidden Markov models, finite-state transducers) typically assume that a word can be represented as a single sequence of phonetic states. However, producing a word involves the simultaneous motion of several articulators, such as the lips and tongue, which may move asynchronously and may not always reach their target positions. This behavior may be modeled more naturally and parsimoniously using multiple streams of hidden states, each corresponding to an articulatory feature (AF). Recent theories of phonology support this idea, representing words using multiple streams of sub-phonetic features, which may be either directly related to the articulators or more abstract (e.g., manner and place of articulation). In addition, factoring the observation model of a recognizer into separate components, one per AF, may reduce the amount of training data required. Finally, such an approach applies naturally to audio-visual speech recognition, in which the asynchrony between articulators is particularly striking, and to multilingual speech recognition, which may leverage the universality of some AFs across languages.
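
As a rough illustration of why a factored observation model might need less training data, the sketch below counts how many distinct observation distributions would have to be estimated when the features are modeled jointly versus one feature at a time. The feature names and the number of values per feature are hypothetical placeholders, not the feature set used in this project.

```python
from math import prod

# Hypothetical articulatory feature inventory: feature name -> number of values.
# These names and counts are illustrative assumptions only.
FEATURE_VALUES = {
    "lip_opening": 4,
    "tongue_tip": 5,
    "tongue_body": 6,
    "velum": 2,
    "glottis": 3,
}


def joint_state_count(feature_values):
    """Distinct feature-value combinations when all features are modeled jointly."""
    return prod(feature_values.values())


def factored_state_count(feature_values):
    """Distinct per-feature values when each feature gets its own observation factor."""
    return sum(feature_values.values())


joint = joint_state_count(FEATURE_VALUES)
factored = factored_state_count(FEATURE_VALUES)
print(f"joint observation distributions:    {joint}")
print(f"factored observation distributions: {factored}")
```

Under these assumed counts, the joint model has one distribution per combination of feature values, while the factored model has one per value of each feature. The count is only a proxy for data requirements; it ignores how the per-feature factors are combined and how well each one fits.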

This project will explore the large space of possible AF-based models for automatic speech recognition, on both audio-only and audio-visual tasks. While a good deal of previous work has investigated individual components of such a recognizer, such as AF classifiers and AF-based pronunciation models, little effort has gone into building complete, fully AF-based recognizers. Our models will be represented as dynamic Bayesian networks (DBNs). DBNs are a natural framework for modeling processes whose state space is inherently factored, and they allow a large variety of models to be investigated using the same general-purpose training and decoding algorithms.


Details about the plans and progress of this project can be found here and here.

 
Team Members:
Name, Role, Affiliation, E-mail
Karen Livescu, Team Leader, MIT, klivescu at csail dot mit dot edu
Mark Hasegawa-Johnson, Senior Researcher, UIUC, jhasegaw at uiuc dot edu
Simon King, Senior Researcher, University of Edinburgh, Simon dot King at ed dot ac dot uk
Ozgur Cetin, Senior Researcher, ICSI, ocetin at ICSI dot Berkeley dot edu
Nash Borges, Senior Researcher, DoD, nashborges at jhu dot edu
Chris Bartels, Graduate Student, University of Washington, bartels at ee dot washington dot edu
Partha Lal, Graduate Student, University of Edinburgh, partha dot lal at gmail dot com
Art Kantor, Graduate Student, UIUC, akantor at uiuc dot edu
Lisa Yung, Graduate Student, Johns Hopkins University, lyung1 at jhu dot edu
Bronwyn Woods, Undergraduate Student, Swarthmore, bwoods1 at swarthmore dot edu
Stephen Dawson-Haggerty, Undergraduate Student, Harvard, sdawson at fas dot harvard dot edu
Ari Bezman, Undergraduate Student, Dartmouth, ari dot bezman at dartmouth dot edu
 

The Center for Language and Speech Processing
The Johns Hopkins University
3400 North Charles Street, Barton Hall
Baltimore, MD 21218
Telephone: (410) 516-4237
Fax: (410) 516-5050
E-mail: clsp@clsp.jhu.edu