Workshop 2006 | Monday, November 23, 2009
Prevailing approaches to automatic speech recognition (hidden Markov models, finite-state transducers) are typically based on the assumption that a word can be represented as a single sequence of phonetic states. However, producing a word involves the simultaneous motion of several articulators, such as the lips and tongue, which may move asynchronously and may not always reach their target positions. Such behavior may be modeled more naturally and parsimoniously with multiple streams of hidden states, each corresponding to an articulatory feature (AF). Recent theories of phonology support this idea, representing words with multiple streams of sub-phonetic features that may be either directly related to the articulators or more abstract (e.g., manner and place of articulation).
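To make the contrast concrete, here is a minimal Python sketch, assuming a word whose AF streams each pass through the same number of target positions and a hypothetical limit `max_async` on how far apart the streams' positions may drift. The function `joint_paths` and the feature values are illustrative only, not the project's model. With `max_async=0` the streams move in lockstep, which is equivalent to a single sequence of phonetic states; relaxing the limit admits additional joint alignments. The sketch covers asynchrony only, not articulators that fall short of their targets.

```python
from itertools import product


def joint_paths(n_positions, n_streams, max_async):
    """All joint index paths through n_streams parallel feature streams.

    Each stream has n_positions target states (indices 0..n_positions-1).
    At every step each stream either stays or advances by one, at least one
    stream advances, and no two streams may be more than max_async positions
    apart (a hypothetical asynchrony constraint).
    """
    last = n_positions - 1
    start, end = (0,) * n_streams, (last,) * n_streams
    paths = []

    def extend(path):
        pos = path[-1]
        if pos == end:
            paths.append(path)
            return
        for step in product((0, 1), repeat=n_streams):
            if not any(step):
                continue  # the joint state must change on every step
            nxt = tuple(p + s for p, s in zip(pos, step))
            if max(nxt) > last or max(nxt) - min(nxt) > max_async:
                continue  # ran past the end, or streams drifted too far apart
            extend(path + [nxt])

    extend([start])
    return paths


# Hypothetical targets for a three-position word (values are illustrative only).
lips = ["closed", "open", "open"]
tongue = ["mid", "high", "low"]

print(len(joint_paths(3, 2, max_async=0)))  # 1: lockstep, like a phone sequence
print(len(joint_paths(3, 2, max_async=1)))  # 11

# Print the feature values along one asynchronous alignment.
for i, j in joint_paths(3, 2, max_async=1)[0]:
    print(lips[i], tongue[j])
```

Each extra path is a different timing of the lips relative to the tongue; a single-stream phonetic model would need a separate pronunciation variant to cover each one.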
In addition, factoring a recognizer's observation model into multiple components, each corresponding to a different AF, may reduce the amount of training data required. Finally, such an approach applies naturally to audio-visual speech recognition, in which the asynchrony between articulators is particularly striking, and to multilingual speech recognition, which may leverage the universality of some AFs across languages.
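One way to see the potential training-data savings is to count distributions. The sketch below assumes hypothetical AF inventory sizes and one possible factorization, in which a frame's score is the product of one factor per AF (a sum in the log domain); the project's actual observation model may differ. The helper names `n_distributions` and `factored_log_score` are made up for illustration.

```python
from math import prod


def n_distributions(cardinalities, factored):
    """Count the observation distributions that must be trained.

    Monolithic model: one distribution per joint AF state (product of cardinalities).
    Factored model: one factor per value of each AF (sum of cardinalities).
    """
    return sum(cardinalities) if factored else prod(cardinalities)


def factored_log_score(per_feature_log_scores, joint_state):
    """Log-score of one frame as a product of per-AF factors (sum of log-factors)."""
    return sum(per_feature_log_scores[f][v] for f, v in joint_state.items())


# Hypothetical AF inventory sizes, e.g. lip opening, tongue body, velum, glottis.
sizes = [4, 6, 2, 3]
print(n_distributions(sizes, factored=False))  # 144
print(n_distributions(sizes, factored=True))   # 15

# Made-up per-feature log-scores for a single frame.
scores = {"lips": {"closed": -1.0, "open": -2.0},
          "tongue": {"high": -0.5, "low": -3.0}}
print(factored_log_score(scores, {"lips": "closed", "tongue": "low"}))  # -4.0
```

Because a factor indexed by a single feature value is shared by every joint state carrying that value, each factor is trained on more frames than a per-joint-state distribution would be.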
This project will explore the large space of possible AF-based models for automatic speech recognition on both audio-only and audio-visual tasks. While a good deal of previous work has investigated individual components of such a recognizer, such as AF classifiers and AF-based pronunciation models, little effort has gone into building complete, fully AF-based recognizers. Our models will be represented as dynamic Bayesian networks (DBNs). DBNs are a natural framework for modeling processes whose state space is inherently factored, and they allow a wide variety of models to be investigated with the same general-purpose training and decoding algorithms.
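The appeal of a single, general inference routine can be illustrated with a minimal Python sketch. The generic forward algorithm below computes the likelihood of an observation sequence for any enumerable hidden state; here it is applied to a hypothetical two-stream model whose initial, transition, and observation distributions are all products of per-stream factors. The parameters `STAY` and `HIT`, the binary features, and the observations are assumptions for illustration. Real DBN toolkits exploit the factorization rather than enumerating joint states as this brute-force version does.

```python
from itertools import product


def forward(states, init, trans, emit, observations):
    """Generic forward algorithm returning P(observations).

    Works for any enumerable hidden state; nothing here depends on how a
    joint state is factored into streams.
    """
    alpha = {s: init(s) * emit(s, observations[0]) for s in states}
    for o in observations[1:]:
        alpha = {t: emit(t, o) * sum(alpha[s] * trans(s, t) for s in states)
                 for t in states}
    return sum(alpha.values())


# Hypothetical per-stream model: each stream is a binary AF (e.g. lips closed/open).
STAY = 0.9  # assumed probability that a stream keeps its value between frames
HIT = 0.8   # assumed probability that a stream's observed bit matches its state


def stream_init(v):
    return 0.5


def stream_trans(v, w):
    return STAY if v == w else 1.0 - STAY


def stream_emit(v, bit):
    return HIT if v == bit else 1.0 - HIT


# Joint two-stream model: every distribution is a product of per-stream factors.
joint_states = list(product((0, 1), repeat=2))


def joint_init(s):
    return stream_init(s[0]) * stream_init(s[1])


def joint_trans(s, t):
    return stream_trans(s[0], t[0]) * stream_trans(s[1], t[1])


def joint_emit(s, o):
    return stream_emit(s[0], o[0]) * stream_emit(s[1], o[1])


obs = [(0, 0), (0, 1), (1, 1), (1, 1)]  # made-up frames, one bit per stream
p_joint = forward(joint_states, joint_init, joint_trans, joint_emit, obs)

# The same routine runs unchanged on each single stream.
p_a = forward([0, 1], stream_init, stream_trans, stream_emit, [o[0] for o in obs])
p_b = forward([0, 1], stream_init, stream_trans, stream_emit, [o[1] for o in obs])
print(p_joint, p_a * p_b)  # equal up to rounding, since the streams are independent
```

Because these two streams are fully independent, the joint likelihood equals the product of the single-stream likelihoods. An AF recognizer would instead couple the streams, for example by constraining their asynchrony, which changes the joint state space and transition function but leaves `forward` untouched.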
Team Members:

| Name | Role | Affiliation | E-mail |
|---|---|---|---|
| Karen Livescu | Team Leader | MIT | klivescu at csail dot mit dot edu |
| Mark Hasegawa-Johnson | Senior Researcher | UIUC | jhasegaw at uiuc dot edu |
| Simon King | Senior Researcher | University of Edinburgh | Simon dot King at ed dot ac dot uk |
| Ozgur Cetin | Senior Researcher | ICSI | ocetin at ICSI dot Berkeley dot edu |
| Nash Borges | Senior Researcher | DoD | nashborges at jhu dot edu |
| Chris Bartels | Graduate Student | University of Washington | bartels at ee dot washington dot edu |
| Partha Lal | Graduate Student | University of Edinburgh | partha dot lal at gmail dot com |
| Art Kantor | Graduate Student | UIUC | akantor at uiuc dot edu |
| Lisa Yung | Graduate Student | Johns Hopkins University | lyung1 at jhu dot edu |
| Bronwyn Woods | Undergraduate Student | Swarthmore | bwoods1 at swarthmore dot edu |
| Stephen Dawson-Haggerty | Undergraduate Student | Harvard | sdawson at fas dot harvard dot edu |
| Ari Bezman | Undergraduate Student | Dartmouth | ari dot bezman at dartmouth dot edu |
The Center for Language and Speech Processing, The Johns Hopkins University, 3400 North Charles Street, Barton Hall, Baltimore, MD 21218
Telephone: (410) 516-4237 | Fax: (410) 516-5050 | E-mail: clsp@clsp.jhu.edu