Mari Ostendorf
Computer Science, Boston University

Title:  "Modeling Intra-Utterance Phone Correlation Using A Hidden Dependence Tree"

**************************************************************************
In speech recognition, independence assumptions are typically made to 
reduce the complexity of the training and recognition search problems.  
One of the more blatantly invalid assumptions is that acoustic 
observations of phonemes are generated independently; i.e., there is no 
notion that an "aa" and an "ae" in the same utterance have something in 
common because they came from the same vocal tract.  Vocal tract 
normalization and unsupervised adaptation compensate for this problem to 
some extent, but existing algorithms do not take full advantage of the 
predictive power that observations from one phone have for another 
phone.  In this talk, we will present a new model that provides a 
practical formalism for representing intra-utterance correlation of 
phones (or other sub-word units) using Markov assumptions on a discrete, 
hidden dependence tree.  The dependence tree models the phone "state" of 
an utterance: a vector of indices, each selecting one of the several 
possible mixture modes of a phone model.  The dependence tree state is 
hidden in the same sense that an HMM mixture mode is hidden; observations 
are continuous-valued cepstral features described by Gaussian 
distributions conditioned on the hidden state.  The talk will describe 
algorithms for constructing dependence tree topologies and estimating 
Gaussian mixture parameters, and will report experimental results on the 
Switchboard corpus using the dependence tree as a separate knowledge 
source for N-best rescoring.  Extensions of the dependence tree model and 
implications for adaptation will be discussed.
**************************************************************************
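
As a rough illustration of the kind of model the abstract describes (not 
the speaker's implementation), the sketch below computes the likelihood of 
a set of phone observations under a discrete hidden dependence tree with 
Gaussian outputs.  Everything in it is an assumption made for the example: 
the function names, the three-phone tree, the two mixture modes per phone, 
the single scalar feature per phone (real systems use cepstral vectors), 
and all parameter values.  The hidden modes are summed out with a 
leaves-to-root recursion, and a brute-force enumeration is included only 
as a sanity check.

```python
import math
from itertools import product


def gauss(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def tree_likelihood(parent, prior, trans, means, variances, obs):
    """Likelihood of obs with the hidden mixture modes summed out.

    parent[i]       -- index of the parent of phone i (None for the root)
    prior[m]        -- P(root mode = m)
    trans[i][mp][m] -- P(mode of phone i = m | parent mode = mp); unused for root
    means[i][m], variances[i][m] -- Gaussian for phone i in mode m
    obs[i]          -- scalar observation for phone i (a simplification)
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    root = None
    for i, p in enumerate(parent):
        if p is None:
            root = i
        else:
            children[p].append(i)

    def upward(i):
        # beta[m] = p(observations in the subtree of i | mode of i = m)
        beta = [gauss(obs[i], means[i][m], variances[i][m])
                for m in range(len(means[i]))]
        for c in children[i]:
            beta_c = upward(c)
            for m in range(len(beta)):
                beta[m] *= sum(trans[c][m][mc] * beta_c[mc]
                               for mc in range(len(beta_c)))
        return beta

    beta_root = upward(root)
    return sum(prior[m] * beta_root[m] for m in range(len(prior)))


def brute_force_likelihood(parent, prior, trans, means, variances, obs):
    """Same quantity, by enumerating every joint mode assignment."""
    total = 0.0
    for modes in product(*(range(len(mu)) for mu in means)):
        p = 1.0
        for i, m in enumerate(modes):
            if parent[i] is None:
                p *= prior[m]
            else:
                p *= trans[i][modes[parent[i]]][m]
            p *= gauss(obs[i], means[i][m], variances[i][m])
        total += p
    return total


# Hypothetical utterance with three phones: "aa" is the root, and "ae" and
# "iy" each depend on it.  All numbers are invented for illustration.
parent = [None, 0, 0]
prior = [0.6, 0.4]
trans = [None,
         [[0.9, 0.1], [0.2, 0.8]],   # "ae" mode given "aa" mode
         [[0.7, 0.3], [0.3, 0.7]]]   # "iy" mode given "aa" mode
means = [[0.0, 2.0], [1.0, 3.0], [-1.0, 1.0]]
variances = [[1.0, 1.0], [0.5, 0.5], [2.0, 2.0]]
obs = [0.3, 1.2, -0.5]

likelihood = tree_likelihood(parent, prior, trans, means, variances, obs)
check = brute_force_likelihood(parent, prior, trans, means, variances, obs)
print(likelihood, check)
```

If each row of a child's transition table is identical, the child's mode 
no longer depends on its parent and the likelihood factors into a product 
of independent per-phone mixture likelihoods, which corresponds to the 
independence assumption the abstract argues against.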
