Modeling Intra-Utterance Phone Correlation Using a Hidden Dependence Tree – Mari Ostendorf (Boston University)

Abstract
In speech recognition, independence assumptions are typically made to reduce the complexity of the training and recognition search problems. One of the more blatantly invalid assumptions is that acoustic observations of phonemes are generated independently; i.e., there is no notion that an “aa” and an “ae” in the same utterance have something in common because they came from the same vocal tract. Vocal tract normalization and unsupervised adaptation compensate for this problem to some extent, but existing algorithms do not take full advantage of the predictive power that observations from one phone have for another phone. In this talk, we will present a new model that provides a practical formalism for representing intra-utterance correlation of phones (or other sub-word units) using Markov assumptions on a discrete, hidden dependence tree. The dependence tree models the phone “state” of an utterance, which is a vector of indices, each mapping to one of several possible mixture modes of a phone model. The dependence tree state is hidden in the same sense that an HMM mixture mode is hidden; observations are continuous-valued cepstral features described by Gaussian distributions conditioned on the hidden state. The talk will describe algorithms for constructing dependence tree topologies and for estimating Gaussian mixture parameters, with experimental results on the Switchboard corpus using the dependence tree as a separate knowledge source for N-best rescoring. Extensions of the dependence tree model and implications for adaptation will be discussed.
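To make the idea concrete, the sketch below is a minimal, hypothetical Python illustration of the kind of computation the abstract describes, not the actual algorithm presented in the talk. It scores cepstral observations for the phones of one utterance under a tree of discrete hidden mode indices, where each node's mode depends only on its parent's mode (the Markov assumption on the tree) and each observation is modeled by a Gaussian conditioned on its node's mode; the hidden modes are marginalized by an upward sum-product pass over the tree. All class names, shapes, and parameter values here are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical sketch: marginal likelihood of an utterance's cepstral
# observations under a hidden dependence tree.  Each node corresponds to one
# phone instance; its hidden "mode" index selects one Gaussian of that
# phone's mixture and depends on the parent's mode via a transition matrix.

class TreeNode:
    def __init__(self, node_id, means, covs, parent=None, trans=None):
        self.node_id = node_id   # phone instance identifier
        self.means = means       # (M, D) Gaussian means, one per mode
        self.covs = covs         # (M, D, D) Gaussian covariances
        self.parent = parent     # parent TreeNode, or None at the root
        self.trans = trans       # (M_parent, M): P(mode | parent mode)
        self.children = []
        if parent is not None:
            parent.children.append(self)

def upward_message(node, obs):
    """Sum-product upward pass.  Returns the likelihood of all observations
    in this node's subtree as a vector over the parent's modes (or over this
    node's own modes at the root)."""
    M = node.means.shape[0]
    # Emission likelihood of this node's observation under each mode.
    emit = np.array([multivariate_normal.pdf(obs[node.node_id],
                                             mean=node.means[m],
                                             cov=node.covs[m])
                     for m in range(M)])
    # Fold in the messages from the children (each over this node's modes).
    for child in node.children:
        emit *= upward_message(child, obs)
    if node.parent is None:
        return emit              # root: vector over the root's modes
    return node.trans @ emit     # marginalize out this node's mode

def tree_likelihood(root, obs, root_prior):
    """Marginal likelihood of the observations, summing over hidden modes."""
    return float(root_prior @ upward_message(root, obs))

# Toy usage with made-up numbers: two phone instances, 2 modes each,
# 3-dimensional cepstral features.
D, M = 3, 2
rng = np.random.default_rng(0)
root = TreeNode("phone_0", rng.normal(size=(M, D)),
                np.stack([np.eye(D)] * M))
child = TreeNode("phone_1", rng.normal(size=(M, D)),
                 np.stack([np.eye(D)] * M),
                 parent=root, trans=np.array([[0.8, 0.2], [0.3, 0.7]]))
obs = {"phone_0": rng.normal(size=D), "phone_1": rng.normal(size=D)}
print(tree_likelihood(root, obs, root_prior=np.array([0.5, 0.5])))
```

Used as a separate knowledge source, a score of this form could be combined with standard HMM acoustic and language model scores when rescoring an N-best list, which is the setting the abstract mentions for the Switchboard experiments.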

Center for Language and Speech Processing