Mari Ostendorf
Computer Science, Boston University
Title: "Modeling Intra-Utterance Phone Correlation Using A Hidden Dependence Tree"
**************************************************************************
In speech recognition, independence assumptions are typically made to
reduce the complexity of the training and recognition search problems.
One of the more blatantly invalid assumptions is that acoustic
observations of phonemes are generated independently; i.e., there is no
notion that an "aa" and an "ae" in the same utterance have something in
common because they came from the same vocal tract. Vocal tract
normalization and unsupervised adaptation compensate for this problem to
some extent, but existing algorithms do not take full advantage of the
predictive power that observations from one phone have for another
phone. In this talk, we will present a new model that provides a
practical formalism for representing intra-utterance correlation of
phones (or other sub-word units) using Markov assumptions on a discrete,
hidden dependence tree. The dependence tree models the phone "state" of
an utterance, which is a vector of indices, each selecting one of the
possible mixture modes of a phone model. The dependence tree state is
hidden in the same sense that an HMM mixture mode is hidden; observations
are continuous-valued cepstral features described by Gaussian
distributions conditioned on the hidden state. The talk will describe
algorithms for constructing dependence tree topologies and for estimating
Gaussian mixture parameters, along with experimental results on the
Switchboard corpus using the dependence tree as a separate knowledge
source for N-best rescoring. Extensions of the dependence tree model and
implications for adaptation will be discussed.
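
As a rough illustration of the kind of model described above (not the
speaker's actual algorithm), the sketch below scores an utterance under a
discrete hidden tree with Gaussian observations. It assumes one feature
vector per phone, diagonal covariances, a tree topology that is already
given, and nodes numbered so that every parent precedes its children. All
names (tree_log_likelihood, trans, root_prior, and so on) are hypothetical.

```python
import math


def safe_log(p):
    """Log of a probability, mapping 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def logsumexp(vals):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(vals)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in vals))


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def tree_log_likelihood(parent, root_prior, trans, means, variances, obs):
    """Log-likelihood of the observations, summing over all hidden modes.

    parent[i]       index of node i's parent (-1 for the root, node 0);
                    assumed to satisfy parent[i] < i for i > 0
    root_prior[k]   P(root mode = k)
    trans[i][j][k]  P(mode of node i = k | mode of its parent = j)
    means[i][k], variances[i][k]  Gaussian parameters of node i, mode k
    obs[i]          feature vector observed for node i
    """
    n, num_modes = len(parent), len(root_prior)
    # beta[i][k] = log p(observations in the subtree of i | mode of i = k)
    beta = [
        [log_gauss(obs[i], means[i][k], variances[i][k])
         for k in range(num_modes)]
        for i in range(n)
    ]
    # Upward pass: children have larger indices, so visiting nodes in
    # reverse order guarantees each subtree is complete before it is used.
    for i in range(n - 1, 0, -1):
        p = parent[i]
        for j in range(num_modes):
            beta[p][j] += logsumexp(
                [safe_log(trans[i][j][k]) + beta[i][k]
                 for k in range(num_modes)]
            )
    return logsumexp(
        [safe_log(root_prior[k]) + beta[0][k] for k in range(num_modes)]
    )
```

Because the hidden modes form a tree, the sum over all joint mode
assignments factors into one local sum per edge, so the cost grows
linearly with the number of phones rather than exponentially.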
**************************************************************************
Seminar Schedule