Discriminative Estimation of Mixtures of Exponential Distributions – Vaibhava Goel (IBM)
Abstract
Gunawardana and Byrne recently proposed an auxiliary function based approach for estimating exponential model parameters under a maximum conditional likelihood (MCL) objective. Although for Gaussian mixture models it reproduces parameter updates that were already known, the method is valuable because it applies to arbitrarily constrained exponential models, and because the resulting auxiliary function is similar to the EM auxiliary function, which eliminates the need for two separate optimization procedures. It also extends readily to utility functions similar to MCL, such as sum-of-posteriors and maximum mutual information (MMI). One shortcoming of this approach, however, is that the validity of the auxiliary function was not rigorously established.

In this talk I will present our work on discriminative estimation using the auxiliary function approach. I'll first discuss our recent proof of the validity of the auxiliary function, and then present an application of the approach to discriminative estimation of subspace constrained Gaussian mixture models (SCGMMs), in which the exponential model weights of all Gaussians are required to lie in a common subspace. SCGMMs have been shown to generalize, and to yield significant error rate reductions over, previously considered model classes such as diagonal models, models with semi-tied covariances, and extended maximum likelihood linear transformation (EMLLT) models. We find that MMI estimation of SCGMMs (tried so far on a digit task) gives more than a 20% relative reduction in word error rate over maximum likelihood estimation. Time permitting, I'll also discuss MCL estimation of language models that combine N-grams and stochastic finite state grammars.

This work was done in collaboration with Scott Axelrod, Ramesh Gopinath, Peder Olsen, and Karthik Visweswariah.
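As a minimal sketch of the objective in question (the notation below is illustrative and not taken from the talk): for an exponential model with parameter vector \lambda and feature function f, so that p_\lambda(x, y) \propto \exp(\lambda^\top f(x, y)), the MCL objective over training pairs (x_t, y_t) is

\[
% illustrative notation, assumed for this sketch
\mathcal{F}_{\mathrm{MCL}}(\lambda) \;=\; \sum_t \log p_\lambda(y_t \mid x_t)
\;=\; \sum_t \log \frac{\exp\bigl(\lambda^\top f(x_t, y_t)\bigr)}{\sum_{y'} \exp\bigl(\lambda^\top f(x_t, y')\bigr)}.
\]

The auxiliary function approach mirrors EM: at the current estimate \lambda', one constructs a function Q(\lambda; \lambda') with Q(\lambda'; \lambda') = \mathcal{F}_{\mathrm{MCL}}(\lambda') and Q(\lambda; \lambda') \le \mathcal{F}_{\mathrm{MCL}}(\lambda) for all \lambda, so that any \lambda increasing Q is guaranteed to increase the objective. Proving that the proposed auxiliary function actually has this lower-bound property is the validity question the talk addresses.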
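The subspace constraint defining SCGMMs can also be stated compactly; the symbols here are again illustrative assumptions. Writing each Gaussian component i in exponential-family form with natural parameter vector \psi_i (collecting the entries of the precision matrix \Sigma_i^{-1} and the linear term \Sigma_i^{-1}\mu_i), the constraint is

\[
% M and k are assumed names for the shared basis and subspace dimension
\psi_i \;=\; M \lambda_i, \qquad \lambda_i \in \mathbb{R}^k,
\]

where the basis M \in \mathbb{R}^{D \times k} is shared by all Gaussians and k is typically much smaller than the dimension D of the unconstrained parameter space. Diagonal models, semi-tied covariances, and EMLLT models correspond to particular choices of this subspace, which is the sense in which SCGMMs generalize them.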