Robust HMM Estimation with Gaussian Merging-Splitting and Transform-based Adaptation

Ananth Sankar, SRI International

April 28, 1998


Abstract

We present a detailed experimental study of Gaussian splitting and merging algorithms for training the parameters of state-clustered hidden Markov model (HMM) automatic speech recognition (ASR) systems. Gaussian splitting distributes the training data uniformly across the model parameters and gives very different estimates from SRI's previous training algorithm, but it does not significantly alter recognition performance. Gaussian merging gives robust parameter estimates, which are found to be critical for both speaker-independent and speaker-adaptive recognition. A combination of these techniques, the Gaussian Merging-Splitting (GMS) algorithm, is then used to explore a variety of HMM structures. For a fixed number of Gaussian parameters, we find that decreasing the number of state clusters while increasing the number of Gaussians per cluster gives better performance with the GMS algorithm.
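As a rough illustration of the two operations named above, the following sketch merges two weighted diagonal-covariance Gaussians by moment matching and splits one Gaussian by perturbing its mean. The abstract does not state the merging or splitting criteria actually used, so the function names, the moment-matching rule, and the perturbation factor `eps` are all assumptions chosen for illustration.

```python
import numpy as np

def merge_gaussians(w1, mu1, var1, w2, mu2, var2):
    """Moment-matched merge of two weighted diagonal Gaussians (assumed rule)."""
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    # Second moment of the two-component mixture minus the squared merged mean.
    var = (w1 * (var1 + mu1**2) + w2 * (var2 + mu2**2)) / w - mu**2
    return w, mu, var

def split_gaussian(w, mu, var, eps=0.2):
    """Split one Gaussian in two by shifting the mean by +/- eps standard deviations.

    eps is a hypothetical perturbation factor, not a value from the paper.
    """
    shift = eps * np.sqrt(var)
    return (w / 2, mu + shift, var.copy()), (w / 2, mu - shift, var.copy())
```

Merging preserves the total weight, mean, and variance of the pair it replaces, which is one way to obtain a component backed by more training data; splitting halves the weight so that subsequent re-estimation can move the two copies apart.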

To robustly estimate systems with a large number of state clusters, however, we propose a model in which the HMM with the larger number of state clusters is a transformed version of an HMM with a smaller number of clusters. A set of transforms is used for each state cluster in the larger system, and the transforms are trained using maximum-likelihood estimation. Experimental results show that this method outperforms the GMS algorithm.
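To make the idea of a transformed cluster concrete, here is a minimal sketch under a strong simplifying assumption: each cluster in the larger system reuses the Gaussians of a cluster in the smaller system, with the means shifted by a single offset vector. The abstract does not specify the form of the transforms, so the bias-only transform, the function names, and the array layout are hypothetical. For diagonal covariances and fixed Gaussian posteriors, the maximum-likelihood offset has a closed form, which is what `estimate_bias` computes.

```python
import numpy as np

def estimate_bias(x, gamma, mu, var):
    """ML mean offset for one child cluster (hypothetical bias-only transform).

    x:     (T, D) frames assigned to the child cluster
    gamma: (T, G) posterior of each parent Gaussian for each frame
    mu:    (G, D) parent-cluster Gaussian means
    var:   (G, D) parent-cluster diagonal variances
    """
    # residual[t, g, d] = x[t, d] - mu[g, d]
    residual = x[:, None, :] - mu[None, :, :]
    # Each residual is weighted by its posterior and inverse variance.
    weight = gamma[:, :, None] / var[None, :, :]
    return (weight * residual).sum(axis=(0, 1)) / weight.sum(axis=(0, 1))

def transform_means(mu, b):
    """Apply the offset to every parent Gaussian mean to form the child cluster."""
    return mu + b
```

Because only one offset vector per child cluster is estimated in this sketch, far fewer parameters depend on the child's data than if every Gaussian were re-estimated independently.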