Abstract:
Part I: Multiple Linear Transforms for Classification (joint work with Nagendra Goel, LSI Logic. This idea was presented at ICASSP 2001)
State-of-the-art speech recognition systems use Gaussian Mixture Models (GMMs) for HMM states. When the number
of Gaussians is large, computational, storage and data-sparsity considerations constrain us to use diagonal
covariance Gaussians. It is well known that a Maximum Likelihood Linear Transformation (MLLT) of the data (which
approximately diagonalizes all the covariances) gives significant improvements in classification accuracy. In this
part of the talk I will introduce a generalization of the MLLT called MLT (multiple linear transforms) that allows us
to get close to the performance of a full-covariance GMM while maintaining the storage and computational efficiency
of a diagonal covariance GMM. Experimental results on a car database (small-vocabulary grammar-based task)
show that significant improvements in performance are possible over an MLLT baseline system.
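The role of the diagonalizing transform can be illustrated with a minimal sketch. It is not the MLT algorithm from the talk: it scores one 2-D point with a single zero-mean Gaussian, and the matrix A, the covariance values, and all function names are assumptions chosen for illustration. When A rotates the data onto the eigenvectors of the covariance, a diagonal-covariance Gaussian in the transformed space gives the same log-likelihood as the full-covariance Gaussian in the original space.

```python
import math

def diag_gauss_loglik(y, mean, var):
    """Log-density of y under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v)
        for yi, m, v in zip(y, mean, var)
    )

def transformed_loglik(x, A, mean, var):
    """Score x with a diagonal Gaussian in the space y = A x (2x2 A only).

    The log|det A| term keeps the result a proper log-density in x.
    """
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    y = [A[0][0] * x[0] + A[0][1] * x[1],
         A[1][0] * x[0] + A[1][1] * x[1]]
    return math.log(abs(det)) + diag_gauss_loglik(y, mean, var)

def full_gauss_loglik_2d(x, cov):
    """Zero-mean 2-D full-covariance Gaussian log-density, for comparison."""
    a, b, c, d = cov[0][0], cov[0][1], cov[1][0], cov[1][1]
    det = a * d - b * c
    quad = (d * x[0] ** 2 - (b + c) * x[0] * x[1] + a * x[1] ** 2) / det
    return -0.5 * (2.0 * math.log(2.0 * math.pi) + math.log(det) + quad)

# Illustrative values (assumed): covariance with eigenvalues 3 and 1,
# and A built from its orthonormal eigenvectors.
s = 1.0 / math.sqrt(2.0)
A = [[s, s], [s, -s]]
cov = [[2.0, 1.0], [1.0, 2.0]]
x = [1.0, 0.5]

full = full_gauss_loglik_2d(x, cov)
diag_after = transformed_loglik(x, A, [0.0, 0.0], [3.0, 1.0])
diag_before = diag_gauss_loglik(x, [0.0, 0.0], [2.0, 2.0])
print(full, diag_after, diag_before)
```

With many Gaussians a single transform can only diagonalize all covariances approximately, which is the gap that using more than one transform is meant to narrow.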
Part II: Enhancing GMM Scores Using SVM Hints (joint work with Shai Fine and Jiri Navratil. This idea will be presented at Eurospeech 2001)
On binary classification problems, GMM (generative) and SVM (discriminative) classifiers with roughly the same level of performance
can sometimes produce uncorrelated errors. This fact is exploited to enhance a baseline multi-class GMM classifier with hints from
a binary SVM classifier. The SVM classifier is invoked only on speech frames where the GMM is uncertain, and the SVM
decision is used to "nudge" the GMM scores for pairs of confusable classes. The utility of this technique is demonstrated
on text-independent speaker identification and verification tasks. Significant improvements in accuracy over the GMM baseline
are possible without much computational overhead.
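One way to picture the "nudge" is the sketch below. It is an assumption-laden illustration, not the method from the paper: the uncertainty test (gap between the two best GMM scores), the additive adjustment, the parameter values, and the names nudge_scores and svm_decide are all hypothetical.

```python
def nudge_scores(gmm_scores, svm_decide, margin=0.5, nudge=0.3):
    """Return a copy of gmm_scores, adjusted when the top two classes are close.

    gmm_scores: dict mapping class label -> GMM log-score for one frame.
    svm_decide: hypothetical callable (label_a, label_b) -> preferred label.
    margin, nudge: illustrative values, not taken from the talk.
    """
    scores = dict(gmm_scores)
    if len(scores) < 2:
        return scores
    best, second = sorted(scores, key=scores.get, reverse=True)[:2]
    if scores[best] - scores[second] >= margin:
        return scores  # GMM is confident; the SVM is not invoked
    preferred = svm_decide(best, second)
    other = second if preferred == best else best
    scores[preferred] += nudge  # move the pair apart in the SVM's direction
    scores[other] -= nudge
    return scores

# Usage with a stand-in SVM that always prefers "spk2" (assumed labels).
frame = {"spk1": -10.0, "spk2": -10.2, "spk3": -15.0}
adjusted = nudge_scores(frame, lambda a, b: "spk2")
print(max(adjusted, key=adjusted.get))
```

Because the SVM is consulted only when the top two scores are within the margin, the extra cost is limited to the uncertain frames.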
Papers describing both talks (and more) can be obtained from http://www.research.ibm.com/people/r/rameshg/.