Workshop 2001
Dr. Ramesh Gopinath Wednesday, August 20, 2008


Seminar Information
Multiple Linear Transforms for Classification: Dr. Ramesh Gopinath - 07/18/2001
  • Download this Presentation in PDF (176 KB) or Postscript (341 KB)

  • Abstract:

    Part I: Multiple Linear Transforms for Classification (joint work with Nagendra Goel, LSI Logic. This idea was presented at ICASSP 2001)

    State-of-the-art speech recognition systems use Gaussian Mixture Models (GMMs) for HMM states. When the number of Gaussians is large, computational, storage and data-sparsity considerations constrain us to use diagonal-covariance Gaussians. It is well known that a Maximum Likelihood Linear Transformation (MLLT) of the data (which approximately diagonalizes all the covariances) gives significant improvements in classification accuracy. In this part of the talk I will introduce a generalization of the MLLT called MLT (multiple linear transforms) that allows us to get close to the performance of a full-covariance GMM while maintaining the storage and computational efficiency of a diagonal-covariance GMM. Experimental results on a car database (small-vocabulary grammar-based task) show that significant improvements in performance are possible over an MLLT baseline system.
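
    As a rough illustration of why a linear transform helps a diagonal-covariance model, the sketch below scores one point under a single full-covariance Gaussian, under its diagonal approximation, and under a diagonal Gaussian in a transformed space y = Ax (with the log|det A| Jacobian term). It is a toy under stated assumptions, not the MLT method from the talk: the transform is simply the eigenvector basis of one covariance matrix, and all function names and numbers are made up for the example. In an MLLT system one transform is shared by many Gaussians, so the diagonalization is only approximate.

```python
import numpy as np

def diag_gauss_loglik(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at x."""
    d = x - mean
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + d * d / var)

def full_gauss_loglik(x, mean, cov):
    """Log-density of a full-covariance Gaussian at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)

def transformed_diag_loglik(x, mean, cov, A):
    """Diagonal-Gaussian log-density in the space y = A x.

    Only the diagonal of A cov A^T is kept; log|det A| is added so the
    result is still a density over the original x.
    """
    var = np.diag(A @ cov @ A.T)
    _, logdet_A = np.linalg.slogdet(A)
    return logdet_A + diag_gauss_loglik(A @ x, A @ mean, var)

# Illustrative (made-up) 2-D Gaussian with correlated dimensions.
mean = np.array([1.0, -2.0])
cov = np.array([[2.0, 1.2], [1.2, 1.0]])
x = np.array([0.5, -1.0])

# Toy transform (assumption): rows are the eigenvectors of cov, which
# diagonalize this one covariance exactly.
_, eigvecs = np.linalg.eigh(cov)
A = eigvecs.T

ll_full = full_gauss_loglik(x, mean, cov)
ll_diag = transformed_diag_loglik(x, mean, cov, np.eye(2))  # no transform
ll_mllt = transformed_diag_loglik(x, mean, cov, A)          # with transform

print(ll_full, ll_diag, ll_mllt)
```

    With a single Gaussian the transformed diagonal score matches the full-covariance score up to rounding, while the untransformed diagonal score does not. The transformed model still stores only a mean and a variance vector per Gaussian, plus the shared matrix A.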

    Part II: Enhancing GMM Scores Using SVM Hints (joint work with Shai Fine and Jiri Navratil. This idea will be presented at Eurospeech 2001)

    On binary classification problems, GMM (generative) and SVM (discriminative) classifiers with roughly the same level of performance can sometimes produce uncorrelated errors. This fact is exploited to enhance a baseline multi-class GMM classifier with hints from a binary SVM classifier. The SVM classifier is invoked only on speech frames where the GMM is uncertain, and the SVM decision is used to "nudge" the GMM scores for pairs of confusable classes. The utility of this technique is demonstrated on text-independent speaker identification and verification tasks. Significant improvements in accuracy over the GMM baseline are possible without much computational overhead.
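
    The control flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how uncertainty is measured or how large the nudge is, so the margin test on the top two GMM scores, the `margin` and `nudge` values, and the `pair_hint` callable (a stand-in for a trained binary SVM) are all assumptions.

```python
import numpy as np

def nudge_scores(frame, gmm_scores, pair_hint, margin=1.0, nudge=0.5):
    """Adjust per-class GMM scores for one frame using a binary hint.

    frame:      feature vector for the frame (passed through to pair_hint).
    gmm_scores: per-class log-likelihoods from the GMM classifier.
    pair_hint:  hypothetical callable (frame, i, j) -> float; a positive
                value favors class i, otherwise class j is favored.
    margin:     assumed uncertainty test; the hint is consulted only when
                the top two scores are closer than this.
    nudge:      assumed amount moved between the two confusable scores.
    """
    scores = np.array(gmm_scores, dtype=float)
    order = np.argsort(scores)
    best, second = order[-1], order[-2]
    if scores[best] - scores[second] >= margin:
        return scores  # GMM is confident: the binary classifier is not called
    if pair_hint(frame, best, second) > 0:
        scores[best] += nudge
        scores[second] -= nudge
    else:
        scores[best] -= nudge
        scores[second] += nudge
    return scores

# Stub hint for the demo: counts its calls and always favors the second class.
calls = []
def stub_hint(frame, i, j):
    calls.append((int(i), int(j)))
    return -1.0

frame = np.zeros(3)  # placeholder features; the stub ignores them
confident = nudge_scores(frame, [-10.0, -13.0, -15.0], stub_hint)
uncertain = nudge_scores(frame, [-10.0, -10.2, -15.0], stub_hint)

print(int(np.argmax(confident)), int(np.argmax(uncertain)), len(calls))
```

    In the demo the hint is consulted once, on the frame where the top two scores are close, and there it flips the decision from class 0 to class 1. Because the binary classifier runs only on such frames, the extra cost stays small when most frames are scored confidently.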

    Papers describing both talks (and more) can be obtained from http://www.research.ibm.com/people/r/rameshg/.

     

  • Biography:

    Dr. Gopinath has a Ph.D. from Rice University and has been with the Speech Group at IBM T. J. Watson Research Center since March 1994. His primary interests are statistical learning, speech recognition and signal processing. He currently manages the research effort in acoustic and language modeling that supports the telephony and embedded speech recognition product offerings from IBM. Prior to this assignment he led the IBM Broadcast News Transcription team that participated in the NIST/DARPA BN evaluations from 1996 to 1999.




The Center for Language and Speech Processing
The Johns Hopkins University
3400 North Charles Street, Barton Hall
Baltimore, MD 21218
Telephone: (410) 516-4237
Fax: (410) 516-5050
E-mail: clsp@clsp.jhu.edu