Data-driven discriminative features for HMM-based ASR – Hynek Hermansky (Oregon Graduate Institute)

March 12, 2002 all-day

The talk describes our work towards data-driven features that could be used with current HMM-based systems and that would represent transformed posterior probabilities of sub-word classes. To address steady or slowly varying artifacts, the probabilities are derived from relatively long spans of the signal (up to 1 s). This may also reduce some dependencies on the phonetic context. To address the excessive sensitivity of ASR to changes in short-term spectral profiles, we estimate the probabilities in two steps. The first step yields frequency-localized class probability estimates. These estimates serve as inputs to a second probability estimator, which yields the final class probabilities. The final probabilities are then transformed to yield features for the subsequent HMM classifier. The whole feature module is trained on labeled speech data.
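The two-step estimation described above can be illustrated with a minimal sketch. It is not the method presented in the talk: the number of frequency bands (15), the 10 ms frame step with a 101-frame (about 1 s) window, the single-layer softmax estimators with random, untrained weights, and the log transform applied to the final posteriors are all hypothetical choices made only to show the data flow. In practice each estimator would be trained on labeled speech data.

```python
import math
import random

# Hypothetical sizes: 15 frequency bands, 10 ms frame step, and a
# ~1 s context window (50 frames on either side of the current frame).
N_BANDS = 15
CONTEXT = 50
N_CLASSES = 4  # toy number of sub-word classes


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class LinearSoftmax:
    """Hypothetical single-layer class-probability estimator (random, untrained weights)."""

    def __init__(self, n_in, n_out, rng):
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x):
        scores = [sum(wi * xi for wi, xi in zip(row, x)) + bi
                  for row, bi in zip(self.w, self.b)]
        return softmax(scores)


def band_trajectory(spec, band, t, context):
    """Temporal trajectory of one band around frame t; edges repeat the first/last frame."""
    n = len(spec)
    return [spec[min(max(t + k, 0), n - 1)][band] for k in range(-context, context + 1)]


def features(spec, band_nets, merger):
    """Two-step posterior estimation followed by a transform, for every frame."""
    out = []
    for t in range(len(spec)):
        # Step 1: frequency-localized class posteriors, one estimator per band.
        local = []
        for band, net in enumerate(band_nets):
            local.extend(net(band_trajectory(spec, band, t, CONTEXT)))
        # Step 2: merge the band-wise estimates into final class posteriors.
        post = merger(local)
        # Transform for the HMM: log posteriors (an assumption; not specified in the talk).
        out.append([math.log(p) for p in post])
    return out


rng = random.Random(0)
band_nets = [LinearSoftmax(2 * CONTEXT + 1, N_CLASSES, rng) for _ in range(N_BANDS)]
merger = LinearSoftmax(N_BANDS * N_CLASSES, N_CLASSES, rng)

# Toy input: 100 frames (1 s) of random log band energies.
spec = [[rng.gauss(0.0, 1.0) for _ in range(N_BANDS)] for _ in range(100)]
feats = features(spec, band_nets, merger)
print(len(feats), len(feats[0]))  # 100 4
```

Repeating the first and last frames is just one simple way to give boundary frames a full-length window; the per-band estimators see only their own band's trajectory, which is what makes the first-step estimates frequency-localized.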

Hynek Hermansky is a Professor of Electrical and Computer Engineering and Director of the Center for Information Technology at the OGI School of Science and Engineering of Oregon Health & Science University in Portland, Oregon, and a Senior Research Scientist at the International Computer Science Institute in Berkeley, California. He has worked in speech processing for over 25 years, previously as a research fellow at the University of Tokyo, a Research Engineer at Panasonic Technologies in Santa Barbara, California, and a Senior Member of Research Staff at U S WEST Advanced Technologies. He is a Fellow of the IEEE, a Member of the Board of the International Speech Communication Association, an Editor of IEEE Transactions on Speech and Audio Processing, and a Member of the Editorial Board of Speech Communication. He holds a Dr.Eng. degree from the University of Tokyo. His main research interests are in acoustic processing for speech and speaker recognition.

Johns Hopkins University

Johns Hopkins University, Whiting School of Engineering

Center for Language and Speech Processing
Hackerman 226
3400 North Charles Street, Baltimore, MD 21218-2680
