Multi-Rate and Variable-Rate Acoustic Modeling of Speech at Phone and Syllable Time Scales – Ozgur Cetin (ICSI/Berkeley)
In this talk, we will describe a multi-rate extension of hidden Markov models (HMMs), multi-rate coupled HMMs, and present their applications to acoustic modeling for speech recognition. Multi-rate HMMs are parsimonious models for stochastic processes that evolve at multiple time scales, using scale-based observation and state spaces. For speech recognition, we use multi-rate HMMs for joint acoustic modeling of speech at multiple time scales, complementing the usual short-term, phone-based representations of speech with wide modeling units and long-term temporal features. We consider two alternatives for the coarse scale in our multi-rate models, representing either phones, or syllable structure and lexical stress. We will also describe a variable-rate sampling extension to the basic multi-rate model, which tailors the analysis towards temporally fast-changing regions and significantly improves over fixed-rate sampling. Experiments on conversational telephone speech will be presented, showing that the proposed multi-rate approaches significantly improve recognition accuracy over HMM-based and other coupled HMM-based approaches (e.g., feature concatenation and multi-stream coupled HMMs) for combining short- and long-term acoustic and linguistic information. This is joint work with Mari Ostendorf of the University of Washington.
Ozgur Cetin is a post-doctoral researcher at the International Computer Science Institute, Berkeley. He received PhD and MS degrees from the University of Washington, Seattle, in 2005 and 2000, respectively, both in electrical engineering, and a BS degree in electrical and electronics engineering from Bilkent University, Turkey, in 1998. His research interests include machine learning and speech and language processing.