Manifold Constrained Deep Neural Networks for ASR – Richard Rose (McGill University)
Abstract
This presentation investigates the application of manifold learning approaches to acoustic modeling in automatic speech recognition (ASR). Acoustic models in ASR are defined over high dimensional feature vectors, which can be represented by a graph whose nodes correspond to the feature vectors and whose edge weights describe the local relationships between them. This representation underlies manifold learning approaches, which assume that high dimensional feature representations lie on a low dimensional embedded manifold. A manifold based regularization framework is presented for deep neural network (DNN) training of tandem bottleneck feature extraction networks for ASR. It is argued that this framework has the effect of preserving, within the hidden layers of the DNN, the underlying low dimensional manifold based relationships that exist among speech feature vectors. This is achieved by imposing manifold based locality preserving constraints on the outputs of the network. The ASR word error rates obtained using these networks are evaluated for speech in noise tasks and compared to those obtained using DNN bottleneck networks trained without manifold constraints.
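To make the idea of a locality preserving constraint concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the formulation used in the talk: the abstract gives no equations, so the heat-kernel k-nearest-neighbour graph, the penalty sum_ij w_ij ||y_i - y_j||^2, and the names knn_affinity, manifold_penalty, regularized_loss, k, sigma, and gamma are all hypothetical choices borrowed from generic manifold regularization.

```python
import numpy as np

def knn_affinity(X, k=2, sigma=1.0):
    """Graph weights w_ij between each input vector and its k nearest neighbours.

    Assumption: Gaussian heat-kernel weights on a k-NN graph, a common
    choice in manifold learning; the talk may use a different graph.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between feature vectors (rows of X).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a vector is not its own neighbour
    W = np.zeros_like(d2)
    nn = np.argsort(d2, axis=1)[:, :k]  # k nearest neighbours per row
    rows = np.repeat(np.arange(X.shape[0]), k)
    cols = nn.ravel()
    W[rows, cols] = np.exp(-d2[rows, cols] / sigma)
    return W

def manifold_penalty(Y, W):
    """Average of w_ij * ||y_i - y_j||^2 over the network outputs Y.

    The penalty is small when inputs that are neighbours on the graph
    are also mapped to nearby outputs.
    """
    Y = np.asarray(Y, dtype=float)
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return float((W * d2).sum() / Y.shape[0])

def regularized_loss(task_loss, Y, W, gamma=0.1):
    """Task loss (e.g. cross-entropy) plus the weighted manifold penalty."""
    return task_loss + gamma * manifold_penalty(Y, W)
```

In a training loop under these assumptions, W would be built from the input feature vectors, Y would be the network outputs for the same frames, and gamma would trade off the task loss against the locality preserving term.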
All Participant Lectures will be held in Room S1, 4th Floor.
Biography
Richard Rose received B.S. and M.S. degrees from the Electrical and Computer Engineering Department at the University of Illinois, and obtained a Ph.D. E.E. degree from what is now known as the Center for Signal and Image Processing (CSIP) at the Georgia Institute of Technology in 1988, with a thesis in speech coding and speech analysis. From 1980 to 1984, he was with Bell Laboratories, now a division of Lucent Technologies, working on signal processing and digital switching systems. From 1988 to 1992, he was a member of the Speech Systems and Technology group, now called the Information Systems Technology Group, at MIT Lincoln Laboratory, working on speech recognition and speaker recognition. He was with AT&T from 1992 to 2003, and after 1996 specifically in the Speech and Image Processing Services Laboratory at AT&T Labs – Research in Florham Park, NJ. Currently, he is an associate professor of Electrical and Computer Engineering at McGill University in Montreal, Quebec.
An IEEE Fellow, he served as a member of the IEEE Signal Processing Society Technical Committee on Digital Signal Processing from 1990 to 1995, and was on the organizing committee of the 1990 and 1992 DSP workshops. He has also served as an adjunct faculty member of the Georgia Institute of Technology, was elected as an at large member of the Board of Governors for the Signal Processing Society during the period from 1995 to 1997, and served as membership coordinator during that time. Prof. Rose also spent the spring of 1996 at the Furui Lab at NTT in Tokyo (Sadaoki Furui now has a laboratory at the Tokyo Institute of Technology). He served as an associate editor for the IEEE Transactions on Speech and Audio Processing from 1997 to 1999, served as a member of the IEEE SPS Speech Technical Committee (STC), and was the founding editor of the STC Newsletter from 2002 through 2005. He has also served as an associate editor of the IEEE Transactions on Audio, Speech, and Language Processing.