The Latest in DNN Research at IBM: DNN-based features, Low-Rank Matrices for Hybrid DNNs, and Convolutional Neural Networks – Tara Sainath (IBM Research)

April 19, 2013 all-day

Deep Neural Networks (DNNs) have become the state of the art for acoustic modeling, showing relative gains of 10-30% over Gaussian Mixture Model/Hidden Markov Model (GMM/HMM) systems. In this talk, I discuss how to improve the performance of these networks further. First, I present work on using these networks to extract NN-based features. I show that NN-based features offer a 10-15% relative improvement on various LVCSR tasks compared to cross-entropy-trained hybrid DNNs. Furthermore, NN-based features match the performance of sequence-trained hybrid DNNs while being 2x faster to train. I will also show that if a hybrid DNN is preferred, low-rank matrix factorization allows for a 50% reduction in parameters and a 2x speedup in training time. Second, I present work on Convolutional Neural Networks (CNNs), an alternative type of neural network that can reduce spectral variation and model the spectral correlations that exist in signals. Since speech signals exhibit both of these properties, CNNs are a more effective model for speech than DNNs. On a variety of LVCSR tasks, we find that CNN-based features offer an additional 4-12% relative improvement over DNN-based features.
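To make the low-rank idea concrete, here is a minimal sketch of factoring a network's final hidden-to-softmax weight matrix into two smaller matrices. The layer sizes and rank below are hypothetical illustrations, not figures from the talk; the point is only that a rank-r factorization of an h x o matrix needs h*r + r*o parameters instead of h*o.

```python
import numpy as np

# Hypothetical layer sizes: h hidden units, o softmax outputs, rank r.
# These numbers are illustrative assumptions, not taken from the talk.
h, o, r = 1024, 2220, 128

rng = np.random.default_rng(0)
A = rng.standard_normal((h, r)) * 0.01  # h x r factor
B = rng.standard_normal((r, o)) * 0.01  # r x o factor

full_params = h * o               # parameters in the unfactored layer
low_rank_params = h * r + r * o   # parameters after factorization

x = rng.standard_normal(h)        # one hidden-layer activation vector
logits = x @ A @ B                # forward pass through the factored layer

print(full_params, low_rank_params, logits.shape)
```

With these (assumed) sizes, the factored layer uses well under half the parameters of the full matrix, in the spirit of the 50% reduction described above; in practice the rank is chosen so that recognition accuracy is preserved.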
Tara Sainath received her B.S. (2004), M.Eng. (2005), and PhD (2009) degrees in Electrical Engineering and Computer Science, all from MIT. The main focus of her PhD work was acoustic modeling for noise-robust speech recognition. She joined the Speech and Language Algorithms group at the IBM T.J. Watson Research Center upon completing her PhD. She organized a Special Session on Sparse Representations at Interspeech 2010, as well as a workshop on Deep Learning at ICML 2013. In addition, she has served as a staff reporter for the IEEE Speech and Language Processing Technical Committee (SLTC) Newsletter. She currently holds over 30 US patents. Her research interests are in acoustic modeling, including deep belief networks and sparse representations.

Johns Hopkins University, Whiting School of Engineering

Center for Language and Speech Processing
Hackerman 226
3400 North Charles Street, Baltimore, MD 21218-2680