I-Vector Representation Based on GMM and DNN for Audio Classification – Najim Dehak (MIT – CSAIL)
Baltimore, MD 21218
USA
Abstract
The i-vector approach has become the state-of-the-art approach in several audio classification tasks, such as speaker and language recognition. It models and captures the variability in the Gaussian Mixture Model (GMM) mean components across audio recordings. More recently, several subspace approaches have been extended to model the variability in the GMM weights rather than the GMM means. These techniques, such as Non-negative Factor Analysis (NFA) and the Subspace Multinomial Model (SMM), need to deal with the fact that the GMM weights are always positive and must sum to one. In this talk, we will show how NFA, SMM, and other similar subspace approaches can also be used to model the hidden-layer neuron activations of a deep neural network for sequential data recognition tasks such as language and dialect recognition.
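The constraint on the GMM weights mentioned above can be illustrated with a small sketch. It assumes the softmax-style parameterization commonly associated with subspace multinomial models, in which a low-dimensional vector is mapped to mixture weights that are positive and sum to one by construction. The function name `smm_weights`, the variable names, and the random toy values are all hypothetical and chosen only for illustration; this is not the speaker's implementation.

```python
import numpy as np

def smm_weights(bias, subspace, factors):
    """Map a low-dimensional factor vector to mixture weights via a softmax.

    Assumed (hypothetical) shapes:
      bias:     (n_components,)      recording-independent offset
      subspace: (n_components, dim)  low-rank subspace matrix
      factors:  (dim,)               recording-specific low-dimensional vector
    """
    logits = bias + subspace @ factors
    logits = logits - logits.max()  # shift for numerical stability; result is unchanged
    exponentials = np.exp(logits)
    return exponentials / exponentials.sum()

# Toy values only; a real system would estimate these from data.
rng = np.random.default_rng(0)
n_components, dim = 8, 3
bias = rng.normal(size=n_components)
subspace = rng.normal(size=(n_components, dim))
factors = rng.normal(size=dim)

weights = smm_weights(bias, subspace, factors)
print(weights.min() > 0, np.isclose(weights.sum(), 1.0))
```

Because the normalization is built into the mapping, the low-dimensional vector can be estimated without explicit constraints. An additive model of the weights, by contrast, has to enforce positivity and the sum-to-one condition explicitly during estimation.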
Biography
Najim Dehak received his Engineering degree in Artificial Intelligence in 2003 from Universite des Sciences et de la Technologie d’Oran, Algeria, and his MS degree in Pattern Recognition and Artificial Intelligence Applications in 2004 from the Universite de Pierre et Marie Curie, Paris, France. He obtained his Ph.D. degree from Ecole de Technologie Superieure (ETS), Montreal in 2009. During his Ph.D. studies he was also with Centre de recherche informatique de Montreal (CRIM), Canada. In the summer of 2008, he participated in the Johns Hopkins University Center for Language and Speech Processing Summer Workshop. During that time, he proposed a new system for speaker verification that uses factor analysis to extract speaker-specific features, thus paving the way for the development of the i-vector framework. Dr. Dehak is currently a research scientist in the Spoken Language Systems (SLS) Group at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). His research interests are in machine learning approaches applied to speech processing and speaker modeling. The current focus of his research involves extending the concept of an i-vector representation to other audio classification problems, such as speaker diarization, language recognition, and emotion recognition.