Sequence Kernels for Speaker and Speech Recognition – Mark Gales (University of Cambridge)
Abstract
Conceptually, sequence kernels map variable-length sequences into a fixed-dimensional feature space in which, for example, an inner product can be computed. The ability to handle variable-length sequences makes these kernels suitable for speech signals, which are by nature time-varying. In speech processing, sequence kernels have been successfully applied to speaker verification, where they are used in combination with support vector machines (SVMs) for classification. This talk will concentrate on a particular class of sequence kernels, generative kernels, and on how they can be used for speaker and speech recognition. Generative kernels, and score-spaces, make use of generative models such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). By taking first- and higher-order derivatives of the log-likelihood with respect to the model parameters, fixed-dimensional feature vectors can be extracted. An example of this form of kernel is the Fisher kernel, which has been successfully applied to a range of biological sequences. The relationship of this form of kernel to schemes such as the GMM mean-supervector kernel, commonly used in speaker verification, will be discussed. In addition, the talk will look at how these kernels and their associated feature spaces can be used for speech recognition, and how they can handle speaker and environment changes.
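As a rough illustration of the score-space idea described in the abstract, the sketch below extracts a fixed-dimensional feature vector from a variable-length sequence by taking the first derivative of a GMM log-likelihood with respect to the component means, then computes an inner product between two such vectors. It is a minimal sketch under stated assumptions, not the method presented in the talk: the function names, the toy GMM parameters, the diagonal covariances, the division by sequence length, and the use of a plain inner product (rather than one weighted by the inverse Fisher information matrix, as in the Fisher kernel proper) are all illustrative choices.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Component posteriors P(m | x) under the GMM (computed stably in the log domain)."""
    logs = [
        math.log(w) + log_gauss(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    unnorm = [math.exp(value - top) for value in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def mean_score(seq, weights, means, variances):
    """Derivative of the sequence log-likelihood w.r.t. every component mean.

    The result is flattened into one vector of size (components * dim),
    so its length does not depend on the number of frames in seq.
    Dividing by the sequence length is an assumption made here.
    """
    dim = len(means[0])
    score = [0.0] * (len(means) * dim)
    for x in seq:
        post = posteriors(x, weights, means, variances)
        for m, (p, mu, var) in enumerate(zip(post, means, variances)):
            for d in range(dim):
                # d/d(mu) of log p(x) = P(m | x) * (x - mu) / var
                score[m * dim + d] += p * (x[d] - mu[d]) / var[d]
    return [s / len(seq) for s in score]


def score_kernel(seq_a, seq_b, gmm):
    """Plain inner product between the two score vectors (identity metric)."""
    a = mean_score(seq_a, *gmm)
    b = mean_score(seq_b, *gmm)
    return sum(u * v for u, v in zip(a, b))


# Hypothetical two-component, two-dimensional GMM: (weights, means, variances).
gmm = ([0.5, 0.5], [[0.0, 0.0], [3.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]])

# Two sequences of different lengths map to vectors of the same size.
short_seq = [[0.1, -0.2], [2.9, 3.1], [0.3, 0.0]]
long_seq = [[0.2, 0.1], [3.2, 2.8], [2.7, 3.3], [-0.1, 0.4], [3.0, 3.0]]

print(len(mean_score(short_seq, *gmm)), len(mean_score(long_seq, *gmm)))
print(score_kernel(short_seq, long_seq, gmm))
```

Both sequences yield a vector with one entry per mean parameter, so the resulting kernel value could in principle be passed to a classifier such as an SVM regardless of how many frames each utterance contains.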
Biography
Mark Gales studied for the B.A. in Electrical and Information Sciences at the University of Cambridge from 1985-88. Following graduation he worked as a consultant at Roke Manor Research Ltd. In 1991 he took up a position as a Research Associate in the Speech, Vision and Robotics group in the Engineering Department at Cambridge University. In 1995 he completed his doctoral thesis, "Model-Based Techniques for Robust Speech Recognition", supervised by Professor Steve Young. From 1995-1997 he was a Research Fellow at Emmanuel College, Cambridge. He was then a Research Staff Member in the Speech group at the IBM T.J. Watson Research Center until 1999, when he returned to Cambridge University Engineering Department as a University Lecturer. He is currently a Reader in Information Engineering and a Fellow of Emmanuel College. Mark Gales is a Senior Member of the IEEE and was a member of the Speech Technical Committee from 2001-2004. He is currently an associate editor for IEEE Signal Processing Letters. Mark Gales was awarded a 1997 IEEE Young Author Paper Award for his paper on Parallel Model Combination and a 2002 IEEE Paper Award for his paper on Semi-Tied Covariance Matrices.