Applications of weighted finite state transducers in a speech recognition toolkit – Daniel Povey (Microsoft Research)
The open-source speech recognition toolkit “Kaldi” uses weighted finite-state transducers (WFSTs) for training and decoding, and uses the OpenFst toolkit as a C++ library. I will give an informal overview of WFSTs and of the standard AT&T recipe for WFST-based decoding, and will mention what I see as some problems with the basic recipe and how we addressed them while developing Kaldi. I will also describe how to use WFSTs to achieve “exact” lattice generation, in a sense that will be explained. This is an interesting application of WFSTs because, unlike most WFST mechanisms, it has no obvious non-WFST analog.
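For readers new to WFSTs, the core operation behind the decoding recipe is composition: one transducer's output labels are matched against the next transducer's input labels, and the arc weights are combined. The toy sketch below is illustrative only. It assumes epsilon-free transducers and the tropical semiring, where weights are costs (such as negative log probabilities) that add along a path while the best alternative is the minimum. The names `Fst`, `compose`, and `best_path` are hypothetical and are not the OpenFst or Kaldi API; real composition must also handle epsilon labels.

```python
import heapq
import itertools
from collections import deque


class Fst:
    """Hypothetical toy WFST: arcs[state] = [(in_label, out_label, weight, next_state)]."""

    def __init__(self, start, finals, arcs):
        self.start = start
        self.finals = finals  # state -> final weight (cost)
        self.arcs = arcs      # state -> list of arcs


def compose(a, b):
    """Epsilon-free composition: match a's output labels to b's input labels.

    Tropical semiring: weights along a matched pair of arcs are added.
    Only state pairs reachable from the start pair are built.
    """
    start = (a.start, b.start)
    arcs, finals = {}, {}
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        sa, sb = s
        arcs[s] = []
        if sa in a.finals and sb in b.finals:
            finals[s] = a.finals[sa] + b.finals[sb]
        for (i, m, w1, na) in a.arcs.get(sa, []):
            for (m2, o, w2, nb) in b.arcs.get(sb, []):
                if m == m2:
                    t = (na, nb)
                    arcs[s].append((i, o, w1 + w2, t))
                    if t not in seen:
                        seen.add(t)
                        queue.append(t)
    return Fst(start, finals, arcs)


def best_path(f):
    """Dijkstra search (valid for non-negative costs); returns (cost, output labels)."""
    tie = itertools.count()  # tie-breaker so the heap never compares label lists
    heap = [(0.0, next(tie), f.start, [])]
    done = set()
    best = (float("inf"), None)
    while heap:
        cost, _, s, out = heapq.heappop(heap)
        if s in done:
            continue
        done.add(s)
        if s in f.finals and cost + f.finals[s] < best[0]:
            best = (cost + f.finals[s], out)
        for (_, o, w, t) in f.arcs.get(s, []):
            if t not in done:
                heapq.heappush(heap, (cost + w, next(tie), t, out + [o]))
    return best


# Usage: "a" maps to "x" (cost 1) or "y" (cost 2); then "x" -> "X" (cost 3), "y" -> "Y" (cost 0.5).
a_fst = Fst(0, {1: 0.0}, {0: [("a", "x", 1.0, 1), ("a", "y", 2.0, 1)]})
b_fst = Fst(0, {1: 0.0}, {0: [("x", "X", 3.0, 1), ("y", "Y", 0.5, 1)]})
composed = compose(a_fst, b_fst)
print(best_path(composed))  # (2.5, ['Y'])
```

Although "y" is the costlier choice in the first transducer, the composed machine prefers it because the total path cost (2.0 + 0.5) is lower than the alternative (1.0 + 3.0). Decoding graphs built by the standard recipe rely on this effect at much larger scale, combining several knowledge sources into one transducer before searching it.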
Daniel Povey received his Bachelor’s (Natural Sciences, 1997), Master’s (Computer Speech and Language Processing, 1998) and PhD (Engineering, 2003) from Cambridge University. He is currently a researcher at Microsoft Research, Redmond, Washington, USA. From 2003 to 2008 he worked as a researcher at IBM Research in Yorktown Heights, NY. He is best known for his work on discriminative training for HMM-GMM based speech recognition (i.e. MMI, MPE, and their feature-space variants).