Dynamic Finite-State Transducer Composition with Look-Ahead for Very-Large Scale Speech Recognition – Mike Riley (Google)

March 30, 2010

This talk describes a weighted finite-state transducer composition algorithm that generalizes the concept of the composition filter, and presents look-ahead filters that remove useless epsilon paths and push forward labels and weights along epsilon paths. This filtering permits the composition of very large context-dependent lexicons and language models for speech recognition far more efficiently in time and space than previously possible. We present experiments on Broadcast News and a spoken query task that demonstrate a 5-10% overhead for dynamic, runtime composition compared to a static, offline composition of the recognition transducer in an FST-based decoder. In the spoken query task, we give results using LMs varying from 15M to 2G n-grams. To our knowledge, this is the first such system with so little overhead and such large LMs.

Joint work with: Cyril Allauzen, Ciprian Chelba, Boulos Harb, and Johan Schalkwyk.
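To make the operation at the heart of the talk concrete, here is a minimal sketch of standard weighted transducer composition in the tropical semiring. This is an illustration only, not the talk's algorithm: it handles just the epsilon-free base case, whereas the contribution described above is precisely the look-ahead filtering that makes composition efficient in the presence of epsilon paths. The dict-based FST representation is an assumption made for self-containment; real systems would use a library such as OpenFst.

```python
from collections import deque

def compose(A, B):
    """Epsilon-free weighted FST composition in the tropical semiring
    (path weights add). A state of the result is a pair (qa, qb); an arc
    fires when an output label of A matches an input label of B.
    FSTs are plain dicts: {"start": state,
                           "final": {state: weight},
                           "arcs": {state: [(in, out, weight, next)]}}."""
    start = (A["start"], B["start"])
    arcs, finals = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        qa, qb = queue.popleft()
        out = []
        # Match every outgoing arc of qa against every outgoing arc of qb.
        for (i1, o1, w1, na) in A["arcs"].get(qa, []):
            for (i2, o2, w2, nb) in B["arcs"].get(qb, []):
                if o1 == i2:  # labels match: emit a composed arc
                    nxt = (na, nb)
                    out.append((i1, o2, w1 + w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
        arcs[(qa, qb)] = out
        if qa in A["final"] and qb in B["final"]:
            finals[(qa, qb)] = A["final"][qa] + B["final"][qb]
    return {"start": start, "arcs": arcs, "final": finals}
```

The lazy, queue-driven expansion here (only reachable pair states are built) is the same idea that makes "dynamic, runtime composition" feasible: the composed machine is never materialized in full, which matters when the language model alone has billions of n-grams.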
Michael Riley has a B.S., M.S., and Ph.D. from MIT, all in computer science. He began his career at Bell Labs and AT&T Labs, where he, together with Mehryar Mohri and Fernando Pereira, introduced and developed the theory and use of weighted finite-state transducers (WFSTs) in speech and language. He is currently a research scientist at Google, Inc. His interests include speech and natural language processing, machine learning, and information retrieval. He is a principal author of the OpenFst library and the AT&T FSM Library™.

Center for Language and Speech Processing