Bipartite Graph Factorization in Static Decoding Graphs with Long-Span Acoustic Context: An Interesting Combinatorial Problem in ASR

Geoffrey Zweig, IBM TJ Watson Research

April 5, 2005


Presentation Slides

View Seminar Video

Abstract

A key problem in large vocabulary speech recognition is how to search for the word sequence with the highest likelihood, given an acoustic model, a language model, and some input audio data. There are two standard approaches to doing this: 1_RP_ to construct the search space “on-demand†so as to represent just the portions that are reasonably likely given the data, and 2_RP_ to construct ahead-of-time a full representation of the entire search space. This paper identifies and solves a problem that arises in the construction of a full representation of the search space when long span acoustic context is used in the acoustic model, specifically when the expected acoustic realization of a word depends on the identity of the preceding word. In this case, a sub-portion of the search space contains a bipartite graph with O_LP_V_RP_ vertices and O_LP_V^2_RP_ edges, where V is the vocabulary size. For large vocabulary systems, the number of edges is prohibitive, and we tackle the problem of finding an edge-wise minimal representation of this sub-graph. This is done by identifying complete bipartite sub-graphs within the graph, and replacing the edges of each such sub-graph with an extra vertex and edges that connect the left and right sides of the sub-graph to the new vertex. The problem of finding the smallest such representation is NP-hard, and we present a heuristic for finding a reasonable answer. The talk concludes with some experimental results on a large-vocabulary speech recognition system and a discussion of related problems.

Biography

Geoffrey Zweig received his PhD in Computer Science from the University of California at Berkeley in 1998, after which he joined IBM at the T.J. Watson Research Center. At IBM, Geoffrey manages the advanced large vocabulary speech recognition research group. His responsibilities include the development of improved acoustic modeling techniques and state-of-the-art decoding algorithms. Geoffrey was a PI for the DARPA EARS program and organized IBM’s participation in the 2003 and 2004 evaluation. His research interests revolve around the application of machine learning techniques such as boosting and Bayesian Network modeling to speech recognition, as well as highly practical applications such as the automated construction of grammars for directory dialer applications. In addition to speech recognition, Geoffrey has worked on a wide variety of topics, including extremely large scale document clustering for the web, and DNA physical mapping. Geoffrey is a member of the IEEE and an associate editor of the IEEE Transactions on Speech and Audio Processing.