Speech Recognition with Segmental Conditional Random Fields

The goal of this workshop group is to advance the state of the art in core speech recognition by developing new kinds of features for use in a Segmental Conditional Random Field (SCRF). The recently proposed SCRF approach [Zweig & Nguyen, 2009] generalizes Conditional Random Fields to operate at the segment level rather than at the traditional frame level. Central to the approach, every segment is labeled directly with a word. Features are then extracted, each measuring some form of consistency between the underlying audio and the word hypothesis for a segment. These are combined in a log-linear model to produce the posterior probability of a word sequence given the audio. Previous work has used features based on the detection of phoneme and multi-phone units in the audio input. For example, one feature is the edit distance between the phoneme sequence observed in a segment and the sequence expected from the word hypothesis. The key advantage of the SCRF's log-linear model is that it can combine numerous, possibly redundant features in a coherent way, which makes adding large numbers of complementary features a convenient route to improved performance.
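To make the log-linear combination concrete, here is a minimal Python sketch; it is not code from the SCARF toolkit. It assumes a single fixed segment and a small set of whole-word hypotheses, whereas a real SCRF also sums over segmentations and word sequences. Each hypothesis w gets the score sum_k lambda_k * f_k(w, o), and its posterior is exp(score) normalized over the competing hypotheses. The helper names (edit_distance, log_linear_posterior), the toy lexicon and the weight of -1.0 are hypothetical, and the edit-distance feature shown is only one simple version of such a feature.

```python
import math
from typing import Dict, Sequence


def edit_distance(observed: Sequence[str], expected: Sequence[str]) -> int:
    """Levenshtein distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(expected) + 1))  # row for an empty observed prefix
    for i, obs in enumerate(observed, start=1):
        cur = [i] + [0] * len(expected)
        for j, exp_phone in enumerate(expected, start=1):
            cur[j] = min(
                prev[j] + 1,                       # observed phone not in the expectation
                cur[j - 1] + 1,                    # expected phone missing from the observation
                prev[j - 1] + (obs != exp_phone),  # match (0) or substitution (1)
            )
        prev = cur
    return prev[-1]


def log_linear_posterior(
    features: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> Dict[str, float]:
    """Normalize exp(sum_k lambda_k * f_k) over a fixed set of competing hypotheses."""
    scores = {
        hyp: sum(weights[name] * value for name, value in feats.items())
        for hyp, feats in features.items()
    }
    top = max(scores.values())  # subtract the max score for numerical stability
    unnorm = {hyp: math.exp(s - top) for hyp, s in scores.items()}
    z = sum(unnorm.values())
    return {hyp: u / z for hyp, u in unnorm.items()}


# Hypothetical example: one segment whose phone detector output is "k ae t".
observed = ["k", "ae", "t"]
lexicon = {"cat": ["k", "ae", "t"], "cut": ["k", "ah", "t"], "scat": ["s", "k", "ae", "t"]}
features = {
    word: {"edit_distance": float(edit_distance(observed, phones))}
    for word, phones in lexicon.items()
}
weights = {"edit_distance": -1.0}  # hypothetical; a trained SCRF would learn this weight
posterior = log_linear_posterior(features, weights)
print(posterior)
```

With a negative weight, hypotheses whose expected phones lie closer to the detected ones receive higher posterior probability. Adding another feature only means adding another key to each feature dictionary and a matching weight, which mirrors the ease of combining many features described above.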

The workshop's work centers on extracting new acoustic features that exploit the segmental approach. Professor Van Compernolle and Dr. Demuynck from the University of Leuven in Belgium are extending previous work in template-based ASR [Wachter et al. 2007, Demange & Van Compernolle 2009] to find highly informative features based on template matching. A second line of research, explored by Professor Les Atlas of the University of Washington and his student Pascal Clark, uses coherent modulation features [Clark & Atlas 2009]. Professor Fei Sha and his student Meihong Wang, from the University of Southern California, are studying deep-learning-based features. Finally, Dr. Geoffrey Zweig and Dr. Patrick Nguyen from Microsoft Research are integrating these and other features into the SCARF toolkit for segmental CRF based speech recognition.
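Template-based recognition typically compares an input segment with stored example realizations using dynamic time warping (DTW). The Python sketch below assumes that one segment-level feature is the length-normalized DTW distance from the segment to the closest stored template of the hypothesized word. This is an illustrative assumption, not a description of the Leuven features. Frames are plain lists of floats compared with Euclidean distance, and all function names and data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames (assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: Sequence[Frame], template: Sequence[Frame]) -> float:
    """Classic DTW with steps (1,0), (0,1), (1,1), normalized by n + m."""
    n, m = len(query), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def template_feature(
    segment: Sequence[Frame], word: str, templates: Dict[str, List[Sequence[Frame]]]
) -> float:
    """Hypothetical SCRF feature: distance to the best-matching template of `word`."""
    return min(dtw_distance(segment, t) for t in templates[word])


# Toy one-dimensional frames, for illustration only.
templates = {
    "yes": [[[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]]],
    "no": [[[5.0], [5.0]]],
}
segment = [[0.0], [1.0], [2.0]]
for word in templates:
    print(word, template_feature(segment, word, templates))
```

Because DTW can stretch a template in time, the second "yes" template (with a repeated frame) also aligns to the segment at zero cost. A lower distance indicates better agreement, so such a feature would presumably receive a negative weight in the log-linear model.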


Team Members
Senior Members
Damianos Karakos, CLSP
Les Atlas, University of Washington
Kris Demuynck, University of Leuven
Patrick Nguyen, Microsoft
Fei Sha, University of Southern California
Dirk van Compernolle, University of Leuven
Geoffrey Zweig, Microsoft
Graduate Students
Samuel Thomas, CLSP
Pascal Clark, University of Washington
Gregory Sell, Stanford University
Meihong Wang, University of Southern California
Undergraduate Students
Samuel Bowman, University of Chicago
Justine Kao, Stanford University
Affiliate Members
Hynek Hermansky, CLSP

Center for Language and Speech Processing