Speech Recognition with Segmental Conditional Random Fields

The goal of this workshop group is to advance the state of the art in core speech recognition by developing new kinds of features for use in a Segmental Conditional Random Field (SCRF). The recently proposed SCRF approach [Zweig & Nguyen, 2009] generalizes Conditional Random Fields to operate at the segment level rather than at the traditional frame level. Central to the approach, every segment is labeled directly with a word. Features are then extracted, each of which measures some form of consistency between the underlying audio and the word hypothesized for that segment. These features are combined in a log-linear model to produce the posterior probability of a word sequence given the audio. Previous work has used features based on the detection of phoneme and multi-phone units in the audio input. For example, one feature is the edit distance between the phoneme sequence observed in a segment and the sequence expected under the word hypothesis. The log-linear model embodied by the SCRF has the key advantage of being able to combine numerous, possibly redundant features in a coherent way. This gives us a convenient way to improve performance by adding large numbers of complementary features.
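As a rough illustration of the edit-distance feature and the log-linear combination described above, the following Python sketch scores a few word hypotheses for a single segment. It is a simplified toy, not the SCARF implementation: the pronunciation dictionary, the observed phoneme string, the single feature weight, and the function names are all hypothetical, and the normalization runs over a short fixed list of candidates, whereas a full SCRF normalizes over all word sequences and segmentations.

```python
import math


def edit_distance(observed, expected):
    """Levenshtein distance between two phoneme sequences (lists of strings)."""
    prev = list(range(len(expected) + 1))
    for i, obs in enumerate(observed, start=1):
        curr = [i]
        for j, exp in enumerate(expected, start=1):
            cost = 0 if obs == exp else 1
            curr.append(min(prev[j] + 1,           # drop an observed phoneme
                            curr[j - 1] + 1,       # add a missing expected phoneme
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1]


def posteriors(feature_vectors, weights):
    """Log-linear posteriors over a fixed list of candidate hypotheses."""
    scores = [sum(w * f for w, f in zip(weights, feats))
              for feats in feature_vectors]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical pronunciations and detector output for one segment.
PRONUNCIATIONS = {
    "cat": ["k", "ae", "t"],
    "cut": ["k", "ah", "t"],
    "cap": ["k", "ae", "p"],
}
observed = ["k", "ae", "t"]
candidates = ["cat", "cut", "cap"]

# One feature per hypothesis: the edit distance to the expected pronunciation.
# The negative weight (an assumed value) penalizes larger distances.
weights = [-1.5]
features = [[edit_distance(observed, PRONUNCIATIONS[w])] for w in candidates]
probs = posteriors(features, weights)

for word, p in zip(candidates, probs):
    print(f"{word}: {p:.3f}")
```

Adding another feature in this sketch amounts to appending one more value to each feature vector and one more entry to the weight list, which is the property that makes the log-linear form convenient for combining many complementary, possibly redundant features.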

The work in the workshop revolves around extracting new acoustic features that can leverage the segmental approach. Professor Van Compernolle and Dr. Demuynck from Leuven University in Belgium are extending previous work in template-based ASR [Wachter et al. 2007, Demange & Van Compernolle 2009] to find highly informative features based on template matching. A second line of research, explored by Prof. Les Atlas from the University of Washington and his student Pascal Clark, concerns the use of coherent modulation features [Clark & Atlas 2009]. Professor Fei Sha and his student Meihong Wang, from the University of Southern California, are studying the use of deep-learning-based features. Finally, Dr. Geoffrey Zweig and Dr. Patrick Nguyen from Microsoft Research are working on integrating these and other features into the SCARF toolkit for segmental-CRF-based speech recognition.

Abstract

Final Report

Final Presentation | Video

Team Members

Senior Members

Damianos Karakos, CLSP
Les Atlas, University of Washington
Kris Demuynck, University of Leuven
Patrick Nguyen, Microsoft
Fei Sha, University of Southern California
Dirk van Compernolle, University of Leuven
Geoffrey Zweig, Microsoft

Graduate Students

Samuel Thomas, CLSP
Pascal Clark, University of Washington
Gregory Sell, Stanford University
Meihong Wang, University of Southern California

Undergraduate Students

Samuel Bowman, University of Chicago
Justine Kao, Stanford University

Affiliate Members

Hynek Hermansky, CLSP