Speech Recognition with Segmental Conditional Random Fields
The goal of this workshop group is to advance the state of the art in core speech recognition by developing new kinds of features for use in a Segmental Conditional Random Field (SCRF). The recently proposed SCRF approach [Zweig & Nguyen, 2009] generalizes Conditional Random Fields to operate at the segment level rather than at the traditional frame level. Central to the approach is that every segment is labeled directly with a word. Features are then extracted, each of which measures some form of consistency between the underlying audio and the word hypothesis for a segment. These features are combined in a log-linear model to produce the posterior probability of a word sequence given the audio. Previous work has used features based on the detection of phoneme and multi-phone units in the audio input. For example, one feature is the edit distance between the phoneme sequence observed in a segment and the sequence expected from the word hypothesis. The log-linear model embodied by the SCRF has the key advantage of combining numerous, possibly redundant features in a coherent way; thus we have a convenient way to improve performance by adding large numbers of complementary features.
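The toy example below illustrates the two ideas in the paragraph above: a per-segment edit-distance feature and a log-linear combination of weighted features into a posterior. It is a minimal sketch under stated assumptions, not the SCARF implementation. The lexicon `LEXICON`, the feature weights, the constant per-word feature, and all function names are hypothetical. A real SCRF normalizes over all possible segmentations and word labelings; to stay short, this sketch normalizes only over an explicit list of candidate hypotheses.

```python
import math

# Hypothetical pronunciation lexicon: word -> expected phone sequence.
LEXICON = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"], "at": ["ae", "t"]}


def edit_distance(observed, expected):
    """Levenshtein distance between two phone sequences."""
    m, n = len(observed), len(expected)
    # dp[i][j] = minimum edits turning observed[:i] into expected[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if observed[i - 1] == expected[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete an observed phone
                           dp[i][j - 1] + 1,        # insert an expected phone
                           dp[i - 1][j - 1] + sub)  # match or substitute
    return dp[m][n]


def segment_features(word, observed):
    """Hypothetical features for one (word, segment) pair: edit distance and a constant 1 per word."""
    return [float(edit_distance(observed, LEXICON[word])), 1.0]


def hypothesis_score(segments, weights):
    """Log-linear score: weighted features summed over all segments of a hypothesis."""
    total = 0.0
    for word, observed in segments:
        total += sum(w * f for w, f in zip(weights, segment_features(word, observed)))
    return total


def posteriors(hypotheses, weights):
    """Normalize exp(score) over an explicit candidate list (a softmax)."""
    scores = [hypothesis_score(h, weights) for h in hypotheses]
    best = max(scores)
    exps = [math.exp(s - best) for s in scores]  # shift by max for numerical stability
    z = sum(exps)
    return [e / z for e in exps]


observed = ["k", "ae", "t"]  # phones detected in the audio
hyps = [
    [("cat", observed)],                     # one segment labeled "cat"
    [("cap", observed)],                     # one segment labeled "cap"
    [("cap", ["k"]), ("at", ["ae", "t"])],  # two segments: "cap" + "at"
]
weights = [-1.0, 0.5]  # hypothetical: penalize edit distance, small per-word bonus
probs = posteriors(hyps, weights)
print(probs)
```

With these weights, the single-segment "cat" hypothesis receives the highest posterior because its expected phones match the observed ones exactly. Adding another feature only means appending to the list returned by `segment_features` and adding a weight, which reflects the ease of combining many features in a log-linear model.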
The work in the workshop centers on extracting new acoustic features that can leverage the segmental approach. Professor Van Compernolle and Dr. Demuynck from Leuven University in Belgium are extending previous work in template-based ASR [Wachter et al. 2007, Demange & Van Compernolle 2009] to find highly informative features based on template matching. A second line of research, explored by Prof. Les Atlas of the University of Washington and his student Pascal Clark, uses coherent modulation features [Clark & Atlas 2009]. Professor Fei Sha and his student Meihong Wang, from the University of Southern California, are studying the use of features based on deep learning. Finally, Dr. Geoffrey Zweig and Dr. Patrick Nguyen from Microsoft Research are integrating these and other features into the SCARF toolkit for segmental CRF-based speech recognition.
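Template-based recognizers typically compare an audio segment against stored examples of each word using dynamic time warping (DTW). The toy sketch below shows how such a comparison could be turned into a single segment-level feature: the lowest DTW cost against any stored example of the hypothesized word. It is an illustrative assumption, not the Leuven system. The two-dimensional frames, the `templates` dictionary, and the function names are hypothetical, and the plain Euclidean frame distance is a simplification of the distance measures used in practice.

```python
import math


def dtw_distance(segment, template):
    """Dynamic time warping cost between two sequences of feature vectors."""
    n, m = len(segment), len(template)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of segment[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(segment[i - 1], template[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # advance segment only
                                 cost[i][j - 1],      # advance template only
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def template_feature(segment, word, templates):
    """Lowest DTW cost between the segment and any stored example of `word`."""
    return min(dtw_distance(segment, t) for t in templates[word])


# Hypothetical 2-dimensional frames; real systems would use acoustic feature vectors.
templates = {
    "yes": [[[0.0, 1.0], [1.0, 1.0]],
            [[0.1, 0.9], [0.9, 1.1], [1.0, 1.0]]],
    "no": [[[2.0, 0.0], [2.0, 0.5]]],
}
segment = [[0.0, 1.0], [0.5, 1.0], [1.0, 1.0]]
yes_cost = template_feature(segment, "yes", templates)
no_cost = template_feature(segment, "no", templates)
print(yes_cost, no_cost)
```

A low cost indicates that the segment closely resembles a stored example of the word. In an SCRF, a value like `yes_cost` would simply become one more entry in the segment's feature vector, with its weight learned alongside the others.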
| Team Member | Affiliation |
|---|---|
| Les Atlas | University of Washington |
| Kris Demuynck | University of Leuven |
| Fei Sha | University of Southern California |
| Dirk van Compernolle | University of Leuven |
| Pascal Clark | University of Washington |
| Gregory Sell | Stanford University |
| Meihong Wang | University of Southern California |
| Samuel Bowman | University of Chicago |
| Justine Kao | Stanford University |