Joint discriminative language modeling and utterance classification – Brian Roark (OGI)

September 28, 2004 (all day)

In this talk, I will describe several discriminative language modeling techniques for large-vocabulary automatic speech recognition (ASR) tasks. I will first review recent work on n-gram model estimation using the perceptron algorithm and conditional random fields, with experimental results on Switchboard (joint work with Murat Saraclar, Michael Collins and Mark Johnson). I will then present new work on a call-classification task, for which training utterances are annotated with their class along with the reference transcription. We demonstrate that a joint modeling approach, using utterance-class, n-gram, and class/n-gram features, significantly reduces word error rate (WER) compared with using n-gram features alone, while also providing significantly more accurate utterance classification than the baselines. A variety of parameter update approaches will be discussed and evaluated with respect to both WER and classification error rate, including simultaneous and independent optimization. As with the earlier n-gram modeling approaches, the resulting models are encoded as weighted finite-state automata and applied by simply intersecting them with the word lattices output by the baseline recognizer (joint work with Murat Saraclar).
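To make the joint modeling idea concrete, the following is a minimal sketch (not the speakers' implementation) of a perceptron update over n-best recognition hypotheses, using n-gram features conjoined with an utterance-class feature in the spirit of the class/n-gram joint features described above. All function names, feature templates, and data here are illustrative assumptions.

```python
# Hypothetical sketch of perceptron-based discriminative reranking with
# joint n-gram and utterance-class features; names and data are made up.
from collections import Counter

def ngram_features(words, cls=None, n=2):
    """Count n-gram features; optionally conjoin each with an utterance class."""
    feats = Counter()
    padded = ["<s>"] + words + ["</s>"]
    for i in range(len(padded) - n + 1):
        gram = tuple(padded[i:i + n])
        feats[("ng", gram)] += 1
        if cls is not None:
            feats[("cls-ng", cls, gram)] += 1  # class/n-gram joint feature
    return feats

def score(weights, feats):
    """Linear model score: dot product of weights and feature counts."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def perceptron_update(weights, nbest, gold_words, gold_class, classes):
    """One perceptron step: find the best-scoring (hypothesis, class) pair,
    then move weights toward the gold pair and away from the prediction."""
    best, best_s = None, float("-inf")
    for words in nbest:
        for c in classes:
            s = score(weights, ngram_features(words, c))
            if s > best_s:
                best, best_s = (words, c), s
    for f, v in ngram_features(gold_words, gold_class).items():
        weights[f] = weights.get(f, 0.0) + v
    for f, v in ngram_features(*best).items():
        weights[f] = weights.get(f, 0.0) - v
    return best

# Toy usage: start with a weight that wrongly favors "servers", then
# apply one update toward the gold transcription and class.
weights = {("ng", ("customer", "servers")): 1.0}
nbest = [["call", "customer", "service"], ["call", "customer", "servers"]]
perceptron_update(weights, nbest,
                  ["call", "customer", "service"], "BILLING",
                  ["BILLING", "SUPPORT"])
```

In the full approach described in the abstract, the learned weights would instead be compiled into a weighted finite-state automaton and intersected with the recognizer's word lattices, rather than applied to explicit n-best lists as in this sketch.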

Center for Language and Speech Processing