OUCH (Outing Unfortunate Characteristics of Hidden Markov Models) or What’s Wrong with Speech Recognition and What Can We Do About It? – Jordan Cohen (Spelamode)
Abstract
Speech recognition has become a critical part of the user interface in mobile, telephone, and other technology applications. However, current recognition systems consistently underperform their users’ and designers’ expectations.

This talk reports on a project, OUCH, which investigates one aspect of the most commonly used speech recognition algorithms. Most Hidden Markov Model implementations assume that successive frames are independent, but in fact the frame observations are not independent. This mismatch between the model assumptions and the data has long been well known. Following the work of Gillick and Wegmann, the OUCH project is measuring and cataloging some of the implications of these assumptions, using a procedure that does not fix the model but instead creates speech data that satisfies the model assumptions. (See “Don’t Multiply Lightly: Quantifying Problems with the Acoustic Model Assumptions in Speech Recognition,” Dan Gillick, Larry Gillick, and Steven Wegmann, ASRU, 2011.)

In addition to our work in modeling, we are surveying the field using a snowball technique to document how researchers and engineers in speech and language technology view the current situation. This talk will review our modeling findings to date and will offer a preliminary look at our survey.
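The idea of creating data that satisfies the model's independence assumption can be pictured with a small resampling sketch. This is only an illustration under stated assumptions, not necessarily the procedure used in the OUCH project or in the cited paper: it assumes each frame has already been aligned to an HMM state, and the function name `resample_frames` and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def resample_frames(frames, states, seed=0):
    """Build a pseudo-utterance with the same state sequence as the input.

    Each output frame is drawn independently, with replacement, from the
    pool of all frames aligned to the same state. Within a state the
    resampled frames are therefore independent draws, which is what the
    HMM assumes, while the per-state distribution of frames is preserved.
    """
    rng = random.Random(seed)

    # Pool the observed frames by the state they were aligned to.
    pools = defaultdict(list)
    for frame, state in zip(frames, states):
        pools[state].append(frame)

    # Draw one frame per time step from the pool of that step's state.
    return [rng.choice(pools[state]) for state in states]

# Toy data (hypothetical): scalar "frames" with a two-state alignment.
frames = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]
states = ["a", "a", "a", "b", "b", "b"]
resampled = resample_frames(frames, states)
```

Comparing recognition results on the original frames and on frames resampled in this way would give one rough indication of how much the dependence between real frames matters to the recognizer.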
Biography
Jordan Cohen is a group leader in the OUCH project at Berkeley, and founder and technologist at Spelamode Consulting. He was the principal investigator for GALE at SRI, the CTO of Voice Signal Technologies, the Director of Business Relations at Dragon, and a member of the research staff at IDA and IBM. Dr. Cohen assists companies with technology issues, and he is engaged in intellectual property evaluation and litigation.