Making Computers Good Listeners – Joseph Keshet (TTI Chicago)

October 2, 2012 all-day

A typical problem in speech and language processing has a very large number of training examples, is sequential, highly structured, and has a unique measure of performance, such as the word error rate in speech recognition, or the BLEU score in machine translation. The simple binary classification problem typically explored in machine learning is no longer adequate for the complex decision problems encountered in speech and language applications. Binary classifiers cannot handle the sequential nature of these problems, and are designed to minimize the zero-one loss, i.e., correct or incorrect, rather than the desired measure of performance.
In addition, the current state-of-the-art models in speech and language processing are generative models that capture some temporal dependencies, such as Hidden Markov Models (HMMs). While such models have been immensely important in the development of accurate large-scale speech processing applications, and in speech recognition in particular, theoretical and experimental evidence has led to a widespread belief that such models have nearly reached a performance ceiling.
In this talk, I first present a new theorem stating that a general learning update rule directly corresponds to the gradient of the desired measure of performance. I present a new algorithm for phoneme-to-speech alignment based on this update rule, which surpasses all previously reported results on a standard benchmark. I show a generalization of the theorem to training non-linear models such as HMMs, and present empirical results on a phoneme recognition task that surpass results from HMMs trained with all other training techniques.
I will then present the problem of automatic voice onset time (VOT) measurement, one of the most important variables measured in phonetic research and medical speech analysis. I will present a learning algorithm for VOT measurement that outperforms previous work and performs close to human inter-judge reliability. I will discuss the algorithm’s implications for tele-monitoring of Parkinson’s disease, and for predicting the effectiveness of chemo-radiotherapy treatment of head and neck cancer.
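The abstract's contrast between the zero-one loss and a task measure such as word error rate can be illustrated with a minimal sketch. This is not code from the talk: the function names and the example sentences are illustrative assumptions, and word error rate is computed here in its common form, word-level edit distance divided by the number of reference words.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (illustrative helper)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion of a reference item
                curr[j - 1] + 1,          # insertion of a hypothesis item
                prev[j - 1] + (r != h),   # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance normalized by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def zero_one_loss(reference, hypothesis):
    """1 if the whole word sequence differs in any way, else 0."""
    return 0 if reference.split() == hypothesis.split() else 1


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    near_miss = "the cat sat on a mat"   # one substituted word
    far_miss = "dogs bark"               # almost nothing in common
    for hyp in (near_miss, far_miss):
        print(hyp, zero_one_loss(reference, hyp), word_error_rate(reference, hyp))
```

Both hypotheses receive the same zero-one loss, while the word error rate separates the near miss from the far miss, which is the kind of distinction a task-specific measure preserves.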
Joseph Keshet received his B.Sc. and M.Sc. degrees in Electrical Engineering in 1994 and 2002, respectively, from Tel Aviv University. He received his Ph.D. in Computer Science from The School of Computer Science and Engineering at The Hebrew University of Jerusalem in 2007. From 1995 to 2002 he was a researcher at the IDF, where he won the prestigious Israel Defense Prize for outstanding research and development achievements. From 2007 to 2009 he was a post-doctoral researcher at the IDIAP Research Institute in Switzerland. Since 2009 he has been a research assistant professor at TTI-Chicago, a philanthropically endowed academic computer science institute on the campus of the University of Chicago. Dr. Keshet’s research interests are in speech and language processing and machine learning. His current research focuses on the design, analysis and implementation of machine learning algorithms for the domain of speech and language processing.

Center for Language and Speech Processing