Graham Neubig (Nara Institute of Science and Technology) — Simultaneous Speech Translation
Baltimore, MD, 21218
Speech translation is an application of machine translation that converts utterances from the speaker’s language into the listener’s language. One of its most distinctive features is that it must be performed in real time while the speaker is speaking, so a continuous stream of words must be split into translatable segments before translation can begin. “Simultaneous speech translation” is a line of research that investigates how to perform this segmentation and translation with minimal delay, presenting the translation results to the user as soon as possible. However, because this may mean starting translation before the speaker has finished the sentence, it is often necessary to translate before receiving a syntactically or semantically complete unit, and methods are needed to maintain translation accuracy under these adverse conditions.
In this talk, I will present four major threads of work in simultaneous speech translation covering (1) segmentation strategies, which decide when it is appropriate to start translation, (2) prediction methods, which attempt to predict content that the user has not yet spoken, (3) rewording, which changes the standard way of wording output to make it more conducive to low-latency translation, and (4) evaluation, which attempts to make clear just how important speed and accuracy are in the simultaneous speech translation task.
Graham Neubig received his B.E. from the University of Illinois, Urbana-Champaign, U.S.A., in 2005, and his M.E. and Ph.D. in informatics from Kyoto University, Kyoto, Japan, in 2010 and 2012, respectively. He is currently an assistant professor at the Nara Institute of Science and Technology, Nara, Japan. His research interests include natural language and speech processing, with a focus on machine learning approaches for applications such as machine translation, spoken language analysis, spoken dialog, and syntactic/semantic parsing.