Nobuaki Minematsu (University of Tokyo) “How Can Speech Technologies Support Learners to Improve Their Skills of Speaking, Listening, Conversation and More?”
3400 N Charles St
Baltimore, MD 21218
In the era of globalization, not only students but also immigrant workers have to learn new languages in order to communicate orally and smoothly in those languages. In this talk, the lecturer illustrates how speech technologies such as speech synthesis, speech recognition, and voice conversion can support learners in improving their skills of speaking, listening, conversation, and more. Text does not show any prosodic structure explicitly, and native speakers rely on their implicit knowledge of prosodic control to read text aloud naturally. Implicit knowledge is difficult for teachers to explain explicitly, and therefore prosody training is rare in classrooms. Text-to-speech systems often use a text-based prosody prediction module, and this module can be used effectively to teach learners, in an explicit way, the prosodic control required to read given texts aloud. In High Variability Phonetic Training (HVPT), teachers use speech stimuli that vary in speaker age, gender, accent, background noise, etc. By being exposed to this variability, learners can acquire robust listening skills. However, teachers prepare those stimuli manually. By introducing speech analysis and voice conversion techniques, this variability can easily be enhanced. The talk presents an interesting example of adversarial training, which was originally used for machine learners and is newly introduced to human learners, and explains its effectiveness for acquiring robust listening skills. Further, the use of speech recognition technologies for shadowing assessment, which aims to improve the parallel processing skills needed for conversation, is described. In the lecturer’s laboratory, a new project has started to realize a novel speech assessment framework in which assessment focuses mainly on the comprehensibility of learners’ speech rather than its native-likeness. The lecturer shows recently obtained results on the objective measurement of the comprehensibility of learners’ speech.
Nobuaki MINEMATSU earned his Doctor of Engineering degree from UTokyo in 1995 and has been a professor there since 2012. From 2002 to 2003, he was a visiting researcher at KTH, Sweden. He has a wide interest in speech communication, covering speech science and speech engineering, and has particular expertise in Computer-Aided Language Learning (CALL). When he was a high-school student, he wanted to be a teacher of English, and when he was a university student, he was an amateur actor on English stages. He has published more than 450 journal and conference papers and has received paper awards from RISP, JSAI, ICIST, O-COCOSDA, and IEICE, as well as an encouragement award from PSJ. He has given tutorial and invited talks on CALL at conferences such as APSIPA2011, INTERSPEECH2012, O-COCOSDA2014, and CASTEL/J2017. He was a distinguished lecturer of APSIPA from 2015 to 2016. He served as secretary of Speech Prosody 2004 and INTERSPEECH2010, co-organizer of SLaTE2010, and program chair of O-COCOSDA2018. He is the general chair of Speech Prosody 2020.