Mei-Yuh Hwang (Mobvoi AI Lab) “Codemix Speech Recognition at Mobvoi: Current and Future”
3400 N. Charles Street
In this talk, I will focus on the recent ASRU 2019 effort investigating speech recognition of mixed Mandarin and English speech. With burgeoning advancements in transportation and communication, many cultures find themselves becoming intertwined. Because English has become the default language of global communication, codemixing with English is a common phenomenon in many non-English languages. While humans may find codemixed utterances easy to understand, machine recognition accuracy still suffers greatly when the spoken language changes in the middle of an utterance.
In this ASRU effort, researchers aim to improve recognition accuracy with approaches ranging from the traditional HMM-DNN hybrid to the recent end-to-end methodology, which dispenses with phonetic pronunciation dictionaries entirely. This exercise gives the community common ground for understanding where we are now and what to do next, at least for Mandarin-English codemix. More research and more data collection are needed to reach performance comparable to monolingual recognition. At Mobvoi, we hope this study will allow us to develop satisfactory speech input for scenarios in which two or more spoken languages are mixed, such as the mix of Cantonese, Mandarin, and English commonly heard in places like Hong Kong.
Mei-Yuh received her PhD in Computer Science from Carnegie Mellon University in 1993 and worked at Microsoft in Seattle and China for 18 years, publishing numerous conference and journal papers and, more importantly, delivering industry-standard products in speech recognition, Bing machine translation, and Cortana language understanding. She is an IEEE Fellow and is passionate about bridging the gap between academia and industry. She is currently the director of Mobvoi AI Lab in Seattle, WA.