Title: Distant-talk speech processing toward natural conversation understanding


Speech recognition and understanding technologies have developed rapidly over the last decade and have shown significant improvements thanks to deep learning. With this success, research is moving toward more challenging and useful scenarios, including distant-talk speech processing, which enables a machine to understand natural conversation. In this scenario, speech captured by microphones is significantly distorted by attenuation, noise, and reverberation, which drastically degrade speech recognition performance. This talk covers our recent activities to tackle these issues. These include organizing the CHiME speech separation and recognition challenge series and building successful challenge systems, which placed 2nd among 26 submissions in CHiME-3 and 3rd among 16 submissions in CHiME-4. The systems combine state-of-the-art techniques in microphone array processing, source separation, and speech recognition, and ultimately achieve performance comparable to that of close-talk clean speech recognition. The talk also describes multichannel end-to-end speech processing, which unifies these complicated systems in a single deep network architecture to jointly optimize the whole distant-talk speech processing pipeline.

Shinji Watanabe received the Dr. Eng. degree from Waseda University, Japan, in 2006. From 2001 to 2011, he worked at NTT Communication Science Laboratories, Japan. Since 2012, he has been working at Mitsubishi Electric Research Laboratories, USA. His research interests include machine learning, Bayesian inference, speech recognition, and spoken language processing. He is a member of the Acoustical Society of Japan (ASJ) and the Institute of Electronics, Information and Communication Engineers (IEICE), and a senior member of IEEE. He has served as an Associate Editor of the IEEE Transactions on Audio, Speech, and Language Processing, and currently serves on several committees, including the IEEE Signal Processing Society Speech and Language Technical Committee (IEEE SLTC).

Recently, he has been actively working on noise-robust speech recognition for distant-talk scenarios. He participated in the CHiME-1, CHiME-2 Track 2, REVERB, and CHiME-3 challenges, where his teams placed 1st, 1st, 2nd, and 2nd, respectively. He also organized the CHiME speech separation and recognition challenge series, and, as a senior team member, led the “Far-Field Speech Enhancement and Recognition in Mismatched Settings” research group at the 2015 Jelinek Summer Workshop on Speech and Language Technology. He is a tutorial speaker on these topics at Interspeech 2016 and APSIPA ASC 2016.

