Marc Delcroix (NTT): Target Speaker Extraction with SpeakerBeam
Automatic speech recognition has progressed greatly in recent years and is increasingly used in challenging conditions. However, the performance of current systems degrades when several people speak at the same time in a conversation or when a TV is on in the background. In this talk, we will present our recent investigation of target speaker extraction using SpeakerBeam. SpeakerBeam is a neural network that extracts the speech of a target speaker from a mixture of speech signals. It uses a short adaptation utterance containing only the voice of the target speaker to compute the characteristics of that speaker's voice, and then uses these characteristics to adapt the neural network so that it focuses on extracting the voice of that target speaker.
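
The two-stage idea described above (summarize the adaptation utterance, then use that summary to steer the extraction network) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the talk: the function names, the layer sizes, mean pooling over time, element-wise scaling of hidden units as the adaptation mechanism, and a sigmoid mask applied to magnitude-spectrum frames. The weights are random and untrained, so the code shows only the data flow, not actual extraction.

```python
import math
import random


def linear(weights, x):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def speaker_embedding(adapt_frames, w_aux):
    """Hypothetical auxiliary network: project each frame of the
    adaptation utterance, then average over time to get one vector."""
    projected = [relu(linear(w_aux, frame)) for frame in adapt_frames]
    n_frames = len(projected)
    return [sum(p[d] for p in projected) / n_frames
            for d in range(len(projected[0]))]


def extract(mixture_frames, embedding, w_in, w_out):
    """Hypothetical main network: hidden units are scaled element-wise by
    the speaker embedding (assumed adaptation mechanism), then a mask in
    (0, 1) is applied to each mixture frame."""
    estimate = []
    for frame in mixture_frames:
        hidden = relu(linear(w_in, frame))
        adapted = [h * e for h, e in zip(hidden, embedding)]
        mask = [sigmoid(m) for m in linear(w_out, adapted)]
        estimate.append([m * x for m, x in zip(mask, frame)])
    return estimate


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes and random (untrained) weights, for illustration only.
rng = random.Random(0)
n_freq, n_hidden = 4, 6
w_aux = random_matrix(n_hidden, n_freq, rng)
w_in = random_matrix(n_hidden, n_freq, rng)
w_out = random_matrix(n_freq, n_hidden, rng)

# Stand-ins for magnitude-spectrum frames (non-negative values).
adapt = [[rng.random() for _ in range(n_freq)] for _ in range(10)]
mixture = [[rng.random() for _ in range(n_freq)] for _ in range(5)]

emb = speaker_embedding(adapt, w_aux)
estimate = extract(mixture, emb, w_in, w_out)
```

In this sketch the same main network serves any speaker; only the embedding computed from the adaptation utterance changes. In a real system both networks would be trained jointly on mixtures paired with clean target speech.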