Using speech models for separation in monaural and binaural contexts – Dan Ellis (Columbia University)
Abstract
When the number of sources exceeds the number of microphones, acoustic source separation is an underconstrained problem that must rely on additional constraints for a solution. In a single-channel environment, the expected behavior of the source (i.e., an acoustic model) is the only feasible basis for separation. I will describe our recent work in monaural speech separation based on fitting parametric “eigenvoice” speaker-adapted models to both voices in a mixture. In a binaural, reverberant environment, the interaural characteristics of an acoustic source exhibit structure that can be used to separate sources, even without prior knowledge of location or room characteristics. I will present MESSL, our EM-based system for source separation and localization. MESSL’s probabilistic foundation facilitates the incorporation of more specific source models; I will also describe MESSL-EV, which incorporates the eigenvoice speech models for improved binaural separation in reverberant environments. Joint work with Ron Weiss and Mike Mandel.
Biography
Daniel P. W. Ellis received the Ph.D. degree in electrical engineering from the Massachusetts Institute of Technology, Cambridge, where he was a Research Assistant in the Machine Listening Group of the Media Lab. He spent several years as a Research Scientist at the International Computer Science Institute, Berkeley, CA. Currently, he is an Associate Professor with the Electrical Engineering Department, Columbia University, New York. His Laboratory for Recognition and Organization of Speech and Audio (LabROSA) is concerned with all aspects of extracting high-level information from audio, including speech recognition, music description, and environmental sound processing. He also runs the AUDITORY email list of 1700 worldwide researchers in perception and cognition of sound.