Bill Byrne
November 24th
4:30PM
"Hierarchical Phrase-based Translation with Weighted Finite State Transducers"
Using speech models for separation in monaural and binaural contexts
Dan Ellis - October 13th, 2009
Columbia University
Abstract
When the number of sources exceeds the number of microphones, acoustic
source separation is an underconstrained problem that must rely on
additional constraints for a solution. In a single-channel environment,
the expected behavior of the source (i.e., an acoustic model) is
the only feasible basis for separation. I will describe our recent
work in monaural speech separation based on fitting parametric
"eigenvoice" speaker-adapted models to both voices in a mixture.
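
The eigenvoice idea can be illustrated with a deliberately simplified sketch. A speaker-adapted model is the mean voice plus a weighted sum of eigenvoice basis vectors. Assuming orthonormal eigenvoices and an observation from a single clean speaker, the least-squares weights reduce to a projection. This is not the mixture-fitting procedure described in the talk, which must estimate models for both voices at once. The function names and toy vectors are hypothetical.

```python
from typing import List, Sequence


def fit_eigenvoice_weights(
    observed: Sequence[float],
    mean_voice: Sequence[float],
    eigenvoices: Sequence[Sequence[float]],
) -> List[float]:
    """Least-squares eigenvoice weights, ASSUMING orthonormal eigenvoices.

    With an orthonormal basis, least squares reduces to projecting the
    residual (observed - mean_voice) onto each eigenvoice.
    """
    residual = [o - m for o, m in zip(observed, mean_voice)]
    return [sum(u_i * r_i for u_i, r_i in zip(u, residual)) for u in eigenvoices]


def adapted_model(
    mean_voice: Sequence[float],
    eigenvoices: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Speaker-adapted parameters: mean voice plus weighted eigenvoices."""
    model = list(mean_voice)
    for w, u in zip(weights, eigenvoices):
        for i, u_i in enumerate(u):
            model[i] += w * u_i
    return model


if __name__ == "__main__":
    # Toy 4-dimensional "voice" vectors with two orthonormal eigenvoices.
    mean_voice = [1.0, 1.0, 1.0, 1.0]
    eigenvoices = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
    observed = [3.0, 1.0, 3.0, 1.0]
    w = fit_eigenvoice_weights(observed, mean_voice, eigenvoices)
    print(w)  # [2.0, 2.0]
    print(adapted_model(mean_voice, eigenvoices, w))  # [3.0, 1.0, 3.0, 1.0]
```

Because the model is constrained to the span of a few eigenvoices, only a handful of weights must be estimated per speaker. That low-dimensional parameterization is what makes adapting models to unknown voices tractable.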
In a binaural, reverberant environment, the interaural characteristics
of an acoustic source exhibit structure that can be exploited for
separation, even without prior knowledge of source location or room
characteristics. I
will present MESSL, our EM-based system for source separation and
localization. MESSL's probabilistic foundation facilitates the
incorporation of more specific source models; I will also describe
MESSL-EV, which incorporates the eigenvoice speech models for improved
binaural separation in reverberant environments.
Joint work with Ron Weiss and Mike Mandel.
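
To make the idea of EM-based binaural separation concrete, here is a heavily simplified sketch that is not MESSL itself. It assumes each time-frequency cell has already been reduced to a single interaural delay estimate (the hypothetical input `delays`). It ignores phase wrapping, interaural level differences, and any explicit reverberation model, and fits a two-component Gaussian mixture by EM. The posterior for source 0 in each cell then serves as a soft separation mask. All names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _gauss(x: float, mean: float, var: float) -> float:
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_source_mask(
    delays: Sequence[float],
    init_means: Tuple[float, float],
    iters: int = 50,
    var_floor: float = 1e-6,
) -> Tuple[List[float], List[float]]:
    """Fit a 2-component Gaussian mixture to per-cell delay estimates with EM.

    Returns the fitted delay means and, for each cell, the posterior
    probability that it belongs to source 0 (a soft time-frequency mask).
    """
    means = list(init_means)
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    mask: List[float] = []
    for _ in range(iters):
        # E-step: responsibility of each source for each cell.
        resp = []
        for d in delays:
            lik = [weights[k] * _gauss(d, means[k], variances[k]) for k in range(2)]
            total = lik[0] + lik[1]
            resp.append([l / total for l in lik] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate each source's mean delay, spread and prior weight.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue
            means[k] = sum(r[k] * d for r, d in zip(resp, delays)) / nk
            variances[k] = max(
                sum(r[k] * (d - means[k]) ** 2 for r, d in zip(resp, delays)) / nk,
                var_floor,
            )
            weights[k] = nk / len(delays)
        mask = [r[0] for r in resp]
    return means, mask


if __name__ == "__main__":
    # Hypothetical delays (in samples) for 8 cells: two talkers near -3 and +3.
    delays = [-3.1, -2.9, -3.0, -3.2, 2.8, 3.0, 3.1, 2.9]
    means, mask = em_two_source_mask(delays, init_means=(-1.0, 1.0))
    print([round(m, 2) for m in means])  # means should end up near -3.05 and 2.95
    print([round(p, 2) for p in mask])   # ~1 for the first four cells, ~0 after
```

Like MESSL, the sketch needs no prior knowledge of the source locations: the initial means are rough guesses, and EM moves them to the delays that best explain the observed cells.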
Biography
Daniel P. W. Ellis received the Ph.D. degree in electrical engineering
from the Massachusetts Institute of Technology, Cambridge, where he
was a Research Assistant in the Machine Listening Group of the Media
Lab. He spent several years as a Research Scientist at the
International Computer Science Institute, Berkeley, CA. Currently, he
is an Associate Professor with the Electrical Engineering Department,
Columbia University, New York. His Laboratory for Recognition and
Organization of Speech and Audio (LabROSA) is concerned with all
aspects of extracting high-level information from audio, including
speech recognition, music description, and environmental sound
processing. He also runs the AUDITORY email list of 1,700 researchers
worldwide who work on the perception and cognition of sound.


