Integrating Evidence Over Time: A Look at Conditional Models for Speech and Audio Processing – Eric Fosler-Lussier (Ohio State University)
Many acoustic events, particularly those associated with speech, can be thought of as events in a rich descriptive subspace whose dimensions form a kind of decomposition of the original event space. In phonetic terms, we can consider how phonological features integrate to determine phonetic identity; for auditory scene analysis, we can look at how features such as harmonic energy and cross-channel correlation come together to determine whether a particular frequency corresponds to target speech or background noise. Some success has been achieved by treating these problems as probabilistic detection of acoustic (sub-)events. However, event detectors are typically local in nature and need to be smoothed by looking at neighboring events in time.

In this talk, I describe current work in the Speech and Language Technologies Lab at OSU, where we are investigating Conditional Random Field (CRF) models for both automatic speech recognition and computational auditory scene analysis problems. The talk will explore some of the successes and limitations of this log-linear method, which integrates local evidence over time sequences.

Joint work with Jeremy Morris, Ilana Heintz, Rohit Prabhavalkar, and Zhaozhang Jin.
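To make the abstract's central idea concrete — a log-linear model that smooths noisy local detector scores by integrating evidence across time — here is a minimal, hypothetical sketch of Viterbi decoding in a linear-chain CRF. The scores and label set are invented for illustration and are not taken from the OSU lab's actual systems.

```python
import numpy as np

# In a linear-chain CRF, a label sequence y for observations x is scored
# (in the log domain) as
#   score(y, x) = sum_t [ emit[t, y_t] + trans[y_{t-1}, y_t] ],
# and Viterbi decoding finds the argmax sequence. The transition term is
# what "smooths" purely local detector outputs over time.

def viterbi(emit, trans):
    """emit: (T, K) local per-frame label scores; trans: (K, K) transition scores."""
    T, K = emit.shape
    delta = emit[0].copy()                # best score of a path ending in each label
    back = np.zeros((T, K), dtype=int)    # backpointers
    for t in range(1, T):
        cand = delta[:, None] + trans + emit[t][None, :]   # (prev_label, curr_label)
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0)
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Hypothetical frame-level detector scores for two labels
# (e.g. 0 = target speech, 1 = background noise); frame 1 is ambiguous.
emit = np.array([[2.0, 0.0],
                 [0.1, 0.2],
                 [2.0, 0.0],
                 [0.0, 2.0],
                 [0.0, 2.0]])
trans = np.array([[1.0, -1.0],
                  [-1.0, 1.0]])  # favor staying in the same label

print(viterbi(emit, trans))  # → [0, 0, 0, 1, 1]
```

Note how the ambiguous frame at t=1, which locally favors label 1, is resolved to label 0 once the transition scores pull in evidence from neighboring frames — the same intuition the talk applies to phonological feature integration and auditory scene analysis.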
Eric Fosler-Lussier is currently an Assistant Professor of Computer Science and Engineering, with an adjunct appointment in Linguistics, at the Ohio State University. He received his Ph.D. in 1999 from the University of California, Berkeley, performing his dissertation research at the International Computer Science Institute under the tutelage of Prof. Nelson Morgan. He has also been a Member of Technical Staff at Bell Labs, Lucent Technologies, and a Visiting Researcher at Columbia University. He is generally interested in integrating linguistic insights as priors in statistical learning systems.