John Hershey (MERL) “Speech Separation by Deep Clustering: Towards Intelligent Audio Analysis and Understanding” @ Hackerman Hall B17
Mar 4 @ 12:00 pm – 1:15 pm


We address the problem of acoustic source separation in a deep learning framework we call “deep clustering.” Deep learning has recently produced major improvements in speech enhancement tasks in which the speech and interference belong to distinct classes of signal. In this case, a deep network classifier labels time-frequency regions of the signal according to the class of the dominant source, and separation is achieved by reconstructing the corresponding regions. However, such classification-based approaches completely fail to learn in “cocktail party” scenarios, where the interference is also speech. We present an alternative method that generates relation-preserving embedding vectors, one for each time-frequency region of the spectrogram, such that their distances represent the graph structure of the desired solution. For speech separation, the graph defines the segmentation of the spectrogram into regions corresponding to each source, and its representation is decoded by clustering the embeddings. The embedded representation is thus flexible with respect to the number of clusters and is invariant to their permutations. This method can be compared to spectral clustering, which uses simple kernel features to represent high-rank affinities and decodes them using expensive spectral methods. Deep clustering instead uses powerful learned features to represent low-rank affinities that can be decoded using simple clustering methods. We present experiments showing speaker-independent separation of single-channel speech mixtures that yields an astounding 10 dB average improvement in SNR for both speech signals after training on 30 hours of speech data. Even more surprisingly, the same model trained only on two-speaker mixtures can separate three-speaker mixtures, indicating an unusual degree of generalization. An audio demonstration of the results will be given and future directions will be discussed.
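
The low-rank affinity idea and the clustering-based decoding described above can be illustrated with a small, dependency-free Python sketch. Everything in it is an assumption made for illustration: the abstract does not state a training objective, so the Frobenius-norm affinity loss used here is one plausible choice; the embeddings are hand-written toy values standing in for network outputs; and the names `gram`, `affinity_loss`, and `kmeans` are hypothetical. The sketch scores embeddings V against ideal one-hot source assignments Y without forming any N x N affinity matrix, then recovers a source label per time-frequency bin with plain k-means.

```python
def gram(a, b):
    """Return a^T b for row-major matrices a (N x P) and b (N x Q)."""
    p, q = len(a[0]), len(b[0])
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a, b)) for j in range(q)]
            for i in range(p)]


def frob_sq(m):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(x * x for row in m for x in row)


def affinity_loss(v, y):
    """||V V^T - Y Y^T||_F^2, expanded so only small Gram matrices are built.

    Assumed objective (not stated in the abstract):
    ||V^T V||_F^2 - 2 ||V^T Y||_F^2 + ||Y^T Y||_F^2
    """
    return (frob_sq(gram(v, v))
            - 2.0 * frob_sq(gram(v, y))
            + frob_sq(gram(y, y)))


def kmeans(points, k, iters=10):
    """Plain k-means with deterministic farthest-point initialization."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Start from the first point, then repeatedly add the farthest point.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(d2(p, c) for c in centers))
        centers.append(list(far))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: d2(p, centers[c]))
                  for p in points]
        # Move every non-empty center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data: 6 time-frequency bins, 2 sources, 2-dimensional embeddings.
Y = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]  # ideal assignments
V = [[0.9, 0.1], [1.0, 0.0], [0.8, 0.2],
     [0.1, 0.9], [0.0, 1.0], [0.2, 0.8]]              # stand-in embeddings

labels = kmeans(V, k=2)
# One binary mask per source, marking the bins assigned to that source.
masks = [[1 if lab == c else 0 for lab in labels] for c in range(2)]

print("affinity loss:", affinity_loss(V, Y))
print("labels:", labels)
print("masks:", masks)
```

Because the cluster indices returned by k-means carry no fixed meaning, swapping the two sources only swaps the two masks, which is the permutation invariance the abstract refers to. Changing `k` at decoding time requires no change to the embeddings themselves.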


Prior to joining MERL in 2010, John spent 5 years at IBM’s T.J. Watson Research Center in New York, where he led a team in noise-robust speech recognition. He also spent a year as a visiting researcher in the speech group at Microsoft Research, after obtaining his Ph.D. from UCSD in the area of multi-modal machine perception. He is currently working on machine learning for signal separation, speech recognition, language processing, and adaptive user interfaces.

Amanda Stent (Bloomberg) “Text Analytics in Finance: A Case Study and Some Considerations” @ Hackerman Hall B17
Oct 17 @ 12:00 pm – 1:15 pm


The finance industry increasingly seeks insight from unstructured data, including through text analytics. In this talk, I will give a brief survey of NLP as used in text analytics, then talk in detail about the NLP platform we are building at Bloomberg, including example applications. I will close with some ways in which NLP for financial text analytics is similar to and different from NLP as commonly done in research, and some ideas for productive NLP work.

Amanda Stent is an NLP architect at Bloomberg LP. Previously, she was a director of research and principal research scientist at Yahoo Labs, a principal member of technical staff at AT&T Labs – Research, and an associate professor in the Computer Science Department at Stony Brook University. Her research interests center on natural language processing and its applications, in particular topics related to text analytics, discourse and dialog. She holds a PhD in computer science from the University of Rochester. She is co-editor of the book Natural Language Generation in Interactive Systems (Cambridge University Press), has authored over 90 papers on natural language processing and is co-inventor on over twenty patents and patent applications. She is president emeritus of the ACL/ISCA Special Interest Group on Discourse and Dialog, treasurer of the ACL Special Interest Group on Natural Language Generation and one of the rotating editors of the journal Dialogue & Discourse. She is also a board member of CRA-W, where she co-edits the newsletter.

Center for Language and Speech Processing