A Multinomial View of Signal Spectra for Latent-Variable Analyses – Bhiksha Raj (MERL Research Lab)

The magnitude spectrum of any signal may be viewed as a density function or, in the case of discrete frequency spectra, as a histogram with the frequency axis as its support. In this talk I will describe how this perspective allows us to perform spectral decompositions through a latent-variable model, one that extracts the underlying, or “latent”, spectral structures that additively compose the speech spectrum. I show how such decompositions can be used for varied purposes such as bandwidth expansion of narrow-band speech, component separation from mixed monaural signals, and denoising. I then explain how the basic latent-variable model may be extended to derive sparse overcomplete decompositions of speech spectra. I demonstrate through examples that such decompositions can be used not only to improve speaker separation from mixed monaural recordings, but also to extract the building blocks of other data such as images and text. Finally, I present shift- and transform-independent extensions of the model, through which it becomes possible to automatically extract repeating themes or objects within data such as audio, images or video.
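To make the histogram view concrete, here is a minimal sketch in Python with NumPy. It is an assumption about how such a model could be set up, not the formulation presented in the talk: each frame of a magnitude spectrogram is treated as a histogram over frequency bins and modeled as a mixture of latent multinomials, P_t(f) = sum_z P(f|z) P_t(z), fitted with expectation-maximization. The function name `latent_decompose`, its parameters, and the random initialization are all hypothetical choices made for illustration.

```python
import numpy as np


def latent_decompose(V, n_components, n_iter=200, seed=0):
    """Fit P_t(f) ~= sum_z P(f|z) P_t(z) to a nonnegative spectrogram V.

    V has shape (n_freq, n_frames); each column is read as a histogram
    over frequency bins. Returns W (columns are P(f|z)) and H (columns
    are P_t(z)). Illustrative sketch only, not the speaker's code.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    eps = 1e-12  # guards against division by zero

    # Random initial distributions, normalized so each column sums to 1.
    W = rng.random((n_freq, n_components))
    W /= W.sum(axis=0, keepdims=True)
    H = rng.random((n_components, n_frames))
    H /= H.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        # E-step folded into the updates: the posterior P(z|f,t) equals
        # W[f,z] * H[z,t] / (W @ H)[f,t], so R carries the shared ratio.
        R = V / (W @ H + eps)
        W_new = W * (R @ H.T)  # expected counts assigned to each (f, z)
        H_new = H * (W.T @ R)  # expected counts assigned to each (z, t)
        # M-step: renormalize expected counts into distributions.
        W = W_new / (W_new.sum(axis=0, keepdims=True) + eps)
        H = H_new / (H_new.sum(axis=0, keepdims=True) + eps)

    return W, H
```

Under these assumptions, the columns of `W` play the role of the latent spectral structures and the columns of `H` give their mixture weights in each frame. The sparse overcomplete and shift-invariant extensions mentioned in the abstract would require additional machinery that this sketch does not attempt.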

Center for Language and Speech Processing