Latent-Variable Representations for Speech Processing and Research – Miguel A. Carreira-Perpinan (Georgetown Institute for Computational and Cognitive Sciences, Georgetown University Medical Center)

March 13, 2001 all-day

Continuous latent variable models are probabilistic models that represent a distribution in a high-dimensional Euclidean space using a small number of continuous latent variables. Examples include factor analysis, the generative topographic mapping (GTM) and independent component analysis (ICA). Models of this type are well suited to dimensionality reduction and sequential data reconstruction.
In the first part of this talk I will introduce the theory of continuous latent variable models and show an example of their application to the dimensionality reduction of electropalatographic (EPG) data.
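For orientation, the general form of such a model can be written as below. The notation is an assumption made here and does not come from the abstract: t is an observed point in D dimensions, x is a latent point in L < D dimensions, and factor analysis is shown as the standard linear-Gaussian special case.

```latex
% General continuous latent variable model: marginalise over the latent x
p(\mathbf{t}) = \int p(\mathbf{t} \mid \mathbf{x}) \, p(\mathbf{x}) \, d\mathbf{x},
\qquad \mathbf{x} \in \mathbb{R}^{L}, \; \mathbf{t} \in \mathbb{R}^{D}, \; L < D

% Factor analysis: Gaussian prior, linear mapping, diagonal noise covariance
\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}), \qquad
\mathbf{t} \mid \mathbf{x} \sim \mathcal{N}(\boldsymbol{\Lambda}\mathbf{x} + \boldsymbol{\mu}, \boldsymbol{\Psi}),
\qquad \boldsymbol{\Psi} \text{ diagonal}

% Resulting marginal in data space
\mathbf{t} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi})
```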
In the second part I will present a new method for missing data reconstruction of sequential data that includes as a particular case the inversion of many-to-one mappings. The method is based on multiple pointwise reconstruction and constraint optimisation. Multiple pointwise reconstruction uses a Gaussian mixture joint density model for the data, conveniently implemented with a nonlinear continuous latent variable model (GTM). At each point in the sequence, the modes of the conditional distribution of the missing values given the present values represent local candidate reconstructions. A global sequence reconstruction is then obtained by efficiently optimising a constraint, such as continuity or smoothness, with dynamic programming. I derive two algorithms for exhaustive mode finding in Gaussian mixtures, based on gradient-quadratic search and fixed-point search respectively, together with estimates of error bars for each mode and a measure of distribution sparseness. I will demonstrate the method on synthetic data (a toy example and a robot arm inverse kinematics problem) and describe potential applications in speech, including the acoustic-to-articulatory mapping problem, audiovisual mappings for speech recognition and recognition of occluded speech.
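The global step described above, choosing one candidate per point in the sequence so that a continuity constraint is optimised, can be illustrated with a small dynamic programming sketch. This is not the speaker's implementation. The function name `reconstruct_sequence` is hypothetical, the continuity constraint is assumed to be the sum of squared distances between consecutive reconstructions, and the candidates are assumed to have already been found (for example, as the conditional modes).

```python
def reconstruct_sequence(candidates):
    """Pick one candidate per time step so that the total squared distance
    between consecutive picks is minimal (Viterbi-style dynamic programming).

    candidates: list over time steps; each entry is a non-empty list of
    candidate points (tuples of floats), e.g. conditional modes.
    Returns the chosen points, one per time step.
    """
    def sq_dist(a, b):
        # Assumed continuity penalty: squared Euclidean distance.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # cost[k]: best total cost of any path ending at candidate k of the current step.
    cost = [0.0] * len(candidates[0])
    back = []  # back[t-1][k]: best predecessor index for candidate k at step t.

    for prev, cur in zip(candidates, candidates[1:]):
        new_cost, pointers = [], []
        for c in cur:
            j = min(range(len(prev)), key=lambda i: cost[i] + sq_dist(prev[i], c))
            new_cost.append(cost[j] + sq_dist(prev[j], c))
            pointers.append(j)
        cost = new_cost
        back.append(pointers)

    # Trace back from the cheapest final candidate.
    k = min(range(len(cost)), key=cost.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)]


# Toy many-to-one inversion: y = x**2 has two candidate inverses per step.
# The first step is assumed known, which selects the positive branch.
toy = [[(2.0,)], [(1.0,), (-1.0,)], [(0.5,), (-0.5,)], [(1.0,), (-1.0,)]]
smooth_path = reconstruct_sequence(toy)
```

With K candidates per step and N steps, this search costs on the order of N K^2 distance evaluations, rather than the K^N needed to enumerate every candidate sequence.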

Miguel A. Carreira-Perpinan is a postdoctoral fellow at the Georgetown Institute for Computational and Cognitive Sciences, Georgetown University Medical Center. He has university degrees in computer science and in physics (Technical University of Madrid, Spain, 1991) and a PhD in computer science (University of Sheffield, UK, 2001). In 1993-94 he worked at the European Space Agency in Darmstadt, Germany, on real-time simulation of satellite thermal subsystems. His current research interests are statistical pattern recognition and computational neuroscience.

Center for Language and Speech Processing