The manifold advantages of articulatory representations, including microphone and speaker normalization:
John Hogden
- 08/05/2002
- Location: Shaffer Hall, Room 101
- Time: 2:30 pm - 3:30 pm
- Abstract:
A new acoustic model, Maximum Likelihood Continuity Mapping (MALCOM), will be presented. MALCOM generates a stochastic model of speech assuming 1) that speech sounds are periodically emitted as a point moves smoothly through a low-dimensional space called a continuity map (CM), and 2) that the sound emitted at time t is a probabilistic function of the position of the point at time t. The assumptions underlying MALCOM are intended to mimic speech production in that 1) speech sounds are produced as the articulators move slowly through a low-dimensional articulator space, and 2) the speech sound produced at time t is a function of the articulator positions at time t. MALCOM's smoothness constraint implies that MALCOM uses much more temporal context than typical Markov models.
The parameters required by MALCOM constitute an estimate of the mapping between articulation and acoustics. Surprisingly, no articulator measurements are required for training. To show that MALCOM can invert very general nonlinear functions, including microphone nonlinearities and speaker differences, and thereby recover articulator positions, we will discuss a mathematical proof, simulation results, and experimental evidence from simultaneously collected acoustic and articulator measurements. We conclude that the articulator positions recovered by MALCOM should provide a better basis for speech recognition than mel-cepstra (or other commonly used acoustic parameters): they are relatively invariant to microphone effects and speaker differences, and they can convey the same information using fewer dimensions, which suggests that they will be less affected by acoustic noise. MALCOM should also be applicable to characterizing speaker differences, and so may be useful for speaker recognition.
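The two generative assumptions stated in the abstract can be illustrated with a toy simulation. The sketch below is not the MALCOM training procedure and is not taken from the talk. It only shows a point drifting smoothly through a 2-D map and emitting discrete sound labels whose probabilities depend on the current position. The function names (`smooth_path`, `emission_probs`, `emit_sounds`), the exponentially smoothed random walk, the Gaussian-shaped emission weights, and all constants are assumptions chosen for illustration.

```python
import math
import random


def smooth_path(n_steps, dim=2, alpha=0.9, seed=0):
    """Assumption 1: a point moves smoothly through a low-dimensional map.

    Modeled here (hypothetically) as an exponentially smoothed random walk.
    """
    rng = random.Random(seed)
    pos = [0.0] * dim
    path = []
    for _ in range(n_steps):
        # Blend the previous position with fresh noise so steps stay small.
        pos = [alpha * p + (1.0 - alpha) * rng.gauss(0.0, 1.0) for p in pos]
        path.append(tuple(pos))
    return path


def emission_probs(pos, centers, width=0.5):
    """Assumption 2: P(sound k | position) depends only on the position.

    Modeled here (hypothetically) as normalized isotropic Gaussian bumps.
    """
    weights = []
    for center in centers:
        sq_dist = sum((p - c) ** 2 for p, c in zip(pos, center))
        weights.append(math.exp(-sq_dist / (2.0 * width ** 2)))
    total = sum(weights)
    return [w / total for w in weights]


def emit_sounds(path, centers, seed=1):
    """Draw one discrete sound label per time step along the path."""
    rng = random.Random(seed)
    labels = list(range(len(centers)))
    return [
        rng.choices(labels, weights=emission_probs(pos, centers))[0]
        for pos in path
    ]


# Four hypothetical sound classes placed at the corners of a small square.
centers = [(-0.3, -0.3), (-0.3, 0.3), (0.3, -0.3), (0.3, 0.3)]
path = smooth_path(200)
sounds = emit_sounds(path, centers)
```

Because the path changes slowly, the emission probabilities at neighboring time steps are similar, which is the sense in which a smoothness constraint ties each sound to a long stretch of temporal context. Recovering the path from the sounds alone, which is what the abstract describes, is the inverse problem and is not attempted here.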