Auto-Synchronous Analysis of Speech – Pascal Clark (Human Language Technology Center of Excellence)
It is well known that information is embedded in the speech signal as smooth variations over time and frequency. Since the 1990s, feature-extraction front-ends have routinely exploited this fact in the form of subband-modulation and spectro-temporal filtering. A key aspect of such methods is averaging over short time-scale structure to estimate smooth, long-term power envelopes. In this talk, I will argue that the short-term structure itself is useful when considered jointly with long-term envelopes. Toward this end, I propose replacing the time-worn concept of pitch, which is based on dubious assumptions of periodicity, with self-similar recurrence, which is statistically flexible and consistent with long-term coherences. Viewing speech in terms of recurrences suggests an intrinsic, stochastic timing reference for what I refer to as “auto-synchronization.” I will demonstrate how synchronous estimation is complementary to existing power envelopes and asymptotically immune to interference from slowly varying noise. My objective in this talk is to lay the groundwork for further experiments and practical development of robust speech features.
Pascal Clark is a post-doctoral researcher at the Johns Hopkins Human Language Technology Center of Excellence. His current work focuses on signal processing for speech applications, including detection of speech in noise and stochastic modeling for invariances in speech. Prior to joining the HLTCOE, he received his Ph.D. from the University of Washington, where he was also an author of the Modulation Toolbox.