Decoding time set by neuronal oscillations locked to the input rhythm: a neglected cortical dimension in models of speech perception
Oded Ghitza, Hearing Research Center & Center for BioDynamics, Boston University
October 4, 2011
Speech is an inherently rhythmic phenomenon: the acoustic signal is transmitted in syllabic "packets" and temporally structured so that most of the energy fluctuations occur in the range between 3 and 10 Hz. The premise of our approach is that this rhythmic property reflects a fundamental property internal to the brain. We suggest that current models of speech perception, which are driven by acoustic features alone, are incomplete, and that the role of decoding time during memory access must be incorporated to account for observed recognition phenomena. It is postulated that decoding time is governed by a cascade of neuronal oscillators, which guide template-matching operations at a hierarchy of temporal scales. Nested neuronal oscillations in the theta, beta and gamma frequency bands are argued to be crucial for speech intelligibility. Intelligibility is high so long as these neuronal oscillations remain phase-locked to the rhythm of the auditory input. A model (Tempo) is presented that seems capable of emulating recent psychophysical data on the intelligibility of spoken sentences as a function of syllabic rate (Ghitza & Greenberg, 2009). The data show that the intelligibility of speech time-compressed by a factor of 3 (i.e., a high syllabic rate) is poor (above 50% word error rate), but is substantially restored when silence gaps are inserted between successive 40-ms-long compressed-signal intervals -- a counterintuitive finding, difficult to explain with classical models of speech perception, but one that emerges naturally from the Tempo architecture. In my talk I will present the architecture of Tempo and discuss the new dimensions of the model that seem necessary to account for the Ghitza & Greenberg data.

Reading material:
Ghitza, O. and Greenberg, S. (2009). "On the possible role of brain rhythms in speech perception: Intelligibility of time-compressed speech with periodic and aperiodic insertions of silence." Phonetica 66:113--126. doi:10.1159/000208934
Ghitza, O. (2011). "Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm." Front. Psychology 2:130. doi:10.3389/fpsyg.2011.00130
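The stimulus manipulation behind the Ghitza & Greenberg (2009) finding can be sketched in a few lines. The following is a minimal illustration, not the authors' actual stimulus-generation code: it assumes a waveform that has already been time-compressed (e.g., 3x), chops it into fixed-length intervals (40 ms in the study), and inserts a silence gap after each interval; the gap duration is treated here as a free parameter.

```python
import numpy as np

def insert_periodic_silence(compressed, fs, interval_ms=40.0, gap_ms=80.0):
    """Repackage a time-compressed waveform by inserting periodic silence.

    compressed  : 1-D array, speech already time-compressed (e.g., 3x)
    fs          : sampling rate in Hz
    interval_ms : length of each compressed-signal interval (40 ms in the study)
    gap_ms      : length of each inserted silence gap (illustrative default)
    """
    n_int = int(round(fs * interval_ms / 1000.0))
    n_gap = int(round(fs * gap_ms / 1000.0))
    gap = np.zeros(n_gap, dtype=compressed.dtype)
    pieces = []
    for start in range(0, len(compressed), n_int):
        pieces.append(compressed[start:start + n_int])  # compressed-speech interval
        pieces.append(gap)                              # inserted silence
    return np.concatenate(pieces)
```

Note that the inserted silence slows the *packaging rate* of the signal back toward the natural syllabic range while leaving each speech interval itself 3x compressed, which is why restored intelligibility is counterintuitive under purely acoustic-feature-driven models.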
ODED GHITZA received the B.Sc., M.Sc. and Ph.D. degrees in Electrical Engineering from Tel-Aviv University, Israel, in 1975, 1977 and 1983, respectively. From 1968 to 1984 he was with the Signal Corps Research Laboratory of the Israeli Defense Forces. During 1984-1985 he was a Bantrell post-doctoral fellow at MIT, Cambridge, Massachusetts, and a consultant with the Speech Systems Technology Group at Lincoln Laboratory, Lexington, Massachusetts. From 1985 to early 2003 he was with the Acoustics and Speech Research Department, Bell Laboratories, Murray Hill, New Jersey, where his research was aimed at developing models of hearing and at creating perception-based signal analysis methods for speech recognition, coding and evaluation. From early 2003 to early 2011 he was with Sensimetrics Corp., Malden, Massachusetts, where he continued to model basic knowledge of auditory physiology and perception for the purpose of advancing speech, audio and hearing-aid technology. From 2005 to 2008 he was with the Sensory Communication Group at MIT. Since mid-2006 he has been with the Hearing Research Center and the Center for BioDynamics at Boston University, where he studies the role of brain rhythms in speech perception.