Abstract:
The most important communication signal is human speech. It is useful to
think of speech communication in terms of Claude Shannon's information
theory channel model. Viewed this way, it soon becomes clear that the
most complex part of speech communication lies in the auditory system (the receiver).
In my opinion, relatively little is known about how the human auditory
system decodes speech. My research has studied this problem using confusions
among simple, isolated, natural consonant-vowel (CV) syllables, measured as a
function of the speech-to-noise ratio (SNR), with several types of masking
noise. In one type of experiment, we selectively remove islands of speech and
then correlate the resulting modified speech with the subjects' scores. This
method has allowed us to isolate the information-bearing portions of
the speech. Our most important conclusions to date are:
1) The across-frequency onset transient portion of the signal is typically
the most important.
2) The spectral regions of these transients are used to code different
consonants.
3) The frequency regions for a given consonant are correlated with the
following vowel.
4) Compact spectro-temporal amplitude-modulation components (e.g.,
a 10 Hz modulation) do not seem to play a significant role.
5) There is some evidence that frequency modulations may play a role,
but this remains unproven.
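The CV experiments described above are run at controlled SNRs. As a minimal sketch, assuming speech and masking noise are equal-length sample sequences and that SNR is defined from whole-signal average power (the talk does not specify the exact definition used), masking noise can be scaled to a target SNR as follows. The helper names `mean_power`, `mix_at_snr`, and `snr_db` are hypothetical and for illustration only; this is not the author's experimental code.

```python
import math


def mean_power(x):
    """Average power (mean squared amplitude) of a sample sequence."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals target_snr_db,
    then add it to `speech`.

    Hypothetical helper: assumes equal-length sequences, nonzero noise power,
    and SNR computed from whole-signal average power.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    # The scaled noise must have power P_speech / 10^(SNR/10);
    # amplitude gain is the square root of the required power ratio.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


def snr_db(speech, noise):
    """SNR in dB from average powers."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


if __name__ == "__main__":
    fs = 8000  # assumed sample rate for this toy example
    speech = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
    # Deterministic stand-in for a masking noise (not a real noise model)
    noise = [((t * 7919) % 200 - 100) / 100 for t in range(fs)]
    mixed, scaled = mix_at_snr(speech, noise, -6.0)
    print(round(snr_db(speech, scaled), 6))
```

Sweeping `target_snr_db` over a range of values would produce the kind of SNR-dependent stimulus set against which confusion scores are tabulated; the choice of power-based SNR here is one common convention among several.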
The above results are complemented by similar studies on hearing-impaired
ears. Given cochlear damage, speech scores are greatly reduced, even when
audibility is accounted for. The exact reasons for this SNR loss presently
remain unclear, but I speculate that its source must be cochlear,
and related to nonlinear outer-hair-cell temporal processing.
Specifically, "edge enhancement" of the speech signal and forward
masking could easily be modified in such ears, leading to SNR loss.
Whatever the reason, SNR loss is the key problem that needs to be fully researched.
Live demos will be played, including "edge-enhanced" speech signals that are
more robust to noise.
Biography:
Dr. Jont Allen received a BS in EE from the University of Illinois
in 1966 and a PhD from the University of Pennsylvania in 1970. He then
joined Bell Laboratories in 1970, where he was in the Acoustics Research
Department as a Distinguished Member of Technical Staff. From 1996 to 2002
he worked at AT&T Labs as a Technology Leader. In August 2003 he joined
the ECE faculty at the University of Illinois at Urbana-Champaign (UIUC).
Dr. Allen is interested in
cochlear modeling, noninvasive diagnostic testing of cochlear function
(such as DPOAE and power reflectance measurements in the ear canal),
auditory psychophysics, speech processing for hearing aid applications
(noise reduction and multiband compression), speech and music coding
(bit-rate reduction) and speech perception (models of loudness and
masking). He is presently working on the problem of human speech
recognition, with the goal of improving the robustness of automatic speech
recognition in the presence of noise and filtering.