Acoustic Processing/Modeling Group: Exploring the Time Dimension at Different Scales

In the 1997 JHU/CLSP workshop (WS97), our group revisits the acoustic processor architecture employed in state-of-the-art large-vocabulary continuous speech recognition systems. We investigate data-driven processing paradigms, exploring techniques at different context scales. At short time scales (~10 ms), we investigate the non-linear frequency mapping known as the Mel scale. At medium time scales (context ~100 ms), we investigate linear discriminant and heteroscedastic discriminant transforms. At time scales with longer context (~1000 ms), we explore feature-trajectory filtering. At even longer time scales (~500 ms to 4 s), we experiment with adaptive cepstrum bias normalization techniques.
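To make two of these scales concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the group's implementation. For the short time scale it uses one widely used analytic approximation of the Mel scale (2595 log10(1 + f/700)), which may differ from the mapping studied at the workshop. For the long time scale it uses plain sliding-window cepstral mean subtraction as a simple stand-in for adaptive cepstrum bias normalization. The function names hz_to_mel, mel_to_hz and sliding_mean_subtract, and the window parameter, are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Map frequency in Hz to Mel using a common analytic approximation.

    Assumption: this formula is a conventional choice, not taken from
    the workshop reports.
    """
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def sliding_mean_subtract(frames, window):
    """Remove a slowly varying bias from a sequence of cepstral frames.

    frames: list of equal-length lists, one cepstral vector per frame
            (for example one frame every 10 ms).
    window: window length in frames; 100 frames is about 1 s at a
            10 ms frame step (hypothetical setting).

    Each output frame is the input frame minus the mean of the frames
    inside a window centered on it, truncated at the utterance edges.
    """
    n = len(frames)
    half = window // 2
    out = []
    for t in range(n):
        lo = max(0, t - half)
        hi = min(n, t + half + 1)
        dim = len(frames[t])
        mean = [sum(frames[k][d] for k in range(lo, hi)) / (hi - lo)
                for d in range(dim)]
        out.append([frames[t][d] - mean[d] for d in range(dim)])
    return out
```

With a window longer than the utterance, sliding_mean_subtract reduces to ordinary per-utterance cepstral mean subtraction; shorter windows track a bias that drifts over time, at the cost of also removing some slowly varying speech information.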

The results of our investigation are very encouraging and are summarized in the online final reports and papers.

 

Team Members 
Senior Members
Andreas Andreou, CLSP
Hynek Hermansky, CLSP
Juergen Luettin, IDIAP
Yasuhiro Minami, NTT Human Interface Labs
Christian Wellekens, Eurecom
Graduate Students
Terri Kamm, CLSP
Daniel Fain, CalTech

Center for Language and Speech Processing