In the 1997 JHU/CLSP workshop (WS97), our group revisits the acoustic processor architecture employed in state-of-the-art large-vocabulary continuous speech recognition systems. We investigate data-driven processing paradigms, exploring techniques at different context scales. At short time scales (~10 ms) we investigate the non-linear frequency mapping known as the Mel scale. At medium time scales (context ~100 ms) we investigate linear discriminant and heteroscedastic discriminant transforms. At time scales with longer context (~1000 ms) we explore feature-trajectory filtering. At even longer time scales (~500 ms to 4 s) we experiment with adaptive cepstrum bias normalization techniques.
The results of our investigation are very encouraging and are summarized in the online final reports and papers.
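For readers unfamiliar with the Mel scale mentioned above, the following is a minimal sketch of one widely used analytic approximation of the mapping. The formula and the function names `hz_to_mel` and `mel_to_hz` are assumptions made for illustration. They are not taken from the workshop reports, which explore data-driven alternatives to a fixed mapping of this kind.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to mels using a common textbook approximation.

    Assumption: m = 2595 * log10(1 + f / 700). Other variants exist.
    """
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel under the same assumed formula."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


if __name__ == "__main__":
    # The mapping is roughly linear at low frequencies and
    # logarithmic at high frequencies.
    for f in (100.0, 500.0, 1000.0, 4000.0, 8000.0):
        print(f"{f:7.1f} Hz -> {hz_to_mel(f):8.1f} mel")
```

Under this approximation, 1000 Hz maps to about 1000 mel, and equal steps in mels correspond to progressively wider steps in Hz as frequency increases.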
|Yasuhiro Minami||NTT Human Interface Labs|