WS97 Acoustic Processing Group


In the 1997 JHU/CLSP workshop (WS97), our group revisits the acoustic processor architecture employed in state-of-the-art large-vocabulary continuous speech recognition systems. We investigate data-driven processing paradigms, exploring techniques at different context scales. At short time scales (~10 ms), we investigate the non-linear frequency mapping known as the Mel scale. At medium time scales (context ~100 ms), we investigate linear discriminant and heteroscedastic discriminant transforms. At time scales with longer context (~1000 ms), we explore feature-trajectory filtering. At even longer time scales (~500 ms to 4 s), we experiment with adaptive cepstrum bias normalization techniques.

The results of our investigation are very encouraging and are summarized in the online final reports and papers.
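
For readers unfamiliar with two of the techniques named above, the following is a minimal sketch, not the workshop's implementation. It assumes a common textbook approximation of the Mel scale (the group's own work learns the mapping from data), and it uses plain per-utterance cepstral mean subtraction as a simple stand-in for the adaptive cepstrum bias normalization studied at the workshop. The function names are hypothetical and chosen only for illustration.

```python
import math


def hz_to_mel(f_hz):
    """Map frequency in Hz to Mel using a common textbook approximation.

    Assumption: this closed-form curve is only illustrative; the WS97 work
    estimates the frequency mapping from data rather than fixing it.
    """
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel under the same assumed approximation."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def subtract_cepstral_mean(frames):
    """Remove the per-coefficient mean taken over all frames of an utterance.

    frames: non-empty list of equal-length lists of cepstral coefficients.
    Assumption: a fixed, whole-utterance mean, not an adaptive estimate.
    """
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[k] for frame in frames) / n_frames for k in range(dim)]
    return [[frame[k] - mean[k] for k in range(dim)] for frame in frames]
```

Under the assumed formula, the mapping is close to linear at low frequencies and compresses higher frequencies logarithmically. Subtracting the utterance-level mean removes any constant offset in the cepstral domain, which is the kind of bias that normalization over long time scales targets.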


Group Members

Reports and Papers

  1. Final report and summary of results on SWITCHBOARD, A.G. Andreou (pdf).
  2. Learning the Mel-scale and optimal VTL mapping, T. Kamm, H. Hermansky and A.G. Andreou (pdf).
  3. Cepstrum bias adaptation for the SWITCHBOARD database in unsupervised mode, Y. Minami (pdf).
  4. Processing of modulation spectrum of speech for ASR of conversational speech, H. Hermansky (available from author).
  5. WS97 activity report, C. Wellekens (pdf).
  6. Enhanced ASR scores by acoustic feature filtering, C. Wellekens and H. Hermansky, draft paper (postscript).
  7. On generalization of linear discriminant analysis, K. Nagendra and A.G. Andreou, JHU/ECE Technical Report 96-07, April 1996 (pdf).
  8. Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition, K. Nagendra and A.G. Andreou, Speech Communication, Vol. 26, pp. 283-297, December 1998 (pdf).

Presentations


Please send feedback to Andreas G. Andreou. This page was last modified on 11/28/00 07:45 PM.