Jaejin Cho “Language Model Integration Based on Memory Control for Sequence-to-sequence Speech Recognition”

March 1, 2019 @ 12:00 pm – 1:15 pm
Hackerman Hall B17
3400 N Charles St
Baltimore, MD 21218
In this presentation, I will describe a new scheme for training a sequence-to-sequence (seq2seq) ASR model that integrates a pre-trained language model (LM). The proposed fusion method uses information from the pre-trained LM to update the memory cell/hidden state of the LSTM in the seq2seq decoder, so the memory retained by the main seq2seq model is adjusted by the external LM. Experimental results show the effectiveness of the proposed methods in a monolingual ASR setup on the LibriSpeech corpus and in a transfer learning setup from a multilingual ASR (MLASR) base model to a low-resource language. On LibriSpeech, our best model improved WER by 3.7% and 2.4% relative on test-clean and test-other, respectively, compared with the shallow fusion baseline. In transfer learning from an MLASR base model to the IARPA Babel Swahili model, the best scheme improved the transferred model on the eval set by 9.9% and 9.8% relative in CER and WER, respectively, compared with the two-stage transfer baseline.
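
For readers unfamiliar with the shallow fusion baseline mentioned in the abstract, the following is a minimal sketch of the general idea: at each decoding step, the seq2seq log-probabilities are combined with weighted LM log-probabilities. It illustrates only the baseline, not the memory-control method of the talk. The function name, the weight value, and the toy probabilities are all assumptions made up for illustration.

```python
import math


def shallow_fusion_step(s2s_log_probs, lm_log_probs, lm_weight=0.3):
    """Pick the next token by adding weighted LM scores to seq2seq scores.

    Hypothetical helper: both inputs are per-token log-probabilities over
    the same vocabulary; lm_weight is an assumed tunable interpolation weight.
    """
    if len(s2s_log_probs) != len(lm_log_probs):
        raise ValueError("score lists must cover the same vocabulary")
    # Fused score per token: log p_s2s + lm_weight * log p_lm
    fused = [s + lm_weight * l for s, l in zip(s2s_log_probs, lm_log_probs)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


# Toy three-token vocabulary; these numbers are invented, not results.
s2s = [math.log(0.5), math.log(0.4), math.log(0.1)]
lm = [math.log(0.1), math.log(0.8), math.log(0.1)]

best, fused = shallow_fusion_step(s2s, lm, lm_weight=0.5)
```

In this toy case the seq2seq model alone prefers token 0, while the fused score prefers token 1 because the LM strongly favors it. In shallow fusion the LM only rescores outputs at decoding time, whereas the abstract describes the LM adjusting the decoder's internal memory.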

Center for Language and Speech Processing