Jaejin Cho “Language Model Integration Based on Memory Control for Sequence-to-sequence Speech Recognition”

When:
March 1, 2019 @ 12:00 pm – 1:15 pm
Where:
Hackerman Hall B17
3400 N Charles St
Baltimore, MD 21218
USA
Abstract: 
In this presentation, I will describe a new scheme for training a seq2seq ASR model that integrates a pre-trained LM. The proposed fusion method uses information from the pre-trained LM to update the memory cell/hidden state of the LSTM in the seq2seq decoder, so that the memory retained by the main seq2seq model is adjusted by the external LM. Experimental results show the effectiveness of the proposed methods in a monolingual ASR setup on the LibriSpeech corpus and in a transfer learning setup from a multilingual ASR (MLASR) base model to a low-resource language. On LibriSpeech, our best model improved WER by 3.7% and 2.4% relative on test-clean and test-other, respectively, compared with the shallow fusion baseline. In transfer learning from an MLASR base model to the IARPA Babel Swahili model, the best scheme improved the transferred model on the eval set by 9.9% and 9.8% relative in CER and WER, respectively, compared with the 2-stage transfer baseline.
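
For readers unfamiliar with the shallow fusion baseline mentioned in the abstract, the following is a minimal sketch of how shallow fusion is commonly described: at each decoding step, the seq2seq log-probabilities are combined with weighted log-probabilities from an external LM. This illustrates only the baseline, not the memory-control method proposed in the talk. The function names, the toy vocabulary, the probabilities, and the LM weight are all hypothetical.

```python
import math


def shallow_fusion_step(s2s_log_probs, lm_log_probs, lm_weight=0.3):
    """Fuse per-token scores for one decoding step.

    Each argument maps a candidate token to its log-probability.
    The fused score is log p_s2s(token) + lm_weight * log p_lm(token).
    """
    return {
        token: s2s_log_probs[token] + lm_weight * lm_log_probs[token]
        for token in s2s_log_probs
    }


def best_token(fused_scores):
    """Return the candidate token with the highest fused score."""
    return max(fused_scores, key=fused_scores.get)


# Hypothetical toy distributions over three candidate tokens.
s2s = {"cat": math.log(0.5), "cap": math.log(0.4), "can": math.log(0.1)}
lm = {"cat": math.log(0.2), "cap": math.log(0.7), "can": math.log(0.1)}

# With the LM switched off, the seq2seq model alone decides.
no_lm_choice = best_token(shallow_fusion_step(s2s, lm, lm_weight=0.0))

# With a non-zero LM weight, the external LM can change the decision.
fused_choice = best_token(shallow_fusion_step(s2s, lm, lm_weight=0.5))
```

In this toy case the seq2seq model alone prefers "cat", while the fused score prefers "cap" because the LM strongly favors it. In shallow fusion the LM influences only the output scores at decoding time, whereas the method described in the abstract adjusts the decoder's internal LSTM memory during training.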

Johns Hopkins University

Johns Hopkins University, Whiting School of Engineering

Center for Language and Speech Processing
Hackerman 226
3400 North Charles Street, Baltimore, MD 21218-2680
