Colin Cherry (National Research Council Canada) “Sampling to Efficiently Train Bilingual Neural Network Language Models”
The neural network joint model of translation (NNJM) is a language model that considers both source and target context to produce a powerful feature for statistical machine translation. However, its softmax top layer requires a sum over the entire output vocabulary, which makes maximum likelihood estimation (MLE) training very slow. This has led some groups to train with Noise Contrastive Estimation (NCE), which sidesteps this sum by optimizing an alternative objective: distinguishing true data points from sampled noise. We carry out the first direct comparison of the MLE and NCE training objectives for the NNJM, showing that NCE is significantly outperformed by MLE on large-scale Arabic-English and Chinese-English translation tasks. We also show that this drop in translation quality can be avoided by using a simple, translation-specific noise distribution that conditions on the source sentence.
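To make the contrast between the two objectives concrete, here is a minimal Python sketch for a single prediction, assuming the network emits one unnormalized score per target-vocabulary word. The NCE loss treats those scores as already self-normalized log-probabilities, a common NCE simplification. All function names, the toy vocabulary, and `source_conditioned_noise` (a mixture that puts extra noise mass on a hypothetical set of target words linked to the source sentence) are illustrative assumptions, not the specific noise distribution described in the talk.

```python
import math
import random


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x))) over a list of scores."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def mle_loss(scores, target):
    """Softmax cross-entropy: the normalizer sums over the ENTIRE output vocabulary."""
    return log_sum_exp(scores) - scores[target]


def nce_loss(scores, target, noise_samples, noise_prob, k):
    """NCE: classify the true word against k words sampled from noise_prob.

    The unnormalized score s(w) is used directly as log p_model(w) (assumed
    self-normalized), so only the target and the k noise samples are scored.
    noise_prob[w] must be > 0 for every word evaluated.
    """
    def logit(w):
        # log-odds that w came from the data rather than from the noise distribution
        return scores[w] - math.log(k * noise_prob[w])

    loss = -log_sigmoid(logit(target))
    for w in noise_samples:
        loss -= log_sigmoid(-logit(w))  # log(1 - sigmoid(x)) == log_sigmoid(-x)
    return loss


def unigram_noise(counts):
    """Context-independent noise: unigram frequencies of target words."""
    total = sum(counts)
    return [c / total for c in counts]


def source_conditioned_noise(unigram, candidate_words, mix=0.5):
    """HYPOTHETICAL source-conditioned noise: mix the unigram distribution with a
    uniform distribution over target words assumed to be linked to the source
    sentence (e.g. by a bilingual lexicon). The result still sums to 1."""
    share = 1.0 / len(candidate_words)
    return [(1 - mix) * p + (mix * share if w in candidate_words else 0.0)
            for w, p in enumerate(unigram)]


# Toy example: a 6-word target vocabulary with made-up scores.
scores = [2.0, 0.5, -1.0, 1.0, 0.0, -0.5]
target = 0
k = 3
q_uni = unigram_noise([10, 5, 3, 8, 2, 2])
q_src = source_conditioned_noise(q_uni, candidate_words={0, 3})

rng = random.Random(0)
samples = rng.choices(range(len(scores)), weights=q_src, k=k)

print("MLE loss:", mle_loss(scores, target))
print("NCE loss:", nce_loss(scores, target, samples, q_src, k))
```

The MLE loss touches every vocabulary score through `log_sum_exp`, while the NCE loss evaluates only the target and the `k` noise samples; with a realistic vocabulary of tens of thousands of words, that difference is the source of NCE's training speedup.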
Colin Cherry is a Senior Research Officer at the National Research Council of Canada. Previously, he was a Researcher at Microsoft Research. He received his Ph.D. in Computing Science from the University of Alberta. His primary research area is machine translation, but he has also been known to venture into parsing, morphology and information extraction. He is currently secretary of the NAACL, and recently sat on the editorial board of Computational Linguistics.