Kenneth Heafield (University of Edinburgh) “Faster Neural Machine Translation”

May 6, 2019 @ 12:00 pm – 1:15 pm
Hackerman Hall 320
3400 N. Charles Street
Baltimore, MD 21218


The Marian toolkit dominated a shared task on translation speed run by the Workshop on Neural Machine Translation. Speed came from many levels: reduced model complexity, teacher-student compression, and efficient kernels. Compressing the model is particularly important because memory bandwidth is the limiting factor on GPUs with tensor cores and on CPUs. I wrote 8-bit integer multiplication in AVX512 intrinsics, which reduced translation latency by 2.7x, and we are now looking at 4 bits. Much of the systems work for machine learning addresses vision tasks; large parameter skew and variable-size input make sequential models difficult and interesting.


Kenneth Heafield is a Lecturer (which translates to en-US as Assistant Professor) leading a machine translation group at the University of Edinburgh. He works on efficient neural networks, low-resource translation, mining petabytes for translations, and occasionally grammatical error correction.


Johns Hopkins University, Whiting School of Engineering

Center for Language and Speech Processing
Hackerman 226
3400 North Charles Street, Baltimore, MD 21218-2680