Brian Thompson “Vecalign: Improved Sentence Alignment in Linear Time and Space”

When:
October 21, 2019 @ 12:00 pm – 1:00 pm
Abstract: We introduce Vecalign, a novel bilingual sentence alignment method which is linear in time and space with respect to the number of sentences being aligned and which requires only bilingual sentence embeddings. On a standard German–French test set, Vecalign outperforms the previous state-of-the-art method (which has quadratic time complexity and requires a machine translation system) by 5 F1 points. It substantially outperforms the popular Hunalign toolkit at recovering Bible verse alignments in medium- to low-resource language pairs, and it improves downstream MT quality by 1.7 and 1.6 BLEU in Sinhala–English and Nepali–English, respectively, compared to the Hunalign-based Paracrawl pipeline.
 
Date and time: Monday, Oct 21st, 2019 at 12:00PM
Venue: Hackerman B17

Center for Language and Speech Processing