Cross-lingual Transfer for Machine Translation – Nate Robinson (JHU)
Abstract
Natural language processing (NLP) research has long sought to answer a simple question: what does the relatedness between languages imply about their utility for one another in cross-lingual transfer applications? Despite many attempts, there is still no academic consensus on the answer. Different studies report stronger or weaker relationships, and their findings are at times contradictory. Because these attempts have failed to show convincingly meaningful trends, some researchers adopt the view that language relatedness has no relationship with cross-lingual transfer utility, contrary to what linguistic theory would suggest. In this paper we present what is, to our knowledge, the most comprehensive exploration of interlingual relationships for machine translation to date, covering thousands of language pairs drawn from over 300 languages. Our results indicate that language relatedness often correlates moderately with transfer effectiveness, that larger training sets and tighter language clades show stronger correlations, and that language relatedness matters less when transfer is not zero-shot.
Bio
Nate Robinson is a third-year PhD student at Johns Hopkins University’s Center for Language and Speech Processing, where he is advised by Kenton Murray and Sanjeev Khudanpur. He completed his Master’s in Language Technologies at Carnegie Mellon University under the advisement of David Mortensen, and before that he conducted research at Brigham Young University’s DRAGN Labs under Nancy Fulda. Nate’s research focuses on building language technologies, including machine translation and speech technologies, for low-resource and related languages. He speaks English, French, Spanish, Arabic, and Haitian proficiently through years of study. He is a nerd about linguistics and languages, as well as math, books, music, natural sciences, and world cultures.
Also Available by Zoom: https://wse.zoom.us/j/96735183473