BEGIN:VCALENDAR VERSION:2.0 PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9// CALSCALE:GREGORIAN METHOD:PUBLISH X-FROM-URL:https://www.clsp.jhu.edu X-WR-TIMEZONE:America/New_York BEGIN:VTIMEZONE TZID:America/New_York X-LIC-LOCATION:America/New_York BEGIN:STANDARD DTSTART:20231105T020000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 RDATE:20241103T020000 TZNAME:EST END:STANDARD BEGIN:DAYLIGHT DTSTART:20240310T020000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 RDATE:20250309T020000 TZNAME:EDT END:DAYLIGHT END:VTIMEZONE BEGIN:VEVENT UID:ai1ec-22412@www.clsp.jhu.edu DTSTAMP:20240329T122151Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nDriven by the goal of eradicating language barriers on a global scale\, machine translation has solidified itself as a key focus of artificial intelligence research today. However\, such efforts have coalesced around a small subset of languages\, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200 language barrier while ensuring safe\, high-quality results\, all while keeping ethical considerations in mind? In this talk\, I introduce No Language Left Behind\, an initiative to break language barriers for low-resource languages. In No Language Left Behind\, we took on the low-resource language translation challenge by first contextualizing the need for translation support through exploratory interviews with native speakers. Then\, we created datasets and models aimed at narrowing the performance gap between low and high-resource languages. We proposed multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. Critically\, we evaluated the performance of over 40\,000 different translation directions using a human-translated benchmark\, Flores-200\, and combined human evaluation with a novel toxicity benchmark covering all languages in Flores-200 to assess translation safety.
Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art\, laying important groundwork towards realizing a universal translation system in an open-source manner.\nBiography\nAngela is a research scientist at Meta AI Research in New York\, focusing on supporting efforts in speech and language research. Recent projects include No Language Left Behind (https://ai.facebook.com/research/no-language-left-behind/) and Universal Speech Translation for Unwritten Languages (https://ai.facebook.com/blog/ai-translation-hokkien/). Before translation\, Angela previously focused on research in on-device models for NLP and computer vision and text generation. DTSTART;TZID=America/New_York:20221118T120000 DTEND;TZID=America/New_York:20221118T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Angela Fan (Meta AI Research) “No Language Left Behind: Scaling Human-Centered Machine Translation” URL:https://www.clsp.jhu.edu/events/angela-fan-facebook/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n
\\nAbstract
\nDriven by the goal of eradicating language barriers on a global scale\, machine translation has solidified itself as a key focus of artificial intelligence research today. However\, such efforts have coalesced around a small subset of languages\, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200 language barrier while ensuring safe\, high-quality results\, all while keeping ethical considerations in mind? In this talk\, I introduce No Language Left Behind\, an initiative to break language barriers for low-resource languages. In No Language Left Behind\, we took on the low-resource language translation challenge by first contextualizing the need for translation support through exploratory interviews with native speakers. Then\, we created datasets and models aimed at narrowing the performance gap between low and high-resource languages. We proposed multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. Critically\, we evaluated the performance of over 40\,000 different translation directions using a human-translated benchmark\, Flores-200\, and combined human evaluation with a novel toxicity benchmark covering all languages in Flores-200 to assess translation safety. Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art\, laying important groundwork towards realizing a universal translation system in an open-source manner.
\nBiography
\nAngela is a research scientist at Meta AI Research in New York\, focusing on supporting efforts in speech and language research. Recent projects include No Language Left Behind (https://ai.facebook.com/research/no-language-left-behind/) and Universal Speech Translation for Unwritten Languages (https://ai.facebook.com/blog/ai-translation-hokkien/). Before translation\, Angela previously focused on research in on-device models for NLP and computer vision and text generation.
\n\n X-TAGS;LANGUAGE=en-US:2022\,Fan\,November END:VEVENT BEGIN:VEVENT UID:ai1ec-23308@www.clsp.jhu.edu DTSTAMP:20240329T122151Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nBiases in datasets\, or unintentionally introduced spurious cues\, are a common source of misspecification in machine learning. Performant models trained on such data can gender stereotype or be brittle under distribution shift. In this talk\, we present several results in multimodal and question answering applications studying sources of dataset bias\, and several mitigation methods. We propose approaches where known dimensions of dataset bias are explicitly factored out of a model during learning\, without needing to modify data. Finally\, we ask whether dataset biases can be attributable to annotator behavior during annotation. Drawing inspiration from work in psychology on cognitive biases\, we show certain behavioral patterns are highly indicative of the creation of problematic (but valid) data instances in question answering. We give evidence that many existing observations around how dataset bias propagates to models can be attributed to data samples created by annotators we identify.\nBiography\nMark Yatskar is an Assistant Professor at University of Pennsylvania in the department of Computer and Information Science. He did his PhD at University of Washington co-advised by Luke Zettlemoyer and Ali Farhadi. He was a Young Investigator at the Allen Institute for Artificial Intelligence for several years working with their computer vision team\, Prior. His work spans Natural Language Processing\, Computer Vision\, and Fairness in Machine Learning. He received a Best Paper Award at EMNLP for work on gender bias amplification\, and his work has been featured in Wired and the New York Times. DTSTART;TZID=America/New_York:20230210T120000 DTEND;TZID=America/New_York:20230210T131500 LOCATION:Hackerman Hall B17 @ 3400 N.
Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Mark Yatskar (University of Pennsylvania) “Understanding Dataset Biases: Behavioral Indicators During Annotation and Contrastive Mitigations” URL:https://www.clsp.jhu.edu/events/mark-yatskar-university-of-pennsylvania/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n
\\nAbstract
\nBiases in datasets\, or unintentionally introduced spurious cues\, are a common source of misspecification in machine learning. Performant models trained on such data can gender stereotype or be brittle under distribution shift. In this talk\, we present several results in multimodal and question answering applications studying sources of dataset bias\, and several mitigation methods. We propose approaches where known dimensions of dataset bias are explicitly factored out of a model during learning\, without needing to modify data. Finally\, we ask whether dataset biases can be attributable to annotator behavior during annotation. Drawing inspiration from work in psychology on cognitive biases\, we show certain behavioral patterns are highly indicative of the creation of problematic (but valid) data instances in question answering. We give evidence that many existing observations around how dataset bias propagates to models can be attributed to data samples created by annotators we identify.
\nBiography\nMark Yatskar is an Assistant Professor at University of Pennsylvania in the department of Computer and Information Science. He did his PhD at University of Washington co-advised by Luke Zettlemoyer and Ali Farhadi. He was a Young Investigator at the Allen Institute for Artificial Intelligence for several years working with their computer vision team\, Prior. His work spans Natural Language Processing\, Computer Vision\, and Fairness in Machine Learning. He received a Best Paper Award at EMNLP for work on gender bias amplification\, and his work has been featured in Wired and the New York Times.
\n\n X-TAGS;LANGUAGE=en-US:2023\,February\,Yatskar END:VEVENT END:VCALENDAR