BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9//
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-FROM-URL:https://www.clsp.jhu.edu
X-WR-TIMEZONE:America/New_York
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:STANDARD
DTSTART:20231105T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RDATE:20241103T020000
TZNAME:EST
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20240310T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RDATE:20250309T020000
TZNAME:EDT
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:ai1ec-20115@www.clsp.jhu.edu
DTSTAMP:20240319T040034Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nData science in small medical datasets usually means doing precision guesswork on unreliable data provided by those with high expectations. The first part of this talk will focus on issues that data scientists and engineers have to address when working with this kind of data (e.g. unreliable labels\, the effect of confounding factors\, necessity of clinical interpretability\, difficulties with fusing more data sets). The second part of the talk will include some real examples of this kind of data science in the field of neurology (prediction of motor deficits in Parkinson’s disease based on acoustic analysis of speech\, diagnosis of Parkinson’s disease dysgraphia utilising online handwriting\, exploring the Mozart effect in epilepsy based on the music information retrieval) and psychology (assessment of graphomotor disabilities in children with developmental dysgraphia).\nBiography\nJiri Mekyska is the head of the BDALab (Brain Diseases Analysis Laboratory) at the Brno University of Technology\, where he leads a multidisciplinary team of researchers (signal processing engineers\, data scientists\, neurologists\, psychologists) with a special focus on the development of new digital endpoints and digital biomarkers enabling to better understand\, diagnose and monitor neurodegenerative (e.g.
Parkinson’s disease) and neurodevelopmental (e.g. dysgraphia) diseases.
DTSTART;TZID=America/New_York:20210329T120000
DTEND;TZID=America/New_York:20210329T131500
LOCATION:via Zoom
SEQUENCE:0
SUMMARY:Jiri Mekyska (Brno University of Technology) “Data Science in Small Medical Data Sets: From Logistic Regression Towards Logistic Regression”
URL:https://www.clsp.jhu.edu/events/jiri-mekyska-brno-university-of-technology/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n
\\nAbstract
\nData science in small medical datasets usually means doing precision guesswork on unreliable data provided by those with high expectations. The first part of this talk will focus on issues that data scientists and engineers have to address when working with this kind of data (e.g. unreliable labels\, the effect of confounding factors\, necessity of clinical interpretability\, difficulties with fusing more data sets). The second part of the talk will include some real examples of this kind of data science in the field of neurology (prediction of motor deficits in Parkinson’s disease based on acoustic analysis of speech\, diagnosis of Parkinson’s disease dysgraphia utilising online handwriting\, exploring the Mozart effect in epilepsy based on the music information retrieval) and psychology (assessment of graphomotor disabilities in children with developmental dysgraphia).
\nBiography
\nAbstract
\nNeural sequence generation systems oftentimes generate sequences by searching for the most likely sequence under the learnt probability distribution. This assumes that the most likely sequence\, i.e. the mode\, under such a model must also be the best sequence it has to offer (often in a given context\, e.g. conditioned on a source sentence in translation). Recent findings in neural machine translation (NMT) show that the true most likely sequence oftentimes is empty under many state-of-the-art NMT models. This follows a large list of other pathologies and biases observed in NMT and other sequence generation models: a length bias\, larger beams degrading performance\, exposure bias\, and many more. Many of these works blame the probabilistic formulation of NMT or maximum likelihood estimation. We provide a different view on this: it is mode-seeking search\, e.g. beam search\, that introduces many of these pathologies and biases\, and such a decision rule is not suitable for the type of distributions learnt by NMT systems. We show that NMT models spread probability mass over many translations\, and that the most likely translation oftentimes is a rare event. We further show that translation distributions do capture important aspects of translation well in expectation. Therefore\, we advocate for decision rules that take into account the entire probability distribution and not just its mode. We provide one example of such a decision rule\, and show that this is a fruitful research direction.
\nBiography
\nI am an assistant professor (UD) in natural language processing at the Institute for Logic\, Language and Computation where I lead the Probabilistic Language Learning group.
\nMy work concerns the design of models and algorithms that learn to represent\, understand\, and generate language data. Examples of specific problems I am interested in include language modelling\, machine translation\, syntactic parsing\, textual entailment\, text classification\, and question answering.
\nI also develop techniques to approach general machine learning problems such as probabilistic inference\, gradient and density estimation.
\nMy interests sit at the intersection of disciplines such as statistics\, machine learning\, approximate inference\, global optimization\, formal languages\, and computational linguistics.
\n\n\n
X-TAGS;LANGUAGE=en-US:2021\,April\,Aziz
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-20120@www.clsp.jhu.edu
DTSTAMP:20240319T040034Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nRobotics@Google’s mission is to make robots useful in the real world through machine learning. We are excited about a new model for robotics\, designed for generalization across diverse environments and instructions. This model is focused on scalable data-driven learning\, which is task-agnostic\, leverages simulation\, learns from past experience\, and can be quickly adapted to work in the real-world through limited interactions. In this talk\, we’ll share some of our recent work in this direction in both manipulation and locomotion applications.\nBiography\nCarolina Parada is a Senior Engineering Manager at Google Robotics. She leads the robot-mobility group\, which focuses on improving robot motion planning\, navigation\, and locomotion\, using reinforcement learning. Prior to that\, she led the camera perception team for self-driving cars at Nvidia for 2 years. She was also a lead with Speech @ Google for 7 years\, where she drove multiple research and engineering efforts that enabled Ok Google\, the Google Assistant\, and Voice-Search. Carolina grew up in Venezuela and moved to the US to pursue a B.S. and M.S. degree in Electrical Engineering at University of Washington and her PhD at Johns Hopkins University at the Center for Language and Speech Processing (CLSP).
DTSTART;TZID=America/New_York:20210423T120000
DTEND;TZID=America/New_York:20210423T131500
LOCATION:via Zoom
SEQUENCE:0
SUMMARY:Carolina Parada (Google AI) “State of Robotics @ Google”
URL:https://www.clsp.jhu.edu/events/carolina-parada-google-ai/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n
Abstract
\nRobotics@Google’s mission is to make robots useful in the real world through machine learning. We are excited about a new model for robotics\, designed for generalization across diverse environments and instructions. This model is focused on scalable data-driven learning\, which is task-agnostic\, leverages simulation\, learns from past experience\, and can be quickly adapted to work in the real-world through limited interactions. In this talk\, we’ll share some of our recent work in this direction in both manipulation and locomotion applications.
\nBiography
\nCarolina Parada is a Senior Engineering Manager at Google Robotics. She leads the robot-mobility group\, which focuses on improving robot motion planning\, navigation\, and locomotion\, using reinforcement learning. Prior to that\, she led the camera perception team for self-driving cars at Nvidia for 2 years. She was also a lead with Speech @ Google for 7 years\, where she drove multiple research and engineering efforts that enabled Ok Google\, the Google Assistant\, and Voice-Search. Carolina grew up in Venezuela and moved to the US to pursue a B.S. and M.S. degree in Electrical Engineering at University of Washington and her PhD at Johns Hopkins University at the Center for Language and Speech Processing (CLSP).
\n
X-TAGS;LANGUAGE=en-US:2021\,April\,Parada
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-20716@www.clsp.jhu.edu
DTSTAMP:20240319T040034Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nOver the last few years\, deep neural models have taken over the field of natural language processing (NLP)\, brandishing great improvements on many of its sequence-level tasks. But the end-to-end nature of these models makes it hard to figure out whether the way they represent individual words aligns with how language builds itself from the bottom up\, or how lexical changes in register and domain can affect the untested aspects of such representations.\nIn this talk\, I will present NYTWIT\, a dataset created to challenge large language models at the lexical level\, tasking them with identification of processes leading to the formation of novel English words\, as well as with segmentation and recovery of the specific subclass of novel blends. I will then present XRayEmb\, a method which alleviates the hardships of processing these novelties by fitting a character-level encoder to the existing models’ subword tokenizers\; and conclude with a discussion of the drawbacks of current tokenizers’ vocabulary creation schemes.\nBiography\nYuval Pinter is a Senior Lecturer in the Department of Computer Science at Ben-Gurion University of the Negev\, focusing on natural language processing. Yuval got his PhD at the Georgia Institute of Technology School of Interactive Computing as a Bloomberg Data Science PhD Fellow. Before that\, he worked as a Research Engineer at Yahoo Labs and as a Computational Linguist at Ginger Software\, and obtained an MA in Linguistics and a BSc in CS and Mathematics\, both from Tel Aviv University. Yuval blogs (in Hebrew) about language matters on Dagesh Kal.
DTSTART;TZID=America/New_York:20210910T120000
DTEND;TZID=America/New_York:20210910T131500
LOCATION:Hackerman Hall B17 @ 3400 N.
Charles Street\, Baltimore\, MD
SEQUENCE:0
SUMMARY:Yuval Pinter (Ben-Gurion University – Virtual Visit) “Challenging and Adapting NLP Models to Lexical Phenomena”
URL:https://www.clsp.jhu.edu/events/yuval-pinter/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nOver the last few years\, deep neural models have taken over the field of natural language processing (NLP)\, brandishing great improvements on many of its sequence-level tasks. But the end-to-end nature of these models makes it hard to figure out whether the way they represent individual words aligns with how language builds itself from the bottom up\, or how lexical changes in register and domain can affect the untested aspects of such representations.
\nIn this talk\, I will present NYTWIT\, a dataset created to challenge large language models at the lexical level\, tasking them with identification of processes leading to the formation of novel English words\, as well as with segmentation and recovery of the specific subclass of novel blends. I will then present XRayEmb\, a method which alleviates the hardships of processing these novelties by fitting a character-level encoder to the existing models’ subword tokenizers\; and conclude with a discussion of the drawbacks of current tokenizers’ vocabulary creation schemes.
\nBiography
\nYuval Pinter is a Senior Lecturer in the Department of Computer Science at Ben-Gurion University of the Negev\, focusing on natural language processing. Yuval got his PhD at the Georgia Institute of Technology School of Interactive Computing as a Bloomberg Data Science PhD Fellow. Before that\, he worked as a Research Engineer at Yahoo Labs and as a Computational Linguist at Ginger Software\, and obtained an MA in Linguistics and a BSc in CS and Mathematics\, both from Tel Aviv University.
Abstract
\nText simplification aims to help audiences read and understand a piece of text through lexical\, syntactic\, and discourse modifications\, while remaining faithful to its central idea and meaning. Thanks to large-scale parallel corpora derived from Wikipedia and News\, much of modern-day text simplification research focuses on sentence simplification\, transforming original\, more complex sentences into simplified versions. In this talk\, I present new frontiers that focus on discourse operations. First\, we consider the challenging task of simplifying highly technical language\, in our case\, medical texts. We introduce a new corpus of parallel texts in English comprising technical and lay summaries of all published evidence pertaining to different clinical topics. We then propose a new metric to quantify stylistic differences between the two\, and models for paragraph-level simplification. Second\, we present the first data-driven study of inserting elaborations and explanations during simplification\, and illustrate the richness and complexities of this phenomenon.\n
Biography
\nAbstract
\nRaytheon BBN participated in the IARPA MATERIAL program\, whose objective is to enable rapid development of language-independent methods for cross-lingual information retrieval (CLIR). The challenging CLIR task of retrieving documents written (or spoken) in one language so that they satisfy an information need expressed in a different language is exacerbated by unique challenges posed by the MATERIAL program: limited training data for automatic speech recognition and machine translation\, scant lexical resources\, non-standardized orthography\, etc. Furthermore\, the format of the queries and the “Query-Weighted Value” performance measure are non-standard and not previously studied in the IR community. In this talk\, we will describe the Raytheon BBN CLIR system\, which was successful at addressing the above challenges and unique characteristics of the program.
\nBiography
\nDamianos Karakos has been at Raytheon BBN for the past nine years\, where he is currently a Senior Principal Engineer\, Research. Before that\, he was research faculty at Johns Hopkins University. He has worked on several Government projects (e.g.\, DARPA GALE\, DARPA RATS\, IARPA BABEL\, IARPA MATERIAL\, IARPA BETTER) and on a variety of HLT-related topics (e.g.\, speech recognition\, speech activity detection\, keyword search\, information retrieval). He has published more than 60 peer-reviewed papers. His research interests lie at the intersection of human language technology and machine learning\, with an emphasis on statistical methods. He obtained a PhD in Electrical Engineering from the University of Maryland\, College Park\, in 2002.
\n\n
Abstract
\nWhile there is a vast amount of text written about nearly any topic\, this is often difficult for someone unfamiliar with a specific field to understand. Automated text simplification aims to reduce the complexity of a document\, making it more comprehensible to a broader audience. Much of the research in this field has traditionally focused on simplification sub-tasks\, such as lexical\, syntactic\, or sentence-level simplification. However\, current systems struggle to consistently produce high-quality simplifications. Phrase-based models tend to make too many poor transformations\; on the other hand\, recent neural models\, while producing grammatical output\, often do not make all needed changes to the original text. In this thesis\, I discuss novel approaches for improving lexical and sentence-level simplification systems. Regarding sentence simplification models\, after noting that encouraging diversity at inference time leads to significant improvements\, I take a closer look at the idea of diversity and perform an exhaustive comparison of diverse decoding techniques on other generation tasks. I also discuss the limitations in the framing of current simplification tasks\, which prevent these models from yet being practically useful. Thus\, I also propose a retrieval-based reformulation of the problem. Specifically\, starting with a document\, I identify concepts critical to understanding its content\, and then retrieve documents relevant for each concept\, re-ranking them based on the desired complexity level.
\nBiography
\nI’m a research scientist at the HLTCOE at Johns Hopkins University. My primary research interests are in language generation\, diverse and constrained decoding\, and information retrieval. During my PhD I focused mainly on the task of text simplification\, and now am working on formulating structured prediction problems as end-to-end generation tasks. I received my PhD in July 2021 from the University of Pennsylvania with Chris Callison-Burch and Marianna Apidianaki.
\n\n
X-TAGS;LANGUAGE=en-US:2021\,Kriz\,October
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-20988@www.clsp.jhu.edu
DTSTAMP:20240319T040034Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:
DTSTART;TZID=America/New_York:20211025T120000
DTEND;TZID=America/New_York:20211025T131500
LOCATION:Maryland Hall 110 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:CLSP Student Seminar
URL:https://www.clsp.jhu.edu/events/clsp-student-seminar-3/
X-COST-TYPE:free
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-21023@www.clsp.jhu.edu
DTSTAMP:20240319T040034Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nSpeech data is notoriously difficult to work with due to a variety of codecs\, lengths of recordings\, and meta-data formats. We present Lhotse\, a speech data representation library that draws upon lessons learned from Kaldi speech recognition toolkit and brings its concepts into the modern deep learning ecosystem. Lhotse provides a common JSON description format with corresponding Python classes and data preparation recipes for over 30 popular speech corpora. Various datasets can be easily combined together and re-purposed for different tasks. The library handles multi-channel recordings\, long recordings\, local and cloud storage\, lazy and on-the-fly operations amongst other features. We introduce Cut and CutSet concepts\, which simplify common data wrangling tasks for audio and help incorporate acoustic context of speech utterances. Finally\, we show how Lhotse leverages PyTorch data API abstractions and adopts them to handle speech data for deep learning.\nBiography\nPiotr Zelasko is an assistant research scientist in the Center for Language and Speech Processing (CLSP) who specializes in automatic speech recognition (ASR) and spoken language understanding (SLU). His current research focuses on applying multilingual and crosslingual speech recognition systems to categorize the phonetic inventory of a previously unknown language and on improving defenses against adversarial attacks on both speaker identification and automatic speech recognition systems. He is also addressing the question of how to structure a spontaneous conversation into high-level semantic units such as dialog acts or topics. Finally\, he is working on Lhotse + K2\, the next-generation speech processing research software ecosystem. Before joining Johns Hopkins\, Zelasko worked as a machine learning consultant for Avaya (2017-2019)\, and as a machine learning engineer for Techmo (2015-2017). Zelasko received his PhD (2019) in electronics engineering\, as well as his master’s (2014) and undergraduate degrees (2013) in acoustic engineering from AGH University of Science and Technology in Kraków\, Poland.
DTSTART;TZID=America/New_York:20211029T120000
DTEND;TZID=America/New_York:20211029T131500
LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:Piotr Zelasko (CLSP at JHU) “Lhotse: a speech data representation library for the modern deep learning ecosystem”
URL:https://www.clsp.jhu.edu/events/piotr-zelasko-clsp-at-jhu-lhotse-a-speech-data-representation-library-for-the-modern-deep-learning-ecosystem/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n
\\nAbstract
\nSpeech data is notoriously difficult to work with due to a variety of codecs\, lengths of recordings\, and meta-data formats. We present Lhotse\, a speech data representation library that draws upon lessons learned from Kaldi speech recognition toolkit and brings its concepts into the modern deep learning ecosystem. Lhotse provides a common JSON description format with corresponding Python classes and data preparation recipes for over 30 popular speech corpora. Various datasets can be easily combined together and re-purposed for different tasks. The library handles multi-channel recordings\, long recordings\, local and cloud storage\, lazy and on-the-fly operations amongst other features. We introduce Cut and CutSet concepts\, which simplify common data wrangling tasks for audio and help incorporate acoustic context of speech utterances. Finally\, we show how Lhotse leverages PyTorch data API abstractions and adopts them to handle speech data for deep learning.
\nBiography
\nPiotr Zelasko is an assistant research scientist in the Center for Language and Speech Processing (CLSP) who specializes in automatic speech recognition (ASR) and spoken language understanding (SLU). His current research focuses on applying multilingual and crosslingual speech recognition systems to categorize the phonetic inventory of a previously unknown language and on improving defenses against adversarial attacks on both speaker identification and automatic speech recognition systems. He is also addressing the question of how to structure a spontaneous conversation into high-level semantic units such as dialog acts or topics. Finally\, he is working on Lhotse + K2\, the next-generation speech processing research software ecosystem. Before joining Johns Hopkins\, Zelasko worked as a machine learning consultant for Avaya (2017-2019)\, and as a machine learning engineer for Techmo (2015-2017). Zelasko received his PhD (2019) in electronics engineering\, as well as his master’s (2014) and undergraduate degrees (2013) in acoustic engineering from AGH University of Science and Technology in Kraków\, Poland.
\n
X-TAGS;LANGUAGE=en-US:2021\,October\,Zelasko
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-21026@www.clsp.jhu.edu
DTSTAMP:20240319T040034Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:
DTSTART;TZID=America/New_York:20211101T123000
DTEND;TZID=America/New_York:20211101T131500
LOCATION:Maryland Hall 110 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:CLSP Student Seminar
URL:https://www.clsp.jhu.edu/events/clsp-student-seminar-4/
X-COST-TYPE:free
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-21031@www.clsp.jhu.edu
DTSTAMP:20240319T040034Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nMost people take for granted that when they speak\, they will be heard and understood. But for the millions who live with speech impairments caused by physical or neurological conditions\, trying to communicate with others can be difficult and lead to frustration. While there have been a great number of recent advances in Automatic Speech Recognition (ASR) technologies\, these interfaces can be inaccessible for those with speech impairments.\nIn this talk\, we will present Parrotron\, an end-to-end-trained speech-to-speech conversion model that maps an input spectrogram directly to another spectrogram\, without utilizing any intermediate discrete representation. The system is also trained to emit words in addition to a spectrogram\, in parallel. We demonstrate that this model can be trained to normalize speech from any speaker regardless of accent\, prosody\, and background noise\, into the voice of a single canonical target speaker with a fixed accent and consistent articulation and prosody. We further show that this normalization model can be adapted to normalize highly atypical speech from speakers with a variety of speech impairments (due to ALS\, Cerebral-Palsy\, Deafness\, Stroke\, Brain Injury\, etc.)\, resulting in significant improvements in intelligibility and naturalness\, measured via a speech recognizer and listening tests.
Finally\, demonstrating the utility of this model on other speech tasks\, we show that the same model architecture can be trained to perform a speech separation task.\nDimitri will give a brief description of some key moments in development of speech recognition algorithms that he was involved in and their applications to YouTube closed captions\, Live Transcribe and wearable subtitles.\nFadi will then speak about the development of Parrotron.\nBiographies\nDimitri Kanevsky started his career at Google working on speech recognition algorithms. Prior to joining Google\, Dimitri was a Research staff member in the Speech Algorithms Department at IBM. Prior to IBM\, he worked at a number of centers for higher mathematics\, including Max Planck Institute in Germany and the Institute for Advanced Studies in Princeton. He currently holds 295 US patents and was Master Inventor at IBM. MIT Technology Review recognized Dimitri’s conversational biometrics based security patent as one of the five most influential patents for 2003. In 2012 Dimitri was honored at the White House as a Champion of Change for his efforts to advance access to science\, technology\, engineering\, and math.\nFadi Biadsy has been a senior staff research scientist at Google NY for the past ten years. He has been exploring and leading multiple projects at Google\, including speech recognition\, speech conversion\, language modeling\, and semantic understanding. He received his PhD from Columbia University in 2011. At Columbia\, he researched a variety of speech and language processing projects including dialect and accent recognition\, speech recognition\, charismatic speech and question answering. He holds a BSc and MSc in mathematics and computer science. He worked on handwriting recognition during his master’s degree and he worked as a senior software developer for five years at Dalet digital media systems building multimedia broadcasting systems.
DTSTART;TZID=America/New_York:20211105T120000
DTEND;TZID=America/New_York:20211105T131500
LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:Fadi Biadsy and Dimitri Kanevsky (Google) “Speech Recognition: From Speaker Dependent to Speaker Independent to Full Personalization” “Parrotron: A Unified E2E Speech-to-Speech Conversion and ASR Model for Atypical Speech”
URL:https://www.clsp.jhu.edu/events/fadi-biadsy-and-dimitri-kanevsky-google/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nMost people take for granted that when they speak\, they will be heard and understood. But for the millions who live with speech impairments caused by physical or neurological conditions\, trying to communicate with others can be difficult and lead to frustration. While there have been a great number of recent advances in Automatic Speech Recognition (ASR) technologies\, these interfaces can be inaccessible for those with speech impairments.
\nIn this talk\, we will present Parrotron\, an end-to-end-trained speech-to-speech conversion model that maps an input spectrogram directly to another spectrogram\, without utilizing any intermediate discrete representation. The system is also trained to emit words in addition to a spectrogram\, in parallel. We demonstrate that this model can be trained to normalize speech from any speaker regardless of accent\, prosody\, and background noise\, into the voice of a single canonical target speaker with a fixed accent and consistent articulation and prosody. We further show that this normalization model can be adapted to normalize highly atypical speech from speakers with a variety of speech impairments (due to ALS\, Cerebral-Palsy\, Deafness\, Stroke\, Brain Injury\, etc.)\, resulting in significant improvements in intelligibility and naturalness\, measured via a speech recognizer and listening tests. Finally\, demonstrating the utility of this model on other speech tasks\, we show that the same model architecture can be trained to perform a speech separation task.
\nDimitri will give a brief description of some key moments in development of speech recognition algorithms that he was involved in and their applications to YouTube closed captions\, Live Transcribe and wearable subtitles.
\nFadi will then speak about the development of Parrotron.
\nBiographies
\nDimitri Kanevsky started his career at Google working on speech recognition algorithms. Prior to joining Google\, Dimitri was a Research staff member in the Speech Algorithms Department at IBM. Prior to IBM\, he worked at a number of centers for higher mathematics\, including Max Planck Institute in Germany and the Institute for Advanced Studies in Princeton. He currently holds 295 US patents and was Master Inventor at IBM. MIT Technology Review recognized Dimitri’s conversational biometrics based security patent as one of the five most influential patents for 2003. In 2012 Dimitri was honored at the White House as a Champion of Change for his efforts to advance access to science\, technology\, engineering\, and math.
\nFadi Biadsy has been a senior staff research scientist at Google NY for the past ten years. He has been exploring and leading multiple projects at Google\, including speech recognition\, speech conversion\, language modeling\, and semantic understanding. He received his PhD from Columbia University in 2011. At Columbia\, he researched a variety of speech and language processing projects including dialect and accent recognition\, speech recognition\, charismatic speech and question answering. He holds a BSc and MSc in mathematics and computer science. He worked on handwriting recognition during his master’s degree and he worked as a senior software developer for five years at Dalet digital media systems building multimedia broadcasting systems.
\n
X-TAGS;LANGUAGE=en-US:2021\,Biadsy and Kanevsky\,November
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-21036@www.clsp.jhu.edu
DTSTAMP:20240319T040034Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:
DTSTART;TZID=America/New_York:20211108T120000
DTEND;TZID=America/New_York:20211108T131500
LOCATION:Maryland Hall 110 @ 3400 N. Charles Street
SEQUENCE:0
SUMMARY:CLSP Student Seminar
URL:https://www.clsp.jhu.edu/events/clsp-student-seminar-5/
X-COST-TYPE:free
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-21041@www.clsp.jhu.edu
DTSTAMP:20240319T040034Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nNarration is a universal human practice that serves as a key site of education\, collective memory\, fostering social belief systems\, and furthering human creativity. Recent studies in economics (Shiller\, 2020)\, climate science (Bushell et al.\, 2017)\, political polarization (Kubin et al.\, 2021)\, and mental health (Adler et al.\, 2016) suggest an emerging interdisciplinary consensus that narrative is a central concept for understanding human behavior and beliefs. For close to half a century\, the field of narratology has developed a rich set of theoretical frameworks for understanding narrative. And yet these theories have largely gone untested on large\, heterogeneous collections of texts. Scholars continue to generate schemas by extrapolating from small numbers of manually observed documents. In this talk\, I will discuss how we can use machine learning to develop data-driven theories of narration to better understand what Labov and Waletzky called “the simplest and most fundamental narrative structures.” How can machine learning help us approach what we might call a minimal theory of narrativity?\nBiography\nAndrew Piper is Professor and William Dawson Scholar in the Department of Languages\, Literatures\, and Cultures at McGill University.
He is the director of .txtlab\, a laboratory for cultural analytics\, and editor of the /Journal of Cultural Analytics/\, an open-access journal dedicated to the computational study of culture. He is the author of numerous books and articles on the relationship of technology and reading\, including /Book Was There: Reading in Electronic Times/ (Chicago 2012)\, /Enumerations: Data and Literary Study/ (Chicago 2018)\, and most recently\, /Can We Be Wrong? The Problem of Textual Evidence in a Time of Data/ (Cambridge 2020). DTSTART;TZID=America/New_York:20211112T120000 DTEND;TZID=America/New_York:20211112T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Andrew Piper (McGill University) “How can we use machine learning to understand narration?” URL:https://www.clsp.jhu.edu/events/andrew-piper-mcgill-university-how-can-we-use-machine-learning-to-understand-narration/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nNarration is a universal human practice that serves as a key site of education\, collective memory\, fostering social belief systems\, and furthering human creativity. Recent studies in economics (Shiller\, 2020)\, climate science (Bushell et al.\, 2017)\, political polarization (Kubin et al.\, 2021)\, and mental health (Adler et al.\, 2016) suggest an emerging interdisciplinary consensus that narrative is a central concept for understanding human behavior and beliefs. For close to half a century\, the field of narratology has developed a rich set of theoretical frameworks for understanding narrative. And yet these theories have largely gone untested on large\, heterogeneous collections of texts. Scholars continue to generate schemas by extrapolating from small numbers of manually observed documents. In this talk\, I will discuss how we can use machine learning to develop data-driven theories of narration to better understand what Labov and Waletzky called “the simplest and most fundamental narrative structures.” How can machine learning help us approach what we might call a minimal theory of narrativity?
\nBiography
\nAndrew Piper is Professor and William Dawson Scholar in the Department of Languages\, Literatures\, and Cultures at McGill University. He is the director of .txtlab\, a laboratory for cultural analytics\, and editor of the /Journal of Cultural Analytics/\, an open-access journal dedicated to the computational study of culture. He is the author of numerous books and articles on the relationship of technology and reading\, including /Book Was There: Reading in Electronic Times/ (Chicago 2012)\, /Enumerations: Data and Literary Study/ (Chicago 2018)\, and most recently\, /Can We Be Wrong? The Problem of Textual Evidence in a Time of Data/ (Cambridge 2020).
\n X-TAGS;LANGUAGE=en-US:2021\,November\,Piper END:VEVENT BEGIN:VEVENT UID:ai1ec-21067@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION: DTSTART;TZID=America/New_York:20211115T120000 DTEND;TZID=America/New_York:20211115T131500 LOCATION:Maryland Hall 110 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:CLSP Student Seminar URL:https://www.clsp.jhu.edu/events/clsp-student-seminar-7/ X-COST-TYPE:free END:VEVENT BEGIN:VEVENT UID:ai1ec-21057@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nThis talk will outline the major challenges in porting mainstream speech technology to the domain of clinical applications\; in particular\, the need for personalised systems\, the challenge of working in an inherently sparse data domain\, and developing meaningful collaborations with all stakeholders. The talk will give an overview of recent state-of-the-art research from current projects\, including in the areas of recognition of disordered speech\, automatic processing of conversations\, and the automatic detection and tracking of paralinguistic information at the University of Sheffield (UK)’s Speech and Hearing (SPandH) & Healthcare lab.\nBiography\nHeidi is a Senior Lecturer (associate professor) in Computer Science at the University of Sheffield\, United Kingdom. Her research interests are in the application of AI-based voice technologies to healthcare\; in particular\, the detection and monitoring of people’s physical and mental health\, including verbal and non-verbal traits for expressions of emotion\, anxiety\, depression\, and neurodegenerative conditions in\, e.g.\, therapeutic or diagnostic settings. DTSTART;TZID=America/New_York:20211119T120000 DTEND;TZID=America/New_York:20211119T131500 LOCATION:Hackerman Hall B17 @ 3400 N.
Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Heidi Christensen (University of Sheffield\, UK) Virtual Seminar “Automated Processing of Pathological Speech: Recent Work and Ongoing Challenges” URL:https://www.clsp.jhu.edu/events/heidi-christensen-university-of-sheffield-uk-virtual-seminar-automated-processing-of-pathological-speech-recent-work-and-ongoing-challenges/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nThis talk will outline the major challenges in porting mainstream speech technology to the domain of clinical applications\; in particular\, the need for personalised systems\, the challenge of working in an inherently sparse data domain\, and developing meaningful collaborations with all stakeholders. The talk will give an overview of recent state-of-the-art research from current projects\, including in the areas of recognition of disordered speech\, automatic processing of conversations\, and the automatic detection and tracking of paralinguistic information at the University of Sheffield (UK)’s Speech and Hearing (SPandH) & Healthcare lab.
\nBiography
\nHeidi is a Senior Lecturer (associate professor) in Computer Science at the University of Sheffield\, United Kingdom. Her research interests are in the application of AI-based voice technologies to healthcare\; in particular\, the detection and monitoring of people’s physical and mental health\, including verbal and non-verbal traits for expressions of emotion\, anxiety\, depression\, and neurodegenerative conditions in\, e.g.\, therapeutic or diagnostic settings.
\n X-TAGS;LANGUAGE=en-US:2021\,Christensen\,November END:VEVENT BEGIN:VEVENT UID:ai1ec-21063@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION: DTSTART;TZID=America/New_York:20211129T120000 DTEND;TZID=America/New_York:20211129T131500 LOCATION:Maryland Hall 110 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:CLSP Student Seminar URL:https://www.clsp.jhu.edu/events/clsp-student-seminar-6/ X-COST-TYPE:free END:VEVENT BEGIN:VEVENT UID:ai1ec-21068@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION: DTSTART;TZID=America/New_York:20211203T120000 DTEND;TZID=America/New_York:20211203T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Eric Ringger (Zillow Group) URL:https://www.clsp.jhu.edu/events/eric-ringger-zillow-group/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2021\,December\,Ringger END:VEVENT BEGIN:VEVENT UID:ai1ec-21072@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nEmotion has intrigued researchers for generations. This fascination has permeated the engineering community\, motivating the development of affective computing methods. However\, human emotion remains notoriously difficult to accurately detect. As a result\, emotion classification techniques are not always effective when deployed. This is a problem because we are missing out on the potential that emotion recognition provides: the opportunity to automatically measure an aspect of behavior that provides critical insight into our health and wellbeing\, insight that is not always easily accessible.
In this talk\, I will discuss our efforts in developing emotion recognition approaches that are effective in natural environments and demonstrate how these approaches can be used to support mental health.\n\nBiography\n\nEmily Mower Provost is an Associate Professor in Computer Science and Engineering and Toyota Faculty Scholar at the University of Michigan. She received her Ph.D. in Electrical Engineering from the University of Southern California (USC)\, Los Angeles\, CA in 2010. She has been awarded a National Science Foundation CAREER Award (2017)\, the Oscar Stern Award for Depression Research (2015)\, and a National Science Foundation Graduate Research Fellowship (2004-2007). She is a co-author on the paper\, “Say Cheese vs. Smile: Reducing Speech-Related Variability for Facial Emotion Recognition\,” winner of Best Student Paper at ACM Multimedia\, 2014\, and a co-author of the winner of the Classifier Sub-Challenge event at the Interspeech 2009 emotion challenge. Her research interests are in human-centered speech and video processing\, multimodal interface design\, and speech-based assistive technology. The goals of her research are motivated by the complexities of the perception and expression of human behavior. DTSTART;TZID=America/New_York:20211206T120000 DTEND;TZID=America/New_York:20211206T131500 LOCATION:Maryland Hall 110 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Emily Mower-Provost (University of Michigan) “Automatically Measuring Emotion from Speech: New Methods to Move from the Lab to the Real World” URL:https://www.clsp.jhu.edu/events/emily-mower-provost-university-of-michigan/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nAbstract
\nNatural language processing has been revolutionized by neural networks\, which perform impressively well in applications such as machine translation and question answering. Despite their success\, neural networks still have some substantial shortcomings: their internal workings are poorly understood\, and they are notoriously brittle\, failing on example types that are rare in their training data. In this talk\, I will use the unifying thread of hierarchical syntactic structure to discuss approaches for addressing these shortcomings. First\, I will argue for a new evaluation paradigm based on targeted\, hypothesis-driven tests that better illuminate what models have learned\; using this paradigm\, I will show that even state-of-the-art models sometimes fail to recognize the hierarchical structure of language (e.g.\, to conclude that “The book on the table is blue” implies “The table is blue.”) Second\, I will show how these behavioral failings can be explained through analysis of models’ inductive biases and internal representations\, focusing on the puzzle of how neural networks represent discrete symbolic structure in continuous vector space. I will close by showing how insights from these analyses can be used to make models more robust through approaches based on meta-learning\, structured architectures\, and data augmentation.
\nBiography
\nTom McCoy is a PhD candidate in the Department of Cognitive Science at Johns Hopkins University. As an undergraduate\, he studied computational linguistics at Yale. His research combines natural language processing\, cognitive science\, and machine learning to study how we can achieve robust generalization in models of language\, as this remains one of the main areas where current AI systems fall short. In particular\, he focuses on inductive biases and representations of linguistic structure\, since these are two of the major components that determine how learners generalize to novel types of input.
\n X-TAGS;LANGUAGE=en-US:2022\,January\,McCoy END:VEVENT BEGIN:VEVENT UID:ai1ec-21267@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nIn this talk\, I present a multipronged strategy for zero-shot cross-lingual Information Extraction\, that is\, the construction of an IE model for some target language\, given existing annotations exclusively in some other language. This work is part of the JHU team’s effort under the IARPA BETTER program. I explore data augmentation techniques including data projection and self-training\, and how different pretrained encoders impact them. We find through extensive experiments and extension of techniques that a combination of approaches\, both new and old\, leads to better performance than any one cross-lingual strategy in particular.\nBiography\nMahsa Yarmohammadi is an assistant research scientist in CLSP\, JHU\, who leads state-of-the-art research in cross-lingual language and speech applications and algorithms. A primary focus of Yarmohammadi’s research is using deep learning techniques to transfer existing resources into other languages and to learn representations of language from multilingual data. She also works in automatic speech recognition and speech translation. Yarmohammadi received her PhD in computer science and engineering from Oregon Health & Science University (2016). She joined CLSP as a post-doctoral fellow in 2017. DTSTART;TZID=America/New_York:20220204T120000 DTEND;TZID=America/New_York:20220204T131500 LOCATION:Ames 234 Presented Virtually via Zoom https://wse.zoom.us/j/96735183473 SEQUENCE:0 SUMMARY:Mahsa Yarmohammadi (Johns Hopkins University) “Data Augmentation for Zero-shot Cross-Lingual Information Extraction” URL:https://www.clsp.jhu.edu/events/mahsa-yarmohammadi-johns-hopkins-university-data-augmentation-for-zero-shot-cross-lingual-information-extraction/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nIn this talk\, I present a multipronged strategy for zero-shot cross-lingual Information Extraction\, that is\, the construction of an IE model for some target language\, given existing annotations exclusively in some other language. This work is part of the JHU team’s effort under the IARPA BETTER program. I explore data augmentation techniques including data projection and self-training\, and how different pretrained encoders impact them. We find through extensive experiments and extension of techniques that a combination of approaches\, both new and old\, leads to better performance than any one cross-lingual strategy in particular.
\nBiography
\nAbstract
\nAs humans\, our understanding of language is grounded in a rich mental model about “how the world works” – that we learn through perception and interaction. We use this understanding to reason beyond what we literally observe or read\, imagining how situations might unfold in the world. Machines today struggle at this kind of reasoning\, which limits how they can communicate with humans.\nIn my talk\, I will discuss three lines of work to bridge this gap between machines and humans. I will first discuss how we might measure grounded understanding. I will introduce a suite of approaches for constructing benchmarks\, using machines in the loop to filter out spurious biases. Next\, I will introduce PIGLeT: a model that learns physical commonsense understanding by interacting with the world through simulation\, using this knowledge to ground language. From an English-language description of an event\, PIGLeT can anticipate how the world state might change – outperforming text-only models that are orders of magnitude larger. Finally\, I will introduce MERLOT\, which learns about situations in the world by watching millions of YouTube videos with transcribed speech. Through training objectives inspired by the developmental psychology idea of multimodal reentry\, MERLOT learns to fuse language\, vision\, and sound together into powerful representations.\nTogether\, these directions suggest a path forward for building machines that learn language rooted in the world.\nBiography
\nRowan Zellers is a final-year PhD candidate at the University of Washington in Computer Science & Engineering\, advised by Yejin Choi and Ali Farhadi. His research focuses on enabling machines to understand language\, vision\, sound\, and the world beyond these modalities. He has been recognized through an NSF Graduate Fellowship and a NeurIPS 2021 outstanding paper award. His work has appeared in several media outlets\, including Wired\, the Washington Post\, and the New York Times. He graduated from Harvey Mudd College with a B.S. in Computer Science & Mathematics and has interned at the Allen Institute for AI.
\n X-TAGS;LANGUAGE=en-US:2022\,February\,Zellers END:VEVENT BEGIN:VEVENT UID:ai1ec-21280@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nAs AI-driven language interfaces (such as chat-bots) become more integrated into our lives\, they need to become more versatile and reliable in their communication with human users. How can we make progress toward building more “general” models that are capable of understanding a broader spectrum of language commands\, given practical constraints such as the limited availability of labeled data?\nIn this talk\, I will describe my research toward addressing this question along two dimensions of generality. First\, I will discuss progress in “breadth” — models that address a wider variety of tasks and abilities\, drawing inspiration from existing statistical learning techniques such as multi-task learning. In particular\, I will showcase a system that works well on several QA benchmarks\, resulting in state-of-the-art results on 10 benchmarks. Furthermore\, I will show its extension to tasks beyond QA (such as text generation or classification) that can be “defined” via natural language. In the second part\, I will focus on progress in “depth” — models that can handle complex inputs such as compositional questions.
I will introduce Text Modular Networks\, a general framework that casts problem-solving as natural language communication among simpler “modules.” Applying this framework to compositional questions by leveraging discrete optimization and existing non-compositional closed-box QA models results in a model with strong empirical performance on multiple complex QA benchmarks while providing human-readable reasoning.\nI will conclude with future research directions toward broader NLP systems by addressing the limitations of the presented ideas and other missing elements needed to move toward more general-purpose interactive language understanding systems.\nBiography\nDaniel Khashabi is a postdoctoral researcher at the Allen Institute for Artificial Intelligence (AI2)\, Seattle. Previously\, he completed his Ph.D. in Computer and Information Sciences at the University of Pennsylvania in 2019. His interests lie at the intersection of artificial intelligence and natural language processing\, with a vision toward more general systems through unified algorithms and theories. DTSTART;TZID=America/New_York:20220218T120000 DTEND;TZID=America/New_York:20220218T131500 LOCATION:Ames Hall 234 - Presented Virtually Via Zoom https://wse.zoom.us/j/96735183473 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Daniel Khashabi (Allen Institute for Artificial Intelligence) “The Quest Toward Generality in Natural Language Understanding” URL:https://www.clsp.jhu.edu/events/daniel-khashabi-allen-institute-for-artificial-intelligence/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nAs AI-driven language interfaces (such as chat-bots) become more integrated into our lives\, they need to become more versatile and reliable in their communication with human users. How can we make progress toward building more “general” models that are capable of understanding a broader spectrum of language commands\, given practical constraints such as the limited availability of labeled data?
\nIn this talk\, I will describe my research toward addressing this question along two dimensions of generality. First\, I will discuss progress in “breadth” — models that address a wider variety of tasks and abilities\, drawing inspiration from existing statistical learning techniques such as multi-task learning. In particular\, I will showcase a system that works well on several QA benchmarks\, resulting in state-of-the-art results on 10 benchmarks. Furthermore\, I will show its extension to tasks beyond QA (such as text generation or classification) that can be “defined” via natural language. In the second part\, I will focus on progress in “depth” — models that can handle complex inputs such as compositional questions. I will introduce Text Modular Networks\, a general framework that casts problem-solving as natural language communication among simpler “modules.” Applying this framework to compositional questions by leveraging discrete optimization and existing non-compositional closed-box QA models results in a model with strong empirical performance on multiple complex QA benchmarks while providing human-readable reasoning.
\nI will conclude with future research directions toward broader NLP systems by addressing the limitations of the presented ideas and other missing elements needed to move toward more general-purpose interactive language understanding systems.
\nBiography
\nDaniel Khashabi is a postdoctoral researcher at the Allen Institute for Artificial Intelligence (AI2)\, Seattle. Previously\, he completed his Ph.D. in Computer and Information Sciences at the University of Pennsylvania in 2019. His interests lie at the intersection of artificial intelligence and natural language processing\, with a vision toward more general systems through unified algorithms and theories.
\n X-TAGS;LANGUAGE=en-US:2022\,February\,Khashabi END:VEVENT BEGIN:VEVENT UID:ai1ec-21487@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nEnormous amounts of ever-changing knowledge are available online in diverse textual styles and diverse formats. Recent advances in deep learning algorithms and large-scale datasets are spurring progress in many Natural Language Processing (NLP) tasks\, including question answering. Nevertheless\, these models cannot scale up when task-annotated training data are scarce. This talk presents my lab’s work toward building general-purpose models in NLP and how to systematically evaluate them. First\, I present a general model for two known tasks of question answering in English and multiple languages that is robust to small domain shifts. Then\, I show a meta-training approach that can solve a variety of NLP tasks using only a few examples and introduce a benchmark to evaluate cross-task generalization. Finally\, I discuss neuro-symbolic approaches to address more complex tasks by eliciting knowledge from structured data and language models.\n\nBiography\n\nHanna Hajishirzi is an Assistant Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington and a Senior Research Manager at the Allen Institute for AI. Her research spans different areas in NLP and AI\, focusing on developing general-purpose machine learning algorithms that can solve many NLP tasks. Applications for these algorithms include question answering\, representation learning\, green AI\, knowledge extraction\, and conversational dialogue. Honors include the NSF CAREER Award\, Sloan Fellowship\, Allen Distinguished Investigator Award\, Intel rising star award\, best paper and honorable mention awards\, and several industry research faculty awards.
Hanna received her PhD from the University of Illinois and spent a year as a postdoc at Disney Research and CMU. DTSTART;TZID=America/New_York:20220225T120000 DTEND;TZID=America/New_York:20220225T131500 LOCATION:Ames Hall 234 - Presented Virtually Via Zoom https://wse.zoom.us/j/96735183473 SEQUENCE:0 SUMMARY:Hanna Hajishirzi (University of Washington & Allen Institute for AI) “Toward Robust\, Knowledge-Rich NLP” URL:https://www.clsp.jhu.edu/events/hanna-hajishirzi-university-of-washington-allen-institute-for-ai-toward-robust-knowledge-rich-nlp/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nAbstract
\nSince it is increasingly hard to opt out from interacting with AI technology\, people demand that AI be capable of maintaining contracts such that it supports the agency and oversight of people who are required to use it or who are affected by it. To help those people create a mental model of how to interact with AI systems\, I extend the underlying models to self-explain—predict the label/answer and explain this prediction. In this talk\, I will present how to generate (1) free-text explanations given in plain English that immediately tell users the gist of the reasoning\, and (2) contrastive explanations that help users understand how they could change the text to get another label.
\nBiography
\nAna Marasović is a postdoctoral researcher at the Allen Institute for AI (AI2) and the Paul G. Allen School of Computer Science & Engineering at the University of Washington. Her research interests broadly lie in the fields of natural language processing\, explainable AI\, and vision-and-language learning. Her projects are motivated by a unified goal: improve interaction with and control of NLP systems to help people make these systems do what they want\, with the confidence that they’re getting exactly what they need. Prior to joining AI2\, Ana obtained her PhD from Heidelberg University.
\nHow to pronounce my name: the first name is Ana as in Spanish\, i.e.\, with a long “a” as in “water”\; regarding the last name: “mara” as in actress Mara Wilson + “so” + “veetch”.
\n X-TAGS;LANGUAGE=en-US:2022\,February\,Marasovic END:VEVENT BEGIN:VEVENT UID:ai1ec-21621@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nSystems that support expressive\, situated natural language interactions are essential for expanding access to complex computing systems\, such as robots and databases\, to non-experts. Reasoning and learning in such natural language interactions is a challenging open problem. For example\, resolving sentence meaning requires reasoning not only about word meaning\, but also about the interaction context\, including the history of the interaction and the situated environment. In addition\, the sequential dynamics that arise between user and system in and across interactions make learning from static data\, i.e.\, supervised data\, both challenging and ineffective. However\, these same interaction dynamics result in ample opportunities for learning from implicit and explicit feedback that arises naturally in the interaction. This lays the foundation for systems that continually learn\, improve\, and adapt their language use through interaction\, without additional annotation effort. In this talk\, I will focus on these challenges and opportunities. First\, I will describe our work on modeling dependencies between language meaning and interaction context when mapping natural language in interaction to executable code. In the second part of the talk\, I will describe our work on language understanding and generation in collaborative interactions\, focusing on continual learning from explicit and implicit user feedback.\nBiography\nAlane Suhr is a PhD Candidate in the Department of Computer Science at Cornell University\, advised by Yoav Artzi.
Her research spans natural language processing\, machine learning\, and computer vision\, with a focus on building systems that participate and continually learn in situated natural language interactions with human users. Alane’s work has been recognized by paper awards at ACL and NAACL\, and has been supported by fellowships and grants\, including an NSF Graduate Research Fellowship\, a Facebook PhD Fellowship\, and research awards from AI2\, ParlAI\, and AWS. Alane has also co-organized multiple workshops and tutorials appearing at NeurIPS\, EMNLP\, NAACL\, and ACL. Previously\, Alane received a BS in Computer Science and Engineering as an Eminence Fellow at the Ohio State University. DTSTART;TZID=America/New_York:20220314T120000 DTEND;TZID=America/New_York:20220314T131500 LOCATION:Virtual Seminar SEQUENCE:0 SUMMARY:Alane Suhr (Cornell University) “Reasoning and Learning in Interactive Natural Language Systems” URL:https://www.clsp.jhu.edu/events/alane-suhr-cornell-university-reasoning-and-learning-in-interactive-natural-language-systems/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nSystems that support expressive\, situated natural language interactions are essential for expanding access to complex computing systems\, such as robots and databases\, to non-experts. Reasoning and learning in such natural language interactions is a challenging open problem. For example\, resolving sentence meaning requires reasoning not only about word meaning\, but also about the interaction context\, including the history of the interaction and the situated environment. In addition\, the sequential dynamics that arise between user and system in and across interactions make learning from static data\, i.e.\, supervised data\, both challenging and ineffective. However\, these same interaction dynamics result in ample opportunities for learning from implicit and explicit feedback that arises naturally in the interaction. This lays the foundation for systems that continually learn\, improve\, and adapt their language use through interaction\, without additional annotation effort. In this talk\, I will focus on these challenges and opportunities. First\, I will describe our work on modeling dependencies between language meaning and interaction context when mapping natural language in interaction to executable code. In the second part of the talk\, I will describe our work on language understanding and generation in collaborative interactions\, focusing on continual learning from explicit and implicit user feedback.
\nBiography
\nAlane Suhr is a PhD Candidate in the Department of Computer Science at Cornell University\, advised by Yoav Artzi. Her research spans natural language processing\, machine learning\, and computer vision\, with a focus on building systems that participate and continually learn in situated natural language interactions with human users. Alane’s work has been recognized by paper awards at ACL and NAACL\, and has been supported by fellowships and grants\, including an NSF Graduate Research Fellowship\, a Facebook PhD Fellowship\, and research awards from AI2\, ParlAI\, and AWS. Alane has also co-organized multiple workshops and tutorials appearing at NeurIPS\, EMNLP\, NAACL\, and ACL. Previously\, Alane received a BS in Computer Science and Engineering as an Eminence Fellow at the Ohio State University.
\n X-TAGS;LANGUAGE=en-US:2022\,March\,Suhr END:VEVENT BEGIN:VEVENT UID:ai1ec-21497@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nWhile the “deep learning tsunami” continues to define the state of the art in speech and language processing\, finite-state transducer grammars developed by linguists and engineers are still widely used in industrial\, highly multilingual settings\, particularly for symbolic\, “front-end” speech applications. In this talk\, I will first briefly review the current state of the OpenFst and OpenGrm finite-state transducer libraries. I will then review two “late-breaking” algorithms found in these libraries. The first is a heuristic but highly effective general-purpose optimization routine for weighted transducers. The second is an algorithm for computing the single shortest string of non-deterministic weighted acceptors which lack certain properties required by classic shortest-path algorithms. I will then illustrate how the OpenGrm tools can be used to induce a finite-state string-to-string transduction model known as a pair n-gram model. This model has been applied to grapheme-to-phoneme conversion\, loanword detection\, abbreviation expansion\, and back-transliteration\, among other tasks.\nBiography\nKyle Gorman is an assistant professor of linguistics at the Graduate Center\, City University of New York\, and director of the master’s program in computational linguistics\; he is also a software engineer in the speech and language algorithms group at Google. With Richard Sproat\, he is the coauthor of Finite-State Text Processing (Morgan & Claypool\, 2021) and the creator of Pynini\, a finite-state text processing library for Python. He has also published on statistical methods for comparing computational models\, text normalization\, grapheme-to-phoneme conversion\, and morphological analysis\, as well as many topics in linguistic theory.
DTSTART;TZID=America/New_York:20220401T120000 DTEND;TZID=America/New_York:20220401T131500 LOCATION:Ames Hall 234 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Kyle Gorman (City University of New York) “Weighted Finite-State Transducers: The Later Years” URL:https://www.clsp.jhu.edu/events/kyle-gorman-city-university-of-new-york-weighted-finite-state-transducers-the-later-years/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nWhile the “deep learning tsunami” continues to define the state of the art in speech and language processing\, finite-state transducer grammars developed by linguists and engineers are still widely used in industrial\, highly-multilingual settings\, particularly for symbolic\, “front-end” speech applications. In this talk\, I will first briefly review the current state of the OpenFst and OpenGrm finite-state transducer libraries. I then review two “late-breaking” algorithms found in these libraries. The first is a heuristic but highly-effective general-purpose optimization routine for weighted transducers. The second is an algorithm for computing the single shortest string of non-deterministic weighted acceptors which lack certain properties required by classic shortest-path algorithms. I will then illustrate how the OpenGrm tools can be used to induce a finite-state string-to-string transduction model known as a pair n-gram model. This model has been applied to grapheme-to-phoneme conversion\, loanword detection\, abbreviation expansion\, and back-transliteration\, among other tasks.
\nBiography
\nKyle Gorman is an assistant professor of linguistics at the Graduate Center\, City University of New York\, and director of the master’s program in computational linguistics\; he is also a software engineer in the speech and language algorithms group at Google. With Richard Sproat\, he is the coauthor of Finite-State Text Processing (Morgan & Claypool\, 2021) and the creator of Pynini\, a finite-state text processing library for Python. He has also published on statistical methods for comparing computational models\, text normalization\, grapheme-to-phoneme conversion\, and morphological analysis\, as well as many topics in linguistic theory.
\n X-TAGS;LANGUAGE=en-US:2022\,Gorman\,March END:VEVENT BEGIN:VEVENT UID:ai1ec-22374@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nIn recent years\, the field of Natural Language Processing has seen a profusion of tasks\, datasets\, and systems that facilitate reasoning about real-world situations through language (e.g.\, RTE\, MNLI\, COMET). Such systems might\, for example\, be trained to consider a situation where “somebody dropped a glass on the floor\,” and conclude it is likely that “the glass shattered” as a result. In this talk\, I will discuss three pieces of work that revisit assumptions made by or about these systems. In the first work\, I develop a Defeasible Inference task\, which enables a system to recognize when a prior assumption it has made may no longer be true in light of new evidence it receives. The second work I will discuss revisits partial-input baselines\, which have highlighted issues of spurious correlations in natural language reasoning datasets and led to unfavorable assumptions about models’ reasoning abilities. In particular\, I will discuss experiments that show models may still learn to reason in the presence of spurious dataset artifacts. Finally\, I will touch on work analyzing harmful assumptions made by reasoning models in the form of social stereotypes\, particularly in the case of free-form generative reasoning models.\nBiography\nRachel Rudinger is an Assistant Professor in the Department of Computer Science at the University of Maryland\, College Park. She holds joint appointments in the Department of Linguistics and the Institute for Advanced Computer Studies (UMIACS). In 2019\, Rachel completed her Ph.D. in Computer Science at Johns Hopkins University in the Center for Language and Speech Processing.
From 2019-2020\, she was a Young Investigator at the Allen Institute for AI in Seattle\, and a visiting researcher at the University of Washington. Her research interests include computational semantics\, common-sense reasoning\, and issues of social bias and fairness in NLP. DTSTART;TZID=America/New_York:20220916T120000 DTEND;TZID=America/New_York:20220916T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Rachel Rudinger (University of Maryland\, College Park) “Not So Fast!: Revisiting Assumptions in (and about) Natural Language Reasoning” URL:https://www.clsp.jhu.edu/events/rachel-rudinger-university-of-maryland-college-park-not-so-fast-revisiting-assumptions-in-and-about-natural-language-reasoning/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nIn recent years\, the field of Natural Language Processing has seen a profusion of tasks\, datasets\, and systems that facilitate reasoning about real-world situations through language (e.g.\, RTE\, MNLI\, COMET). Such systems might\, for example\, be trained to consider a situation where “somebody dropped a glass on the floor\,” and conclude it is likely that “the glass shattered” as a result. In this talk\, I will discuss three pieces of work that revisit assumptions made by or about these systems. In the first work\, I develop a Defeasible Inference task\, which enables a system to recognize when a prior assumption it has made may no longer be true in light of new evidence it receives. The second work I will discuss revisits partial-input baselines\, which have highlighted issues of spurious correlations in natural language reasoning datasets and led to unfavorable assumptions about models’ reasoning abilities. In particular\, I will discuss experiments that show models may still learn to reason in the presence of spurious dataset artifacts. Finally\, I will touch on work analyzing harmful assumptions made by reasoning models in the form of social stereotypes\, particularly in the case of free-form generative reasoning models.
\nBiography
\nRachel Rudinger is an Assistant Professor in the Department of Computer Science at the University of Maryland\, College Park. She holds joint appointments in the Department of Linguistics and the Institute for Advanced Computer Studies (UMIACS). In 2019\, Rachel completed her Ph.D. in Computer Science at Johns Hopkins University in the Center for Language and Speech Processing. From 2019-2020\, she was a Young Investigator at the Allen Institute for AI in Seattle\, and a visiting researcher at the University of Washington. Her research interests include computational semantics\, common-sense reasoning\, and issues of social bias and fairness in NLP.
\n X-TAGS;LANGUAGE=en-US:2022\,Rudinger\,September END:VEVENT BEGIN:VEVENT UID:ai1ec-22375@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nI will present our work on data augmentation using style transfer as a way to improve domain adaptation in sequence labeling tasks. The target domain is social media data\, and the task is named entity recognition (NER). The premise is that we can transform the labelled out-of-domain data into something that stylistically is more closely related to the target data. Then we can train a model on a combination of the generated data and the smaller amount of in-domain data to improve NER prediction performance. I will show recent empirical results on these efforts.\nIf time allows\, I will also give an overview of other research projects I’m currently leading at the RiTUAL (Research in Text Understanding and Analysis of Language) lab. The common thread among all these research problems is the scarcity of labeled data.\nBiography\nThamar Solorio is a Professor of Computer Science at the University of Houston (UH). She holds graduate degrees in Computer Science from the Instituto Nacional de Astrofísica\, Óptica y Electrónica\, in Puebla\, Mexico. Her research interests include information extraction from social media data\, enabling technology for code-switched data\, stylistic modeling of text\, and more recently multimodal approaches for online content understanding. She is the director and founder of the RiTUAL Lab at UH. She is the recipient of an NSF CAREER award for her work on authorship attribution\, and recipient of the 2014 Emerging Leader ABIE Award in Honor of Denice Denton. She is currently serving a second term as an elected board member of the North American Chapter of the Association for Computational Linguistics and was PC co-chair for NAACL 2019. She recently joined the team of Editors-in-Chief for the ACL Rolling Review (ARR) system.
Her research is currently funded by the NSF and by Adobe. DTSTART;TZID=America/New_York:20220923T120000 DTEND;TZID=America/New_York:20220923T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Thamar Solorio (University of Houston) “Style Transfer for Data Augmentation in Sequence Labeling Tasks” URL:https://www.clsp.jhu.edu/events/thamar-solorio-university-of-houston-style-transfer-for-data-augmentation-in-sequence-labeling-tasks/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nI will present our work on data augmentation using style transfer as a way to improve domain adaptation in sequence labeling tasks. The target domain is social media data\, and the task is named entity recognition (NER). The premise is that we can transform the labelled out-of-domain data into something that stylistically is more closely related to the target data. Then we can train a model on a combination of the generated data and the smaller amount of in-domain data to improve NER prediction performance. I will show recent empirical results on these efforts.
\nIf time allows\, I will also give an overview of other research projects I’m currently leading at the RiTUAL (Research in Text Understanding and Analysis of Language) lab. The common thread among all these research problems is the scarcity of labeled data.
\nBiography
\nThamar Solorio is a Professor of Computer Science at the University of Houston (UH). She holds graduate degrees in Computer Science from the Instituto Nacional de Astrofísica\, Óptica y Electrónica\, in Puebla\, Mexico. Her research interests include information extraction from social media data\, enabling technology for code-switched data\, stylistic modeling of text\, and more recently multimodal approaches for online content understanding. She is the director and founder of the RiTUAL Lab at UH. She is the recipient of an NSF CAREER award for her work on authorship attribution\, and recipient of the 2014 Emerging Leader ABIE Award in Honor of Denice Denton. She is currently serving a second term as an elected board member of the North American Chapter of the Association for Computational Linguistics and was PC co-chair for NAACL 2019. She recently joined the team of Editors-in-Chief for the ACL Rolling Review (ARR) system. Her research is currently funded by the NSF and by Adobe.
\n X-TAGS;LANGUAGE=en-US:2022\,September\,Solorio END:VEVENT BEGIN:VEVENT UID:ai1ec-22380@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nThe availability of large multilingual pre-trained language models has opened up exciting pathways for developing NLP technologies for languages with scarce resources. In this talk I will advocate for the need to go beyond the most common languages in multilingual evaluation\, and discuss the challenges of handling new\, unseen-during-training languages and varieties. I will also share some of my experiences with working with indigenous and other endangered language communities and activists.\nBiography\n\nAntonios Anastasopoulos is an Assistant Professor in Computer Science at George Mason University. In 2019\, Antonis received his PhD in Computer Science from the University of Notre Dame and then worked as a postdoctoral researcher at the Language Technologies Institute at Carnegie Mellon University. His research interests revolve around computational linguistics and natural language processing with a focus on low-resource settings\, endangered languages\, and cross-lingual learning.\n\n\n DTSTART;TZID=America/New_York:20220930T120000 DTEND;TZID=America/New_York:20220930T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Antonios Anastasopoulos (George Mason University) “NLP Beyond the Top-100 Languages” URL:https://www.clsp.jhu.edu/events/antonis-anastasopoulos-george-mason-university/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nThe availability of large multilingual pre-trained language models has opened up exciting pathways for developing NLP technologies for languages with scarce resources. In this talk I will advocate for the need to go beyond the most common languages in multilingual evaluation\, and discuss the challenges of handling new\, unseen-during-training languages and varieties. I will also share some of my experiences with working with indigenous and other endangered language communities and activists.
\nBiography
\nAntonios Anastasopoulos is an Assistant Professor in Computer Science at George Mason University. In 2019\, Antonis received his PhD in Computer Science from the University of Notre Dame and then worked as a postdoctoral researcher at the Language Technologies Institute at Carnegie Mellon University. His research interests revolve around computational linguistics and natural language processing with a focus on low-resource settings\, endangered languages\, and cross-lingual learning.
\n\n X-TAGS;LANGUAGE=en-US:2022\,Anastasopoulos\,September END:VEVENT BEGIN:VEVENT UID:ai1ec-22423@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION: DTSTART;TZID=America/New_York:20221007T120000 DTEND;TZID=America/New_York:20221007T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Ariya Rastrow (Amazon) URL:https://www.clsp.jhu.edu/events/ariya-rastrow-amazon-2/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,October\,Rastrow END:VEVENT BEGIN:VEVENT UID:ai1ec-22394@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\n\nModel robustness and spurious correlations have received increasing attention in the NLP community\, both in methods and evaluation. The term “spurious correlation” is overloaded though and can refer to any undesirable shortcuts learned by the model\, as judged by domain experts.\n\n\nWhen designing mitigation algorithms\, we often (implicitly) assume that a spurious feature is irrelevant for prediction. However\, many features in NLP (e.g. word overlap and negation) are not spurious in the way that the background is spurious for classifying objects in an image. In contrast\, they carry important information that humans need to make predictions. In this talk\, we argue that it is more productive to characterize features in terms of their necessity and sufficiency for prediction. We then discuss the implications of this categorization in representation\, learning\, and evaluation.\nBiography\nHe He is an Assistant Professor in the Department of Computer Science and the Center for Data Science at New York University. She obtained her PhD in Computer Science at the University of Maryland\, College Park. Before joining NYU\, she spent a year at AWS AI and was a post-doc at Stanford University before that.
She is interested in building robust and trustworthy NLP systems in human-centered settings. Her recent research focus includes robust language understanding\, collaborative text generation\, and understanding capabilities and issues of large language models. DTSTART;TZID=America/New_York:20221014T120000 DTEND;TZID=America/New_York:20221014T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:He He (New York University) “What We Talk about When We Talk about Spurious Correlations in NLP” URL:https://www.clsp.jhu.edu/events/he-he-new-york-university/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n
Abstract
\nModel robustness and spurious correlations have received increasing attention in the NLP community\, both in methods and evaluation. The term “spurious correlation” is overloaded though and can refer to any undesirable shortcuts learned by the model\, as judged by domain experts.
\nWhen designing mitigation algorithms\, we often (implicitly) assume that a spurious feature is irrelevant for prediction. However\, many features in NLP (e.g. word overlap and negation) are not spurious in the way that the background is spurious for classifying objects in an image. In contrast\, they carry important information that humans need to make predictions. In this talk\, we argue that it is more productive to characterize features in terms of their necessity and sufficiency for prediction. We then discuss the implications of this categorization in representation\, learning\, and evaluation.
\nBiography
\nHe He is an Assistant Professor in the Department of Computer Science and the Center for Data Science at New York University. She obtained her PhD in Computer Science at the University of Maryland\, College Park. Before joining NYU\, she spent a year at AWS AI and was a post-doc at Stanford University before that. She is interested in building robust and trustworthy NLP systems in human-centered settings. Her recent research focus includes robust language understanding\, collaborative text generation\, and understanding capabilities and issues of large language models.
\nAbstract
\nModern learning architectures for natural language processing have been very successful in incorporating a huge amount of text into their parameters. However\, by and large\, such models store and use knowledge in distributed and decentralized ways. This proves unreliable and makes the models ill-suited for knowledge-intensive tasks that require reasoning over factual information in linguistic expressions. In this talk\, I will give a few examples of exploring alternative architectures to tackle those challenges. In particular\, we can improve the performance of such (language) models by representing\, storing and accessing knowledge in a dedicated memory component.
\nThis talk is based on several joint works with Yury Zemlyanskiy (Google Research)\, Michiel de Jong (USC and Google Research)\, William Cohen (Google Research and CMU)\, and our other collaborators at Google Research.\n
Biography
\nFei is a research scientist at Google Research. Before that\, he was a Professor of Computer Science at the University of Southern California. His primary research interests are machine learning and its application to various AI problems: speech and language processing\, computer vision\, robotics\, and recently weather forecasting and climate modeling. He has a PhD (2007) in Computer and Information Science from the University of Pennsylvania and a B.Sc. and M.Sc. in Biomedical Engineering from Southeast University (Nanjing\, China).
\n X-TAGS;LANGUAGE=en-US:2022\,October\,Sha END:VEVENT BEGIN:VEVENT UID:ai1ec-22403@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nVoice conversion (VC) is a significant aspect of artificial intelligence. It is the study of how to convert one’s voice to sound like that of another without changing the linguistic content. Voice conversion belongs to a general technical field of speech synthesis\, which converts text to speech or changes the properties of speech\, for example\, voice identity\, emotion\, and accents. Voice conversion involves multiple speech processing techniques\, such as speech analysis\, spectral conversion\, prosody conversion\, speaker characterization\, and vocoding. With the recent advances in theory and practice\, we are now able to produce human-like voice quality with high speaker similarity. In this talk\, Dr. Sisman will present the recent advances in voice conversion and discuss their promise and limitations. Dr. Sisman will also provide a summary of the available resources for expressive voice conversion research.\nBiography\nDr. Berrak Sisman (Member\, IEEE) received the Ph.D. degree in electrical and computer engineering from the National University of Singapore in 2020\, fully funded by the A*STAR Graduate Academy under the Singapore International Graduate Award (SINGA). She is currently working as a tenure-track Assistant Professor at the Erik Jonsson School Department of Electrical and Computer Engineering at the University of Texas at Dallas\, United States. Prior to joining UT Dallas\, she was a faculty member at the Singapore University of Technology and Design (2020-2022). She was a Postdoctoral Research Fellow at the National University of Singapore (2019-2020). She was an exchange doctoral student at the University of Edinburgh and a visiting scholar at The Centre for Speech Technology Research (CSTR)\, University of Edinburgh (2019).
She was a visiting researcher at the RIKEN Advanced Intelligence Project in Japan (2018). Her research is focused on machine learning\, signal processing\, emotion\, speech synthesis\, and voice conversion.\nDr. Sisman has served as the Area Chair at INTERSPEECH 2021\, INTERSPEECH 2022\, and IEEE SLT 2022\, and as the Publication Chair at ICASSP 2022. She has been elected as a member of the IEEE Speech and Language Processing Technical Committee (SLTC) in the area of Speech Synthesis for the term from January 2022 to December 2024. She plays leadership roles in conference organizations and is active in technical committees. She has served as the General Coordinator of the Student Advisory Committee (SAC) of the International Speech Communication Association (ISCA). DTSTART;TZID=America/New_York:20221104T120000 DTEND;TZID=America/New_York:20221104T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Berrak Sisman (University of Texas at Dallas) “Speech Synthesis and Voice Conversion: Machine Learning can Mimic Anyone’s Voice” URL:https://www.clsp.jhu.edu/events/berrak-sisman-university-of-texas-at-dallas/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nVoice conversion (VC) is a significant aspect of artificial intelligence. It is the study of how to convert one’s voice to sound like that of another without changing the linguistic content. Voice conversion belongs to a general technical field of speech synthesis\, which converts text to speech or changes the properties of speech\, for example\, voice identity\, emotion\, and accents. Voice conversion involves multiple speech processing techniques\, such as speech analysis\, spectral conversion\, prosody conversion\, speaker characterization\, and vocoding. With the recent advances in theory and practice\, we are now able to produce human-like voice quality with high speaker similarity. In this talk\, Dr. Sisman will present the recent advances in voice conversion and discuss their promise and limitations. Dr. Sisman will also provide a summary of the available resources for expressive voice conversion research.
\nBiography\nDr. Berrak Sisman (Member\, IEEE) received the Ph.D. degree in electrical and computer engineering from the National University of Singapore in 2020\, fully funded by the A*STAR Graduate Academy under the Singapore International Graduate Award (SINGA). She is currently working as a tenure-track Assistant Professor at the Erik Jonsson School Department of Electrical and Computer Engineering at the University of Texas at Dallas\, United States. Prior to joining UT Dallas\, she was a faculty member at the Singapore University of Technology and Design (2020-2022). She was a Postdoctoral Research Fellow at the National University of Singapore (2019-2020). She was an exchange doctoral student at the University of Edinburgh and a visiting scholar at The Centre for Speech Technology Research (CSTR)\, University of Edinburgh (2019). She was a visiting researcher at the RIKEN Advanced Intelligence Project in Japan (2018). Her research is focused on machine learning\, signal processing\, emotion\, speech synthesis\, and voice conversion.
\nDr. Sisman has served as the Area Chair at INTERSPEECH 2021\, INTERSPEECH 2022\, and IEEE SLT 2022\, and as the Publication Chair at ICASSP 2022. She has been elected as a member of the IEEE Speech and Language Processing Technical Committee (SLTC) in the area of Speech Synthesis for the term from January 2022 to December 2024. She plays leadership roles in conference organizations and is active in technical committees. She has served as the General Coordinator of the Student Advisory Committee (SAC) of the International Speech Communication Association (ISCA).
\n X-TAGS;LANGUAGE=en-US:2022\,November\,Sisman END:VEVENT BEGIN:VEVENT UID:ai1ec-22408@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nAI-powered applications increasingly adopt Deep Neural Networks (DNNs) for solving many prediction tasks\, leading to more than one DNN running on resource-constrained devices. Supporting many models simultaneously on a device is challenging due to the linearly increased computation\, energy\, and storage costs. An effective approach to address the problem is multi-task learning (MTL)\, where a set of tasks are learned jointly to allow some parameter sharing among tasks. MTL creates multi-task models based on common DNN architectures and has shown significantly reduced inference costs and improved generalization performance in many machine learning applications. In this talk\, we will introduce our recent efforts on leveraging MTL to improve accuracy and efficiency for edge computing. The talk will introduce multi-task architecture design systems that can automatically identify resource-efficient multi-task models with low inference costs and high task accuracy.\n\nBiography\n\n\nHui Guan is an Assistant Professor in the College of Information and Computer Sciences (CICS) at the University of Massachusetts Amherst\, the flagship campus of the UMass system. She received her Ph.D. in Electrical Engineering from North Carolina State University in 2020. Her research lies in the intersection between machine learning and systems\, with an emphasis on improving the speed\, scalability\, and reliability of machine learning through innovations in algorithms and programming systems. Her current research focuses on both algorithm and system optimizations of deep multi-task learning and graph machine learning. DTSTART;TZID=America/New_York:20221111T120000 DTEND;TZID=America/New_York:20221111T131500 LOCATION:Hackerman Hall B17 @ 3400 N.
Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Hui Guan (University of Massachusetts Amherst) “Towards Accurate and Efficient Edge Computing Via Multi-Task Learning” URL:https://www.clsp.jhu.edu/events/hui-guan-university-of-massachusetts-amherst/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nDriven by the goal of eradicating language barriers on a global scale\, machine translation has solidified itself as a key focus of artificial intelligence research today. However\, such efforts have coalesced around a small subset of languages\, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200 language barrier while ensuring safe\, high-quality results\, all while keeping ethical considerations in mind? In this talk\, I introduce No Language Left Behind\, an initiative to break language barriers for low-resource languages. In No Language Left Behind\, we took on the low-resource language translation challenge by first contextualizing the need for translation support through exploratory interviews with native speakers. Then\, we created datasets and models aimed at narrowing the performance gap between low and high-resource languages. We proposed multiple architectural and training improvements to counteract overfitting while training on thousands of tasks. Critically\, we evaluated the performance of over 40\,000 different translation directions using a human-translated benchmark\, Flores-200\, and combined human evaluation with a novel toxicity benchmark covering all languages in Flores-200 to assess translation safety. Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art\, laying important groundwork towards realizing a universal translation system in an open-source manner.
\nBiography
\nAngela is a research scientist at Meta AI Research in New York\, focusing on supporting efforts in speech and language research. Recent projects include No Language Left Behind (https://ai.facebook.com/research/no-language-left-behind/) and Universal Speech Translation for Unwritten Languages (https://ai.facebook.com/blog/ai-translation-hokkien/). Before working on translation\, Angela focused on research in on-device models for NLP and computer vision\, and on text generation.
\n\n X-TAGS;LANGUAGE=en-US:2022\,Fan\,November END:VEVENT BEGIN:VEVENT UID:ai1ec-22417@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nOne of the keys to success in machine learning applications is to improve each user’s personal experience via personalized models. A personalized model can be a more resource-efficient solution than a general-purpose model\, too\, because it focuses on a particular sub-problem\, for which a smaller model architecture can be good enough. However\, training a personalized model requires data from the particular test-time user\, which are not always available due to their private nature and technical challenges. Furthermore\, such data tend to be unlabeled as they can be collected only during the test time\, once after the system is deployed to user devices. One could rely on the generalization power of a generic model\, but such a model can be too computationally/spatially complex for real-time processing in a resource-constrained device. In this talk\, I will present some techniques to circumvent the lack of labeled personal data in the context of speech enhancement. Our machine learning models will require zero or few data samples from the test-time users\, while they can still achieve the personalization goal. To this end\, we will investigate modularized speech enhancement models as well as the potential of self-supervised learning for personalized speech enhancement. Because our research achieves the personalization goal in a data- and resource-efficient way\, it is a step towards a more available and affordable AI for society.\nBiography\nMinje Kim is an associate professor in the Dept. of Intelligent Systems Engineering at Indiana University\, where he leads his research group\, Signals and AI Group in Engineering (SAIGE). He is also an Amazon Visiting Academic\, consulting for Amazon Lab126.
At IU\, he is affiliated with various programs and labs such as Data Science\, Cognitive Science\, the Dept. of Statistics\, and the Center for Machine Learning. He earned his Ph.D. in the Dept. of Computer Science at the University of Illinois at Urbana-Champaign. Before joining UIUC\, he worked as a researcher at ETRI\, a national lab in Korea\, from 2006 to 2011. Before then\, he received his Master’s and Bachelor’s degrees in the Dept. of Computer Science and Engineering at POSTECH (Summa Cum Laude) and in the Division of Information and Computer Engineering at Ajou University (with honor) in 2006 and 2004\, respectively. He is a recipient of various awards including the NSF CAREER Award (2021)\, the IU Trustees Teaching Award (2021)\, the IEEE SPS Best Paper Award (2020)\, and Google and Starkey’s grants for outstanding student papers at ICASSP 2013 and 2014\, respectively. He is an IEEE Senior Member and also a member of the IEEE Audio and Acoustic Signal Processing Technical Committee (2018-2023). He is serving as an Associate Editor for the EURASIP Journal of Audio\, Speech\, and Music Processing\, and as a Consulting Associate Editor for the IEEE Open Journal of Signal Processing. He is also a reviewer\, program committee member\, or area chair for the major machine learning and signal processing venues. He has filed more than 50 patent applications as an inventor. DTSTART;TZID=America/New_York:20221202T120000 DTEND;TZID=America/New_York:20221202T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Minje Kim (Indiana University) “Personalized Speech Enhancement: Data- and Resource-Efficient Machine Learning” URL:https://www.clsp.jhu.edu/events/minje-kim-indiana-university/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n
\\nAbstract
\nOne of the keys to success in machine learning applic ations is to improve each user’s personal experience via personalized mode ls. A personalized model can be a more resource-efficient solution than a general-purpose model\, too\, because it focuses on a particular sub-probl em\, for which a smaller model architecture can be good enough. However\, training a personalized model requires data from the particular test-time user\, which are not always available due to their private nature and tech nical challenges. Furthermore\, such data tend to be unlabeled as they can be collected only during the test time\, once after the system is deploye d to user devices. One could rely on the generalization power of a generic model\, but such a model can be too computationally/spatially complex for real-time processing in a resource-constrained device. In this talk\, I will present some techniques to circumvent the lack of labeled personal data in the context of speech enhancement. Ou r machine learning models will require zero or few data samples from the t est-time users\, while they can still achieve the personalization goal. To this end\, we will investigate modularized speech enhancement models as w ell as the potential of self-supervised learning for personalized speech e nhancement. Because our research achieves the personalization goal in a da ta- and resource-efficient way\, it is a step towards a more available and affordable AI for society.
\nBiography
\nAbstract
\nZipf’s law is commonly glossed by the aphorism “infrequent words are frequent\,” but in practice\, it has often meant that there are three types of words: frequent\, infrequent\, and out-of-vocabulary (OOV). Speech recognition solved the problem of frequent words in 1970 (with dynamic time warping). Hidden Markov models worked well for moderately infrequent words\, but the problem of OOV words was not solved until sequence-to-sequence neural nets de-reified the concept of a word. Many other social phenomena follow power-law distributions. The number of native speakers of the N’th most spoken language\, for example\, is 1.44 billion over N to the 1.09. In languages with sufficient data\, we have shown that monolingual pre-training outperforms multilingual pre-training. In less-frequent languages\, multilingual knowledge transfer can significantly reduce phone error rates. In languages with no training data\, unsupervised ASR methods can be proven to converge\, as long as the eigenvalues of the language model are sufficiently well separated to be measurable. Other systems of social categorization may follow similar power-law distributions. Disability\, for example\, can cause speech patterns that were never seen in the training database\, but not all disabilities need do so. The inability of speech technology to work for people with even common disabilities is probably caused by a lack of data\, and can probably be solved by finding better modes of interaction between technology researchers and the communities served by technology.
\nBiography
\nMark Hasegawa-Johnson is a William L. Everitt Faculty Fellow of Electrical and Computer Engineering at the University of Illinois in Urbana-Champaign. He has published research in speech production and perception\, source separation\, voice conversion\, and low-resource automatic speech recognition.
\n X-TAGS;LANGUAGE=en-US:2022\,December\,Hasegawa-Johnson END:VEVENT BEGIN:VEVENT UID:ai1ec-23302@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION: DTSTART;TZID=America/New_York:20230130T120000 DTEND;TZID=America/New_York:20230130T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Daniel Fried (CMU) URL:https://www.clsp.jhu.edu/events/daniel-fried-cmu/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2023\,Fried\,January END:VEVENT BEGIN:VEVENT UID:ai1ec-23304@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nTransformers are essential to pretraining. As we approach 5 years of BERT\, the connection between attention as architecture and transfer learning remains key to this central thread in NLP. Other architectures such as CNNs and RNNs have been used to replicate pretraining results\, but these either fail to reach the same accuracy or require supplemental attention layers. This work revisits the seminal BERT result and considers pretraining without attention. We consider replacing self-attention layers with recently developed approaches for long-range sequence modeling and transformer architecture variants. Specifically\, inspired by recent papers like the structured state space sequence model (S4)\, we use simple routing layers based on state-space models (SSM) and a bidirectional model architecture based on multiplicative gating. We discuss the results of the proposed Bidirectional Gated SSM (BiGS) and present a range of analysis into its properties. Results show that architecture does seem to have a notable impact on downstream performance and a different inductive bias that is worth exploring further.\nBiography\nAlexander “Sasha” Rush is an Associate Professor at Cornell Tech.
His work is at the intersection of natural language processing and generative modeling with applications in text generation\, efficient inference\, and controllability. He has written several popular open-source software projects supporting NLP research and data science\, and works part-time as a researcher at Hugging Face. He is the secretary of ICLR and developed software used to run virtual conferences during COVID. His work has received paper and demo awards at major NLP\, visualization\, and hardware conferences\, an NSF Career Award\, and a Sloan Fellowship. He tweets and blogs\, mostly about coding and ML\, at @srush_nlp. DTSTART;TZID=America/New_York:20230203T120000 DTEND;TZID=America/New_York:20230203T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Sasha Rush (Cornell University) “Pretraining Without Attention” URL:https://www.clsp.jhu.edu/events/sasha-rush-cornell-university/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nTransformers are essential to pretraining. As we approach 5 years of BERT\, the connection between attention as architecture and transfer learning remains key to this central thread in NLP. Other architectures such as CNNs and RNNs have been used to replicate pretraining results\, but these either fail to reach the same accuracy or require supplemental attention layers. This work revisits the seminal BERT result and considers pretraining without attention. We consider replacing self-attention layers with recently developed approaches for long-range sequence modeling and transformer architecture variants. Specifically\, inspired by recent papers like the structured state space sequence model (S4)\, we use simple routing layers based on state-space models (SSM) and a bidirectional model architecture based on multiplicative gating. We discuss the results of the proposed Bidirectional Gated SSM (BiGS) and present a range of analysis into its properties. Results show that architecture does seem to have a notable impact on downstream performance and a different inductive bias that is worth exploring further.
\nBiography
\nAbstract
\nWhile large language models have advanced the state-of-the-art in natural language processing\, these models are trained on large-scale datasets\, which may include harmful information. Studies have shown that as a result\, the models exhibit social biases and generate misinformation after training. In this talk\, I will discuss my work on analyzing and interpreting the risks of large language models across the areas of fairness\, trustworthiness\, and safety. I will first describe my research in the detection of dialect bias between African American English (AAE) vs. Standard American English (SAE). The second part investigates the trustworthiness of models through the memorization and subsequent generation of conspiracy theories. I will end my talk with recent work in AI safety regarding text that may lead to physical harm.
\nBiography
\nSharon is a 5th-year Ph.D. candidate at the University of California\, Santa Barbara\, where she is advised by Professor William Wang. Her research interests lie in natural language processing\, with a focus on Responsible AI. Sharon’s research spans the subareas of fairness\, trustworthiness\, and safety\, with publications in ACL\, EMNLP\, WWW\, and LREC. She has spent summers interning at AWS\, Meta\, and Pinterest. Sharon is a 2022 EECS Rising Star and a current recipient of the Amazon Alexa AI Fellowship for Responsible AI.
\n X-TAGS;LANGUAGE=en-US:2023\,February\,Levy END:VEVENT BEGIN:VEVENT UID:ai1ec-23308@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nBiases in datasets\, or unintentionally introduced sp urious cues\, are a common source of misspecification in machine learning. Performant models trained on such data can gender stereotype or be brittl e under distribution shift. In this talk\, we present several results in multimodal and question answering applications studying sources of dataset bias\, and several mitigation methods. We propose approaches where known dimensions of dataset bias are explicitly factored out of a model during learning\, without needing to modify data. Finally\, we ask whether datase t biases can be attributable to annotator behavior during annotation. Draw ing inspiration from work in psychology on cognitive biases\, we show cert ain behavioral patterns are highly indicative of the creation of problemat ic (but valid) data instances in question answering. We give evidence that many existing observations around how dataset bias propagates to models c an be attributed to data samples created by annotators we identify.\nBiogr aphy\nMark Yatskar is an Assistant Professor at University of Pennsylvania in the department of Computer and Information Science. He did his PhD at University of Washington co-advised by Luke Zettlemoyer and Ali Farhadi. H e was a Young Investigator at the Allen Institute for Artificial Intellige nce for several years working with their computer vision team\, Prior. His work spans Natural Language Processing\, Computer Vision\, and Fairness i n Machine Learning. He received a Best Paper Award at EMNLP for work on ge nder bias amplification\, and his work has been featured in Wired and the New York Times. DTSTART;TZID=America/New_York:20230210T120000 DTEND;TZID=America/New_York:20230210T131500 LOCATION:Hackerman Hall B17 @ 3400 N. 
Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Mark Yatskar (University of Pennsylvania) “Understanding Dataset Biases: Behavioral Indicators During Annotation and Contrastive Mitigations” URL:https://www.clsp.jhu.edu/events/mark-yatskar-university-of-pennsylvania/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nBiases in datasets\, or unintentionally introduced spurious cues\, are a common source of misspecification in machine learning. Performant models trained on such data can gender stereotype or be brittle under distribution shift. In this talk\, we present several results in multimodal and question answering applications studying sources of dataset bias\, and several mitigation methods. We propose approaches where known dimensions of dataset bias are explicitly factored out of a model during learning\, without needing to modify data. Finally\, we ask whether dataset biases can be attributable to annotator behavior during annotation. Drawing inspiration from work in psychology on cognitive biases\, we show certain behavioral patterns are highly indicative of the creation of problematic (but valid) data instances in question answering. We give evidence that many existing observations around how dataset bias propagates to models can be attributed to data samples created by annotators we identify.
\nBiography\nMark Yatskar is an Assistant Professor at University of Pennsylvania in the department of Computer and Information Science. He did his PhD at University of Washington co-advised by Luke Zettlemoyer and Ali Farhadi. He was a Young Investigator at the Allen Institute for Artificial Intelligence for several years working with their computer vision team\, Prior. His work spans Natural Language Processing\, Computer Vision\, and Fairness in Machine Learning. He received a Best Paper Award at EMNLP for work on gender bias amplification\, and his work has been featured in Wired and the New York Times.
\n\n X-TAGS;LANGUAGE=en-US:2023\,February\,Yatskar END:VEVENT BEGIN:VEVENT UID:ai1ec-23314@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nWhile GPT models have shown impressive performance on summarization and open-ended text generation\, it’s important to assess t heir abilities on more constrained text generation tasks that require sign ificant and diverse rewritings. In this talk\, I will discuss the challeng es of evaluating systems that are highly competitive and perform close to humans on two such tasks: (i) paraphrase generation and (ii) text simplifi cation. To address these challenges\, we introduce an interactive Rank-and -Rate evaluation framework. Our results show that GPT-3.5 has made a major step up from fine-tuned T5 in paraphrase generation\, but still lacks the diversity and creativity of humans who spontaneously produce large quanti ties of paraphrases.\nAdditionally\, we demonstrate that GPT-3.5 performs similarly to a single human in text simplification\, which makes it diffic ult for existing automatic evaluation metrics to distinguish between the t wo. To overcome this shortcoming\, we propose LENS\, a learnable evaluatio n metric that outperforms SARI\, BERTScore\, and other existing methods in both automatic evaluation and minimal risk decoding for text generation. \nBiography\nWei Xu is an assistant professor in the School of Interactive Computing at the Georgia Institute of Technology\, where she is also affi liated with the new NSF AI CARING Institute and Machine Learning Center. S he received her Ph.D. in Computer Science from New York University and her B.S. and M.S. from Tsinghua University. Xu’s research interests are in na tural language processing\, machine learning\, and social media\, with a f ocus on text generation\, stylistics\, robustness and controllability of m achine learning models\, and reading and writing assistive technology. 
She is a recipient of the NSF CAREER Award\, CrowdFlower AI for Everyone Award\, Criteo Faculty Research Award\, and Best Paper Award at COLING’18. She has also received funds from DARPA and IARPA. She is an elected member of the NAACL executive board and regularly serves as a senior area chair for AI/NLP conferences. DTSTART;TZID=America/New_York:20230224T120000 DTEND;TZID=America/New_York:20230224T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Wei Xu (Georgia Tech) “GPT-3 vs Humans: Rethinking Evaluation of Natural Language Generation” URL:https://www.clsp.jhu.edu/events/wei-xu-georgia-tech/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n
\\nAbstract
\nWhile GPT models have shown impressive performance on summarization and open-ended text generation\, it’s important to assess their abilities on more constrained text generation tasks that require significant and diverse rewritings. In this talk\, I will discuss the challenges of evaluating systems that are highly competitive and perform close to humans on two such tasks: (i) paraphrase generation and (ii) text simplification. To address these challenges\, we introduce an interactive Rank-and-Rate evaluation framework. Our results show that GPT-3.5 has made a major step up from fine-tuned T5 in paraphrase generation\, but still lacks the diversity and creativity of humans who spontaneously produce large quantities of paraphrases.
\nAdditionally\, we demonstrate that GPT-3.5 performs similarly to a single human in text simplification\, which makes it difficult for existing automatic evaluation metrics to distinguish between the two. To overcome this shortcoming\, we propose LENS\, a learnable evaluation metric that outperforms SARI\, BERTScore\, and other existing methods in both automatic evaluation and minimal risk decoding for text generation.
\nBiography
\nWei Xu is an assistant professor in the School of Interactive Computing at the Georgia Institute of Technology\, where she is also affiliated with the new NSF AI CARING Institute and Machine Learning Center. She received her Ph.D. in Computer Science from New York University and her B.S. and M.S. from Tsinghua University. Xu’s research interests are in natural language processing\, machine learning\, and social media\, with a focus on text generation\, stylistics\, robustness and controllability of machine learning models\, and reading and writing assistive technology. She is a recipient of the NSF CAREER Award\, CrowdFlower AI for Everyone Award\, Criteo Faculty Research Award\, and Best Paper Award at COLING’18. She has also received funds from DARPA and IARPA. She is an elected member of the NAACL executive board and regularly serves as a senior area chair for AI/NLP conferences.
\n X-TAGS;LANGUAGE=en-US:2023\,February\,Xu END:VEVENT BEGIN:VEVENT UID:ai1ec-23316@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nUnderstanding the implications underlying a text is critical to assessing its impact\, in particular the social dynamics that may result from a reading of the text. This requires endowing artificial intelligence (AI) systems with pragmatic reasoning\, for example to correctly conclude that the statement “Epidemics and cases of disease in the 21st century are “staged”” relates to unfounded conspiracy theories. In this talk\, I discuss how shortcomings in the ability of current AI systems to reason about pragmatics present challenges to equitable detection of false or harmful language. I demonstrate how these shortcomings can be addressed by imposing human-interpretable structure on deep learning architectures using insights from linguistics.\nIn the first part of the talk\, I describe how adversarial text generation algorithms can be used to improve robustness of content moderation systems. I then introduce a pragmatic formalism for reasoning about harmful implications conveyed by social media text. I show how this pragmatic approach can be combined with generative neural language models to uncover implications of news headlines. I also address the bottleneck to progress in text generation posed by gaps in evaluation of factuality. I conclude by showing how context-aware content moderation can be used to ensure safe interactions with conversational agents.\n \nBiography\nSaadia Gabriel is a PhD candidate in the Paul G. Allen School of Computer Science & Engineering at the University of Washington\, advised by Prof. Yejin Choi and Prof. Franziska Roesner. Her research revolves around natural language processing and machine learning\, with a particular focus on building systems for understanding how social commonsense manifests in text (i.e.
how do people typically behave in social scenarios)\, as well a s mitigating spread of false or harmful text (e.g. Covid-19 misinformation ). Her work has been covered by a wide range of media outlets like Forbes and TechCrunch. It has also received a 2019 ACL best short paper nominatio n\, a 2019 IROS RoboCup best paper nomination and won a best paper award a t the 2020 WeCNLP summit. Prior to her PhD\, Saadia received a BA summa cu m laude from Mount Holyoke College in Computer Science and Mathematics.\n DTSTART;TZID=America/New_York:20230227T120000 DTEND;TZID=America/New_York:20230227T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Saadia Gabriel (University of Washington) “Socially Responsible and Factual Reasoning for Equitable AI Systems” URL:https://www.clsp.jhu.edu/events/saadia-gabriel-university-of-washington -socially-responsible-and-factual-reasoning-for-equitable-ai-systems/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstr act
\nUnderstanding the implications underlying a text is critical to assessing its impact\, in particular the social dynamics that may result from a reading of the text. This requires endowing artificial intelligence (AI) systems with pragmatic reasoning\, for example to correctly conclude that the statement “Epidemics and cases of disease in the 21st century are “staged”” relates to unfounded conspiracy theories. In this talk\, I discuss how shortcomings in the ability of current AI systems to reason about pragmatics present challenges to equitable detection of false or harmful language. I demonstrate how these shortcomings can be addressed by imposing human-interpretable structure on deep learning architectures using insights from linguistics.
In the first part of the talk\, I describe how adversarial text generation algorithms can be used to improve robustness of content moderation systems. I then introduce a pragmatic formalism for reasoning about harmful implications conveyed by social media text. I show how this pragmatic approach can be combined with generative neural language models to uncover implications of news headlines. I also address the bottleneck to progress in text generation posed by gaps in evaluation of factuality. I conclude by showing how context-aware content moderation can be used to ensure safe interactions with conversational agents.
\n
Biography
\nSaadia Gabriel is a PhD candidate in the Paul G. Allen School of Computer Science & Engineering at the University of Washington\, advised by Prof. Yejin Choi and Prof. Franziska Roesner. Her research revolves around natural language processing and machine learning\, with a particular focus on building systems for understanding how social commonsense manifests in text (i.e. how do people typically behave in social scenarios)\, as well as mitigating spread of false or harmful text (e.g. Covid-19 misinformation). Her work has been covered by a wide range of media outlets like Forbes and TechCrunch. It has also received a 2019 ACL best short paper nomination\, a 2019 IROS RoboCup best paper nomination and won a best paper award at the 2020 WeCNLP summit. Prior to her PhD\, Saadia received a BA summa cum laude from Mount Holyoke College in Computer Science and Mathematics.
\n\n X-TAGS;LANGUAGE=en-US:2023\,February\,Gabriel END:VEVENT BEGIN:VEVENT UID:ai1ec-23320@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nSpeech communications represents a core domain for ed ucation\, team problem solving\, social engagement\, and business interact ions. The ability for Speech Technology to extract layers of knowledge and assess engagement content represents the next generation of advanced spee ch solutions. Today\, the emergence of BIG DATA\, Machine Learning\, as we ll as voice enabled speech systems have required the need for effective vo ice capture and automatic speech/speaker recognition. The ability to emplo y speech and language technology to assess human-to-human interactions off ers new research paradigms having profound impact on assessing human inter action. In this talk\, we will focus on big data naturalistic audio proces sing relating to (i) child learning spaces\, and (ii) the NASA APOLLO luna r missions. ML based technology advancements include automatic audio diari zation\, speech recognition\, and speaker recognition. Child-Teacher based assessment of conversational interactions are explored\, including keywor d and “WH-word” (e.g.\, who\, what\, etc.). Diarization processing solutio ns are applied to both classroom/learning space child speech\, as well as massive APOLLO data. CRSS-UTDallas is expanding our original Apollo-11 cor pus\, resulting in a massive multi-track audio processing challenge to mak e available 150\,000hrs of Apollo mission data to be shared with science c ommunities: (i) speech/language technology\, (ii) STEM/science and team-ba sed researchers\, and (iii) education/historical/archiving specialists. Ou r goals here are to provide resources which allow to better understand how people work/learn collaboratively together. 
For Apollo\, to accomplish on e of mankind’s greatest scientific/technological challenges in the last ce ntury.\nBiography\nJohn H.L. Hansen\, received Ph.D. & M.S. degrees from G eorgia Institute of Technology\, and B.S.E.E. from Rutgers Univ. He joined Univ. of Texas at Dallas (UTDallas) in 2005\, where he currently serves a s Associate Dean for Research\, Prof. of ECE\, Distinguished Univ. Chair i n Telecom. Engineering\, and directs Center for Robust Speech Systems (CRS S). He is an ISCA Fellow\, IEEE Fellow\, and has served as Member and TC-C hair of IEEE Signal Proc. Society\, Speech & Language Proc. Tech. Comm.(SL TC)\, and Technical Advisor to U.S. Delegate for NATO (IST/TG-01). He serv ed as ISCA President (2017-21)\, continues to serve on ISCA Board (2015-23 ) as Treasurer\, has supervised 99 PhD/MS thesis candidates (EE\,CE\,BME\, TE\,CS\,Ling.\,Cog.Sci.\,Spch.Sci.\,Hear.Sci)\, was recipient of 2020 UT-D allas Provost’s Award for Grad. PhD Research Mentoring\; author/co-author of 865 journal/conference papers including 14 textbooks in the field of sp eech/language/hearing processing & technology including coauthor of textbo ok Discrete-Time Processing of Speech Signals\, (IEEE Press\, 2000)\, and lead author of the report “The Impact of Speech Under ‘Stress’ on Military Speech Technology\,” (NATO RTO-TR-10\, 2000). He served as Organizer\, Ch air/Co-Chair/Tech.Chair for ISCA INTERSPEECH-2022\, IEEE ICASSP-2010\, IEE E SLT-2014\, ISCA INTERSPEECH-2002\, and Tech. Chair for IEEE ICASSP-2024. He received the 2022 IEEE Signal Processing Society Leo Beranek MERITORIO US SERVICE Award.\n DTSTART;TZID=America/New_York:20230303T120000 DTEND;TZID=America/New_York:20230303T131500 LOCATION:Hackerman Hall B17 @ 3400 N. 
Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:John Hansen (University of Texas at Dallas) “Challenges and Advance ments in Speaker Diarization & Recognition for Naturalistic Data Streams” URL:https://www.clsp.jhu.edu/events/john-hansen-university-of-texas-at-dall as/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n
Abstract
\nSpeech communications represents a core domain for ed ucation\, team problem solving\, social engagement\, and business interact ions. The ability for Speech Technology to extract layers of knowledge and assess engagement content represents the next generation of advanced spee ch solutions. Today\, the emergence of BIG DATA\, Machine Learning\, as we ll as voice enabled speech systems have required the need for effective vo ice capture and automatic speech/speaker recognition. The ability to emplo y speech and language technology to assess human-to-human interactions off ers new research paradigms having profound impact on assessing human inter action. In this talk\, we will focus on big data naturalistic audio proces sing relating to (i) child learning spaces\, and (ii) the NASA APOLLO luna r missions. ML based technology advancements include automatic audio diari zation\, speech recognition\, and speaker recognition. Child-Teacher based assessment of conversational interactions are explored\, including keywor d and “WH-word” (e.g.\, who\, what\, etc.). Diarization processing solutio ns are applied to both classroom/learning space child speech\, as well as massive APOLLO data. CRSS-UTDallas is expanding our original Apollo-11 cor pus\, resulting in a massive multi-track audio processing challenge to mak e available 150\,000hrs of Apollo mission data to be shared with science c ommunities: (i) speech/language technology\, (ii) STEM/science and team-ba sed researchers\, and (iii) education/historical/archiving specialists. Ou r goals here are to provide resources which allow to better understand how people work/learn collaboratively together. For Apollo\, to accomplish on e of mankind’s greatest scientific/technological challenges in the last ce ntury.
\nBiography
\nJohn H.L. Hansen\, recei ved Ph.D. & M.S. degrees from Georgia Institute of Technology\, and B.S.E. E. from Rutgers Univ. He joined Univ. of Texas at Dallas (UTDallas) in 200 5\, where he currently serves as Associate Dean for Research\, Prof. of EC E\, Distinguished Univ. Chair in Telecom. Engineering\, and directs Center for Robust Speech Systems (CRSS). He is an ISCA Fellow\, IEEE Fellow\, an d has served as Member and TC-Chair of IEEE Signal Proc. Society\, Speech & Language Proc. Tech. Comm.(SLTC)\, and Technical Advisor to U.S. Delegat e for NATO (IST/TG-01). He served as ISCA President (2017-21)\, continues to serve on ISCA Board (2015-23) as Treasurer\, has supervised 99 PhD/MS t hesis candidates (EE\,CE\,BME\,TE\,CS\,Ling.\,Cog.Sci.\,Spch.Sci.\,Hear.Sc i)\, was recipient of 2020 UT-Dallas Provost’s Award for Grad. PhD Researc h Mentoring\; author/co-author of 865 journal/conference papers including 14 textbooks in the field of speech/language/hearing processing & technolo gy including coauthor of textbook Discrete-Time Processing of Speech Signa ls\, (IEEE Press\, 2000)\, and lead author of the report “The Impact of Sp eech Under ‘Stress’ on Military Speech Technology\,” (NATO RTO-TR-10\, 200 0). He served as Organizer\, Chair/Co-Chair/Tech.Chair for ISCA INTERSPEEC H-2022\, IEEE ICASSP-2010\, IEEE SLT-2014\, ISCA INTERSPEECH-2002\, and Te ch. Chair for IEEE ICASSP-2024. He received the 2022 IEEE Signal Processin g Society Leo Beranek MERITORIOUS SERVICE Award.
\n\n X-TAGS;LANGUAGE=en-US:2023\,Hansen\,March END:VEVENT BEGIN:VEVENT UID:ai1ec-23439@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nAs data-based technologies proliferate\, it is increa singly important for researchers to be aware of their work’s wider impact. Concerns like navigating the IRB and figuring out copyright and licensing issues are still key\, but the current focus shift to matters like inclus ivity\, fairness\, and transparency and their impact on the research/devel opment life cycle have added complexity to the research task. In this talk \, we will take a broad look at the various ways ethics intersects with na tural language processing\, machine learning\, and artificial intelligence research and discuss strategies and resources for managing these concerns within the broader research framework.\nBiography\nDenise is responsible for the overall operation of LDC’s External Relations group which includes intellectual property management\, licensing\, regulatory matters\, publi cations\, membership and communications. Before joining LDC\, she practice d law for over 20 years in the areas of international trade\, intellectual property and commercial litigation. She has an A.B. in Political Science from Bryn Mawr College and a Juris Doctor degree from the University of Mi ami School of Law. DTSTART;TZID=America/New_York:20230310T120000 DTEND;TZID=America/New_York:20230310T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street SEQUENCE:0 SUMMARY:Denise DiPersio (Linguistic Data Consortium\, University of Pennsyl vania) “Data and Ethics: Where Does the Twain Meet?” URL:https://www.clsp.jhu.edu/events/denise-dipersio-linguistic-data-consort ium-university-of-pennsylvania-data-and-ethics-where-does-the-twain-meet/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n
Abstract
\nAs data-based technologies proliferate\, it is increa singly important for researchers to be aware of their work’s wider impact. Concerns like navigating the IRB and figuring out copyright and licensing issues are still key\, but the current focus shift to matters like inclus ivity\, fairness\, and transparency and their impact on the research/devel opment life cycle have added complexity to the research task. In this talk \, we will take a broad look at the various ways ethics intersects with na tural language processing\, machine learning\, and artificial intelligence research and discuss strategies and resources for managing these concerns within the broader research framework.
\nBiography
\nDenise is responsible for the overall operation of LDC’s External Relations group which includes intellectual property management\, licensi ng\, regulatory matters\, publications\, membership and communications. Be fore joining LDC\, she practiced law for over 20 years in the areas of int ernational trade\, intellectual property and commercial litigation. She ha s an A.B. in Political Science from Bryn Mawr College and a Juris Doctor d egree from the University of Miami School of Law.
\n X-TAGS;LANGUAGE=en-US:2023\,DiPersio\,March END:VEVENT BEGIN:VEVENT UID:ai1ec-23312@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nAdvanced neural language models have grown ever larger and more complex\, pushing forward the limits of language understanding and generation\, while diminishing interpretability. The black-box nature of deep neural networks blocks humans from understanding them\, as well as trusting and using them in real-world applications. This talk will introduce interpretation techniques that bridge the gap between humans and models for developing trustworthy natural language processing (NLP). I will first show how to explain black-box models and evaluate their explanations for understanding their prediction behavior. Then I will introduce how to improve the interpretability of neural language models by making their decision-making transparent and rationalized. Finally\, I will discuss how to diagnose and improve models (e.g.\, robustness) through the lens of explanations. I will conclude with future research directions that are centered around model interpretability and committed to facilitating communications and interactions between intelligent machines\, system developers\, and end users for long-term trustworthy AI.\nBiography\nHanjie Chen is a Ph.D. candidate in Computer Science at the University of Virginia\, advised by Prof. Yangfeng Ji. Her research interests lie in Trustworthy AI\, Natural Language Processing (NLP)\, and Interpretable Machine Learning. She develops interpretation techniques to explain neural language models and make their prediction behavior transparent and reliable. She is a recipient of the Carlos and Esther Farrar Fellowship and the Best Poster Award at ACM CAPWIC 2021. Her work has been published at top-tier NLP/AI conferences (e.g.\, ACL\, AAAI\, EMNLP\, NAACL) and selected as a National Center for Women & Information Technology (NCWIT) Collegiate Award Finalist in 2021. She (as the primary instructor) co-designed and taught the course Interpretable Machine Learning\, and was awarded the UVA CS Outstanding Graduate Teaching Award and was a University-wide Graduate Teaching Awards Nominee (top 5% of graduate instructors). More details can be found at https://www.cs.virginia.edu/~hc9mx DTSTART;TZID=America/New_York:20230313T120000 DTEND;TZID=America/New_York:20230313T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Hanjie Chen (University of Virginia) “Bridging Humans and Machines: Techniques for Trustworthy NLP” URL:https://www.clsp.jhu.edu/events/hanjie-chen-university-of-virginia/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nAdvanced neural language models have grown ever larger and more complex\, pushing forward the limits of language understanding and generation\, while diminishing interpretability. The black-box nature of deep neural networks blocks humans from understanding them\, as well as trusting and using them in real-world applications. This talk will introduce interpretation techniques that bridge the gap between humans and models for developing trustworthy natural language processing (NLP). I will first show how to explain black-box models and evaluate their explanations for understanding their prediction behavior. Then I will introduce how to improve the interpretability of neural language models by making their decision-making transparent and rationalized. Finally\, I will discuss how to diagnose and improve models (e.g.\, robustness) through the lens of explanations. I will conclude with future research directions that are centered around model interpretability and committed to facilitating communications and interactions between intelligent machines\, system developers\, and end users for long-term trustworthy AI.
\nBiography
\nHanjie Chen is a Ph.D. candidate in Computer Science at the University of Virginia\, advised by Prof. Yangfeng Ji. Her research interests lie in Trustworthy AI\, Natural Language Processing (NLP)\, and Interpretable Machine Learning.
\n X-TAGS;LANGUAGE=en-US:2023\,Chen\,February END:VEVENT BEGIN:VEVENT UID:ai1ec-23505@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nRecent advances in large pretrained language models have unlocked exciting new applications of Natural Language Generation for creative tasks\, such as lyrics or humour generation. In this talk we will discuss recent works by our team at Alexa AI and discuss current challenges: (1) Pun understanding and generation: We release new datasets for pun understanding and the novel task of context-situated pun generation\, and demonstrate the value of our annotations for pun classification and generation tasks. (2) Song lyric generation: we design a hierarchical lyric generation framework that enables us to generate pleasantly-singable lyrics without training on melody-lyric aligned data\, and show that our approach is competitive with strong baselines supervised on parallel data. (3) Create with Alexa: a multimodal story creation experience recently launched on Alexa devices\, which leverages story text generation models in tandem with story visualization and background music generation models to produce multimodal stories for kids.\nBiography\nAlessandra Cervone is an Applied Scientist in the Natural Understanding team at Amazon Alexa AI. Alessandra holds an MSc in Speech and Language Processing from the University of Edinburgh and a PhD in CS from the University of Trento (Italy). During her PhD\, Alessandra worked on computational models of coherence in open-domain dialogue\, advised by Giuseppe Riccardi. In the first year of the PhD\, she was the team leader of one of the teams selected to compete in the first edition of the Alexa Prize. More recently\, her research interests have been focused on natural language generation and its evaluation\, in particular in the context of creative AI applications.
DTSTART;TZID=America/New_York:20230317T120000 DTEND;TZID=America/New_York:20230317T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Alessandra Cervone (Amazon) “Controllable Text Generation for Creative Applications” URL:https://www.clsp.jhu.edu/events/alexxandra-cervone-amazon/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nRecent advances in large pretrained language models have unlocked exciting new applications of Natural Language Generation for creative tasks\, such as lyrics or humour generation. In this talk we will discuss recent works by our team at Alexa AI and discuss current challenges: (1) Pun understanding and generation: We release new datasets for pun understanding and the novel task of context-situated pun generation\, and demonstrate the value of our annotations for pun classification and generation tasks. (2) Song lyric generation: we design a hierarchical lyric generation framework that enables us to generate pleasantly-singable lyrics without training on melody-lyric aligned data\, and show that our approach is competitive with strong baselines supervised on parallel data. (3) Create with Alexa: a multimodal story creation experience recently launched on Alexa devices\, which leverages story text generation models in tandem with story visualization and background music generation models to produce multimodal stories for kids.
\nBiography
\nAlessandra Cervone is an Applied Scientist in the Natural Understanding team at Amazon Alexa AI. Alessandra holds an MSc in Speech and Language Processing from the University of Edinburgh and a PhD in CS from the University of Trento (Italy). During her PhD\, Alessandra worked on computational models of coherence in open-domain dialogue\, advised by Giuseppe Riccardi. In the first year of the PhD\, she was the team leader of one of the teams selected to compete in the first edition of the Alexa Prize. More recently\, her research interests have been focused on natural language generation and its evaluation\, in particular in the context of creative AI applications.
\n \\nAbstract
\nDespite many recent advances in automatic speech recognition (ASR)\, linguists and language communities engaged in language documentation projects continue to face the obstacle of the “transcription bottleneck”. Researchers in NLP typically do not distinguish between widely spoken languages that currently happen to have few training resources and endangered languages that will never have abundant data. As a result\, we often fail to thoroughly explore when ASR is helpful for language documentation\, what architectures work best for the sorts of languages that are in need of documentation\, and how data can be collected and organized to produce optimal results. In this talk I describe several projects that attempt to bridge the gap between the promise of ASR for language documentation and the reality of using this technology in real-world settings.
\nBiography
\nLearning How to Play With The Machines: Taking Stock of Where the Collaboration Between Computational and Social Science Stands
\n\n
Speakers: Jeff Gill\, Ernesto Calvo\, Hale Sirin and Antonios Anastasopoulos
\n X-TAGS;LANGUAGE=en-US:2023\,April\,APSA Roundtable END:VEVENT BEGIN:VEVENT UID:ai1ec-23588@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nAdvances in open domain Large Language Models (LLMs)\, starting with BERT and more recently with GPT-4\, PaLM\, and LLaMA\, have facilitated dramatic improvements in conversational systems. These improvements include an unprecedented breadth of conversational interactions between humans and machines while maintaining and sometimes surpassing the accuracy of systems trained specifically for known\, closed domains. However\, many applications still require higher levels of accuracy than pre-trained LLMs can provide. There are many studies underway to accomplish this. Broadly speaking\, the methods assume the pre-trained models are fixed (due to cost/time)\, and instead look to various augmentation methods including prompting strategies and model adaptation/fine-tuning.\nOne augmentation strategy leverages the context of the conversation. For example\, who are the participants and what is known about these individuals (personal context)\, what was just said (dialogue context)\, where is the conversation taking place (geo context)\, what time of day and season is it (time context)\, etc. A powerful form of context is the shared visual setting of the conversation between the human(s) and machine. The shared visual scene may be from a device (phone\, smart glasses) or represented on a screen (browser\, maps\, etc.). The elements in the visual context can be exploited by grounding the natural language conversational interaction\, thereby changing the priors of certain concepts and increasing the accuracy of the system. In this talk\, I will present some of my historical work in this area as well as my recent work in the AI Virtual Assistant (AVA) Lab at Georgia Tech.\nBio\nDr.
Larry Heck is a Professor with a joint appointment in the School of Electrical and Computer Engineering and the School of Interactive Computing at the Georgia Institute of Technology. He holds the Rhesa S. Farmer Distinguished Chair of Advanced Computing Concepts and is a Georgia Research Alliance Eminent Scholar. He received the BSEE from Texas Tech University (1986)\, and the MSEE and PhD EE from the Georgia Institute of Technology (1989\, 1991). He is a Fellow of the IEEE\, was inducted into the Academy of Distinguished Engineering Alumni at Georgia Tech\, and received the Distinguished Engineer Award from the Texas Tech University Whitacre College of Engineering. He was a Senior Research Engineer with SRI (1992-98)\, Vice President of R&D at Nuance (1998-2005)\, Vice President of Search and Advertising Sciences at Yahoo! (2005-2009)\, Chief Scientist of the Microsoft Speech products and Distinguished Engineer in Microsoft Research (2009-2014)\, Principal Scientist with Google Research (2014-2017)\, and CEO of Viv Labs and SVP at Samsung (2017-2021).\n\n DTSTART;TZID=America/New_York:20230414T120000 DTEND;TZID=America/New_York:20230414T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Larry Heck (Georgia Institute of Technology) “The AVA Digital Human: Improving Conversational Interactions through Visually Situated Context” URL:https://www.clsp.jhu.edu/events/larry-heck-georgia-institute-of-technology-the-ava-digital-human-improving-conversational-interactions-through-visually-situated-context/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nAdvances in open domain Large Language Models (LLMs)\, starting with BERT and more recently with GPT-4\, PaLM\, and LLaMA\, have facilitated dramatic improvements in conversational systems. These improvements include an unprecedented breadth of conversational interactions between humans and machines while maintaining and sometimes surpassing the accuracy of systems trained specifically for known\, closed domains. However\, many applications still require higher levels of accuracy than pre-trained LLMs can provide. There are many studies underway to accomplish this. Broadly speaking\, the methods assume the pre-trained models are fixed (due to cost/time)\, and instead look to various augmentation methods including prompting strategies and model adaptation/fine-tuning.
\nOne augmentation strategy leverages the context of the conversation. For example\, who are the participants and what is known about these individuals (personal context)\, what was just said (dialogue context)\, where is the conversation taking place (geo context)\, what time of day and season is it (time context)\, etc. A powerful form of context is the shared visual setting of the conversation between the human(s) and machine. The shared visual scene may be from a device (phone\, smart glasses) or represented on a screen (browser\, maps\, etc.). The elements in the visual context can be exploited by grounding the natural language conversational interaction\, thereby changing the priors of certain concepts and increasing the accuracy of the system. In this talk\, I will present some of my historical work in this area as well as my recent work in the AI Virtual Assistant (AVA) Lab at Georgia Tech.
\nBio
\nDr. Larry Heck is a Professor with a joint appointment in the School of Electrical and Computer Engineering and the School of Interactive Computing at the Georgia Institute of Technology. He holds the Rhesa S. Farmer Distinguished Chair of Advanced Computing Concepts and is a Georgia Research Alliance Eminent Scholar. He received the BSEE from Texas Tech University (1986)\, and the MSEE and PhD EE from the Georgia Institute of Technology (1989\, 1991). He is a Fellow of the IEEE\, was inducted into the Academy of Distinguished Engineering Alumni at Georgia Tech\, and received the Distinguished Engineer Award from the Texas Tech University Whitacre College of Engineering. He was a Senior Research Engineer with SRI (1992-98)\, Vice President of R&D at Nuance (1998-2005)\, Vice President of Search and Advertising Sciences at Yahoo! (2005-2009)\, Chief Scientist of the Microsoft Speech products and Distinguished Engineer in Microsoft Research (2009-2014)\, Principal Scientist with Google Research (2014-2017)\, and CEO of Viv Labs and SVP at Samsung (2017-2021).
\n\n
\n X-TAGS;LANGUAGE=en-US:2023\,April\,Heck END:VEVENT BEGIN:VEVENT UID:ai1ec-23590@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nMachine Translation has the ultimate goal of eliminating language barriers. However\, the area has focused mainly on a few languages\, leaving many low-resource languages without support. In this talk\, I will discuss the challenges of bringing translation support to 200 written languages and beyond.\n\nFirst\, I talk about the No Language Left Behind project\, where we took on this challenge by first contextualizing the need for low-resource language translation support through exploratory interviews with native speakers. Then\, we created datasets and models aimed at narrowing the performance gap between low- and high-resource languages. We proposed multiple architectural and training improvements to counteract over-fitting while training on thousands of language pairs/tasks. We evaluated the performance of over 40\,000 different translation directions.\n\nAfterwards\, I’ll discuss the challenges of pushing translation performance beyond text for languages that don’t have written standards\, like Hokkien.\nOur models achieve state-of-the-art performance and lay important groundwork towards realizing a universal translation system. At the same time\, we keep making open-source contributions for everyone to keep advancing the research for the languages they care about.\nBio\nPaco is a Research Scientist Manager supporting translation teams at Meta AI (FAIR). He works in the field of machine translation with a focus on low-resource translation (e.g. NLLB\, FLORES) and the aim to break language barriers. He joined Meta in 2016. His research has been published in top-tier NLP venues like ACL and EMNLP. He was the co-chair of Research at AMTA (2020-2022).
He has organized several research competitions focused on low-resource translation and data filtering. Paco obtained his PhD from the ITESM in Mexico\, was a visiting scholar at the LTI-CMU from 2008-2009\, and participated in DARPA’s GALE evaluation program. Paco was a post-doc and scientist at the Qatar Computing Research Institute in Qatar in 2012-2016. DTSTART;TZID=America/New_York:20230417T120000 DTEND;TZID=America/New_York:20230417T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Paco Guzman (Meta AI) “Building a Universal Translation System to Break Down Language Barriers” URL:https://www.clsp.jhu.edu/events/paco-guzman-meta-ai/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n
\\nAbstract
\nOur models achieve state-of-the-art performance and lay important groundwork towards realizing a universal translation system. At the same time\, we keep making open-source contributions for everyone to keep advancing the research for the languages they care about.
\nBio
\nPaco is a Research Scientist Manager supporting translation teams at Meta AI (FAIR). He works in the field of machine translation with a focus on low-resource translation (e.g. NLLB\, FLORES) and the aim to break language barriers. He joined Meta in 2016. His research has been published in top-tier NLP venues like ACL and EMNLP. He was the co-chair of Research at AMTA (2020-2022). He has organized several research competitions focused on low-resource translation and data filtering. Paco obtained his PhD from the ITESM in Mexico\, was a visiting scholar at the LTI-CMU from 2008-2009\, and participated in DARPA’s GALE evaluation program. Paco was a post-doc and scientist at the Qatar Computing Research Institute in Qatar in 2012-2016.
\n X-TAGS;LANGUAGE=en-US:2023\,April\,Guzman END:VEVENT BEGIN:VEVENT UID:ai1ec-23592@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nLarge language models (LLMs) have ushered in exciting capabilities in language understanding and text generation\, with systems like ChatGPT holding fluent dialogs with users and being almost indistinguishable from humans. While this has obviously raised conversational systems and chatbots to a new level\, it also presents exciting new opportunities for building artificial agents with improved decision-making capabilities. Specifically\, the ability to reason with language can allow us to build agents that can 1) execute complex action sequences to effect change in the world\, 2) learn new skills by ‘reading’ in addition to ‘doing’\, and 3) allow for easier personalization and control over their behavior. In this talk\, I will demonstrate how we can build such language-enabled agents that exhibit the above traits across various use cases such as multi-hop question answering\, web interaction\, and robotic tool manipulation. In the end\, I will also discuss some dangers of using these LLM-based systems and some challenges that lie ahead in ensuring their safe use.\nBiography\nKarthik Narasimhan is an assistant professor in the Computer Science department at Princeton University and a co-Director of the Princeton NLP group. His research spans the areas of natural language processing and reinforcement learning\, with the goal of building intelligent agents that learn to operate in the world through both their own experience (“doing things”) and leveraging existing human knowledge (“reading about things”). Karthik received his PhD from MIT in 2017\, and spent a year as a visiting research scientist at OpenAI contributing to the GPT language model\, prior to joining Princeton in 2018.
His research has been recognized by the NSF CAREER\, a Google Research Scholar Award\, an Amazon research award (2019)\, a Bell Labs runner-up prize\, and outstanding paper awards at EMNLP (2015\, 2016) and NeurIPS (2022). DTSTART;TZID=America/New_York:20230421T120000 DTEND;TZID=America/New_York:20230421T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Karthik Narasimhan (Princeton University) “Towards General-Purpose Language-Enabled Agents: Machines that can Read\, Think and Act” URL:https://www.clsp.jhu.edu/events/karthik-narasimhan-princeton-university/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nLarge language models (LLMs) have ushered in exciting capabilities in language understanding and text generation\, with systems like ChatGPT holding fluent dialogs with users and being almost indistinguishable from humans. While this has obviously raised conversational systems and chatbots to a new level\, it also presents exciting new opportunities for building artificial agents with improved decision-making capabilities. Specifically\, the ability to reason with language can allow us to build agents that can 1) execute complex action sequences to effect change in the world\, 2) learn new skills by ‘reading’ in addition to ‘doing’\, and 3) allow for easier personalization and control over their behavior. In this talk\, I will demonstrate how we can build such language-enabled agents that exhibit the above traits across various use cases such as multi-hop question answering\, web interaction\, and robotic tool manipulation. In the end\, I will also discuss some dangers of using these LLM-based systems and some challenges that lie ahead in ensuring their safe use.
\n<strong>Biography</strong>
\nKarthik Narasimhan is an assistant professor in the Computer Science department at Princeton University and a co-Director of the Princeton NLP group. His research spans the areas of natural language processing and reinforcement learning\, with the goal of building intelligent agents that learn to operate in the world through both their own experience (“doing things”) and leveraging existing human knowledge (“reading about things”). Karthik received his PhD from MIT in 2017\, and spent a year as a visiting research scientist at OpenAI contributing to the GPT language model\, prior to joining Princeton in 2018. His research has been recognized by the NSF CAREER\, a Google Research Scholar Award\, an Amazon research award (2019)\, a Bell Labs runner-up prize\, and outstanding paper awards at EMNLP (2015\, 2016) and NeurIPS (2022).
\n X-TAGS;LANGUAGE=en-US:2023\,April\,Narasimhan END:VEVENT BEGIN:VEVENT UID:ai1ec-23608@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nAutomated analysis of student writing has the potential to provide alternatives to selected-response questions such as multiple choice\, and to enable teachers and instructors to assess students’ reasoning skills based on their long-form writing. Further\, automated support to assess both short answers and long passages could provide students with a smoother trajectory towards mastery of written communication. Our methods focus on the specific ideas students express to support formative assessment through different kinds of feedback\, which aims to scaffold their abilities to reason and communicate. In this talk I review our work in the PSU NLP lab on methods for automated assessment of different forms of student writing\, from younger and older students. I will briefly illustrate highly curated datasets created in collaboration with researchers in STEM education\, results from deployment of an older content analysis tool on middle school physics essays\, and very preliminary results on assessment of college students’ physics lab reports. I will also present our current work on short answer assessment using a novel recurrent relation network that incorporates contrastive learning.\nBio\nBecky Passonneau has been a Professor in the Department of Computer Science and Engineering at Penn State University since 2016\, when she joined as the first NLP researcher. Since that time the NLP faculty has grown to include Rui Zhang and Wenpeng Yin. Becky’s research in natural language processing addresses computational pragmatics\, meaning the investigation of language as a system of interactive behavior that serves a wide range of purposes.
She received her PhD in Linguistics from the University of Chicago in 1985\, and worked at several academic and industry research labs before joining Penn State. Her work is reported in over 140 publications in journals and refereed conference proceedings\, and has been funded through 27 sponsored projects from 16 sources\, including government agencies\, corporate sponsors\, corporate gifts\, and foundations. DTSTART;TZID=America/New_York:20230428T120000 DTEND;TZID=America/New_York:20230428T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Becky Passonneau (Penn State University) “Automated Support to Scaffold Students’ Short- and Long-form STEM Writing” URL:https://www.clsp.jhu.edu/events/becky-passonneau-penn-state-university-automated-support-to-scaffold-students-short-and-long-form-stem-writing/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nAutomated analysis of student writing has the potential to provide alternatives to selected-response questions such as multiple choice\, and to enable teachers and instructors to assess students’ reasoning skills based on their long-form writing. Further\, automated support to assess both short answers and long passages could provide students with a smoother trajectory towards mastery of written communication. Our methods focus on the specific ideas students express to support formative assessment through different kinds of feedback\, which aims to scaffold their abilities to reason and communicate. In this talk I review our work in the PSU NLP lab on methods for automated assessment of different forms of student writing\, from younger and older students. I will briefly illustrate highly curated datasets created in collaboration with researchers in STEM education\, results from deployment of an older content analysis tool on middle school physics essays\, and very preliminary results on assessment of college students’ physics lab reports. I will also present our current work on short answer assessment using a novel recurrent relation network that incorporates contrastive learning.
\nBio
\nBecky Passonneau has been a Professor in the Department of Computer Science and Engineering at Penn State University since 2016\, when she joined as the first NLP researcher. Since that time the NLP faculty has grown to include Rui Zhang and Wenpeng Yin. Becky’s research in natural language processing addresses computational pragmatics\, meaning the investigation of language as a system of interactive behavior that serves a wide range of purposes. She received her PhD in Linguistics from the University of Chicago in 1985\, and worked at several academic and industry research labs before joining Penn State. Her work is reported in over 140 publications in journals and refereed conference proceedings\, and has been funded through 27 sponsored projects from 16 sources\, including government agencies\, corporate sponsors\, corporate gifts\, and foundations.
\n X-TAGS;LANGUAGE=en-US:2023\,April\,Passonneau END:VEVENT BEGIN:VEVENT UID:ai1ec-23880@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION: DTSTART;TZID=America/New_York:20230828T120000 DTEND;TZID=America/New_York:20230828T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street SEQUENCE:0 SUMMARY:CLSP Town Hall – Welcome New Students\, Introductions and CLSP Overview URL:https://www.clsp.jhu.edu/events/clsp-town-hall-welcome-new-students-introductions-and-clsp-overview/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2023\,August\,Town Hall END:VEVENT BEGIN:VEVENT UID:ai1ec-23882@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nLarge language models (LLMs) have demonstrated incredible power\, but they also possess vulnerabilities that can lead to misuse and potential attacks. In this presentation\, we will address two fundamental questions regarding the responsible utilization of LLMs: (1) How can we accurately identify AI-generated text? (2) What measures can safeguard the intellectual property of LLMs? We will introduce two recent watermarking techniques designed for text and models\, respectively. Our discussion will encompass the theoretical underpinnings that ensure the correctness of watermark detection\, along with robustness against evasion attacks. Furthermore\, we will showcase empirical evidence validating their effectiveness. These findings establish a solid technical groundwork for policymakers\, legal professionals\, and generative AI practitioners alike.\nBiography\nLei Li is an Assistant Professor in the Language Technology Institute at Carnegie Mellon University. He received his Ph.D. from Carnegie Mellon University School of Computer Science.
He is a recipient of ACL 2021 Best Paper Award\, CCF Young Elite Award in 2019\, CCF distinguished speaker in 2017\, Wu Wen-tsün AI prize in 2017\, and 2012 ACM SIGKDD dissertation award (runner-up)\, and is recognized as Notable Area Chair of ICLR 2023. Previously\, he was a faculty member at UC Santa Barbara. Prior to that\, he founded ByteDance AI Lab in 2016 and led its research in NLP\, ML\, Robotics\, and Drug Discovery. He launched ByteDance’s machine translation system VolcTrans and AI writing system Xiaomingbot\, serving one billion users. DTSTART;TZID=America/New_York:20230901T120000 DTEND;TZID=America/New_York:20230901T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Lei Li (Carnegie Mellon University) “Empowering Responsible Use of Large Language Models” URL:https://www.clsp.jhu.edu/events/lei-li-carnegie-mellon-university-empowering-responsible-use-of-large-language-models/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nLarge language models (LLMs) have demonstrated incredible power\, but they also possess vulnerabilities that can lead to misuse and potential attacks. In this presentation\, we will address two fundamental questions regarding the responsible utilization of LLMs: (1) How can we accurately identify AI-generated text? (2) What measures can safeguard the intellectual property of LLMs? We will introduce two recent watermarking techniques designed for text and models\, respectively. Our discussion will encompass the theoretical underpinnings that ensure the correctness of watermark detection\, along with robustness against evasion attacks. Furthermore\, we will showcase empirical evidence validating their effectiveness. These findings establish a solid technical groundwork for policymakers\, legal professionals\, and generative AI practitioners alike.
\nBiography
\nLei Li is an Assistant Professor in the Language Technologies Institute at Carnegie Mellon University. He received his Ph.D. from Carnegie Mellon University School of Computer Science. He is a recipient of the ACL 2021 Best Paper Award\, CCF Young Elite Award in 2019\, CCF distinguished speaker in 2017\, Wu Wen-tsün AI prize in 2017\, and the 2012 ACM SIGKDD dissertation award (runner-up)\, and is recognized as a Notable Area Chair of ICLR 2023. Previously\, he was a faculty member at UC Santa Barbara. Prior to that\, he founded ByteDance AI Lab in 2016 and led its research in NLP\, ML\, Robotics\, and Drug Discovery. He launched ByteDance’s machine translation system VolcTrans and AI writing system Xiaomingbot\, serving one billion users.
\n X-TAGS;LANGUAGE=en-US:2023\,Li\,September END:VEVENT BEGIN:VEVENT UID:ai1ec-23886@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nThe arms race to build increasingly larger\, powerful language models (LMs) in the past year has been remarkable. Yet incorporating LMs effectively into practical applications that facilitate manual workflows remains challenging. I will discuss LMs’ limiting factors and our efforts to overcome them. I will start with challenges surrounding efficient and robust LM alignment. I will share insights from our recent paper “Self-Instruct” (ACL 2023)\, where we used a vanilla (unaligned) LM to align itself\, an approach that has yielded some success. Then\, I will move on to the challenge of tracing the output of LMs to reliable sources\, a weakness that makes them prone to hallucinations. I will discuss our recent approach of ‘according-to’ prompting\, which steers LMs to quote directly from sources observed in their pre-training. If time permits\, I will discuss our ongoing project to adapt LMs to interact with web pages. Throughout the presentation\, I will highlight our progress\, and end with questions about our future progress.\nBiography\nDaniel Khashabi is an assistant professor in computer science at Johns Hopkins University and a member of the Center for Language and Speech Processing (CLSP). He is interested in building reasoning-driven modular NLP systems that are robust\, transparent\, and communicative\, particularly those that use natural language as the communication medium. Khashabi has published over 40 papers on natural language processing and AI in top-tier venues. His research has won the ACL 2023 Outstanding Paper Award\, NAACL 2022 Best Paper Award\, research gifts from the Allen Institute for AI\, and an Amazon Research Award 2023.
Before joining Hopkins\, he was a postdoctoral f ellow at the Allen Institute for AI (2019-2022) and obtained a Ph.D. from the University of Pennsylvania in 2019. DTSTART;TZID=America/New_York:20230908T120000 DTEND;TZID=America/New_York:20230908T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Daniel Khashabi (Johns Hopkins University) “Building More Helpful L anguage Models” URL:https://www.clsp.jhu.edu/events/daniel-khashabi-johns-hopkins-universit y/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstr act
\nThe arms race to build increasingly larger\, powerful language models (LMs) in the past year has been remarkable. Yet incorporating LMs effectively into practical applications that facilitate manual workflows remains challenging. I will discuss LMs’ limiting factors and our efforts to overcome them. I will start with challenges surrounding efficient and robust LM alignment. I will share insights from our recent paper “Self-Instruct” (ACL 2023)\, where we used a vanilla (unaligned) LM to align itself\, an approach that has yielded some success. Then\, I will move on to the challenge of tracing the output of LMs to reliable sources\, a weakness that makes them prone to hallucinations. I will discuss our recent approach of ‘according-to’ prompting\, which steers LMs to quote directly from sources observed in their pre-training. If time permits\, I will discuss our ongoing project to adapt LMs to interact with web pages. Throughout the presentation\, I will highlight our progress\, and end with questions about our future progress.
\nBiography
\nDaniel Khashabi is an assistant professor in computer science at Johns Hopkins University and a member of the Center for Language and Speech Processing (CLSP). He is interested in building reasoning-driven modular NLP systems that are robust\, transparent\, and communicative\, particularly those that use natural language as the communication medium. Khashabi has published over 40 papers on natural language processing and AI in top-tier venues. His research has won the ACL 2023 Outstanding Paper Award\, NAACL 2022 Best Paper Award\, research gifts from the Allen Institute for AI\, and an Amazon Research Award 2023. Before joining Hopkins\, he was a postdoctoral fellow at the Allen Institute for AI (2019-2022) and obtained a Ph.D. from the University of Pennsylvania in 2019.
\n X-TAGS;LANGUAGE=en-US:2023\,Khashabi\,September END:VEVENT BEGIN:VEVENT UID:ai1ec-23892@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nThe growing power in computing and AI promises a near -term future of human-machine teamwork. In this talk\, I will present my r esearch group’s efforts in understanding the complex dynamics of human-mac hine interaction and designing intelligent machines aimed to assist and co llaborate with people. I will focus on 1) tools for onboarding machine tea mmates and authoring machine assistance\, 2) methods for detecting\, and b roadly managing\, errors in collaboration\, and 3) building blocks of know ledge needed to enable ad hoc human-machine teamwork. I will also highligh t our recent work on designing assistive\, collaborative machines to suppo rt older adults aging in place.\nBiography\nChien-Ming Huang is the John C . Malone Assistant Professor in the Department of Computer Science at the Johns Hopkins University. His research focuses on designing interactive AI aimed to assist and collaborate with people. He publishes in top-tier ven ues in HRI\, HCI\, and robotics including Science Robotics\, HRI\, CHI\, a nd CSCW. His research has received media coverage from MIT Technology Revi ew\, Tech Insider\, and Science Nation. Huang completed his postdoctoral t raining at Yale University and received his Ph.D. in Computer Science at t he University of Wisconsin–Madison. He is a recipient of the NSF CAREER aw ard. https://www.cs.jhu.edu/~cmhuang/ DTSTART;TZID=America/New_York:20230915T120000 DTEND;TZID=America/New_York:20230915T131500 LOCATION:Hackerman Hall B17 @ 3400 N. 
Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Chien-Ming Huang (Johns Hopkins University) “Becoming Teammates: De signing Assistive\, Collaborative Machines” URL:https://www.clsp.jhu.edu/events/chien-ming-huang-johns-hopkins-universi ty/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstr act
\nThe growing power in computing and AI promises a near-term future of human-machine teamwork. In this talk\, I will present my research group’s efforts in understanding the complex dynamics of human-machine interaction and designing intelligent machines aimed to assist and collaborate with people. I will focus on 1) tools for onboarding machine teammates and authoring machine assistance\, 2) methods for detecting\, and broadly managing\, errors in collaboration\, and 3) building blocks of knowledge needed to enable ad hoc human-machine teamwork. I will also highlight our recent work on designing assistive\, collaborative machines to support older adults aging in place.
\nBiography
\nChien-Ming Huang is the John C. Malone Assistant Professor in the Department of Computer Science at the Johns Hopkins University. His research focuses on designing interactive AI aimed to assist and collaborate with people. He publishes in top-tier venues in HRI\, HCI\, and robotics including Science Robotics\, HRI\, CHI\, and CSCW. His research has received media coverage from MIT Technology Review\, Tech Insider\, and Science Nation. Huang completed his postdoctoral training at Yale University and received his Ph.D. in Computer Science at the University of Wisconsin–Madison. He is a recipient of the NSF CAREER award. https://www.cs.jhu.edu/~cmhuang/
\n X-TAGS;LANGUAGE=en-US:2023\,Huang\,September END:VEVENT BEGIN:VEVENT UID:ai1ec-23894@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nThe use of NLP in the realm of financial technology i s broad and complex\, with applications ranging from sentiment analysis an d named entity recognition to question answering. Large Language Models (L LMs) have been shown to be effective on a variety of tasks\; however\, no LLM specialized for the financial domain has been reported in the literatu re. In this work\, we present BloombergGPT\, a 50 billion parameter langua ge model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg’s extensive data sources\, p erhaps the largest domain-specific dataset yet\, augmented with 345 billio n tokens from general-purpose datasets. We validate BloombergGPT on stand ard LLM benchmarks\, open financial benchmarks\, and a suite of internal b enchmarks that most accurately reflect our intended usage. Our mixed datas et training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general L LM benchmarks. Additionally\, we explain our modeling choices\, training p rocess\, and evaluation methodology.\nBiography\nMark Dredze is the John C Malone Professor of Computer Science at Johns Hopkins University and the Director of Research (Foundations of AI) for the JHU AI-X Foundry. He deve lops Artificial Intelligence Systems based on natural language processing and explores applications to public health and medicine.\nProf. Dredze is affiliated with the Malone Center for Engineering in Healthcare\, the Cent er for Language and Speech Processing\, among others. 
He holds a joint app ointment in the Biomedical Informatics & Data Science Section (BIDS)\, und er the Department of Medicine (DOM)\, Division of General Internal Medicin e (GIM) in the School of Medicine. He obtained his PhD from the University of Pennsylvania in 2009. DTSTART;TZID=America/New_York:20230918T120000 DTEND;TZID=America/New_York:20230918T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Mark Dredze (Johns Hopkins University) “BloombergGPT: A Large Langu age Model for Finance” URL:https://www.clsp.jhu.edu/events/mark-dredze-johns-hopkins-university/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstr act
\nThe use of NLP in the realm of financial technology i s broad and complex\, with applications ranging from sentiment analysis an d named entity recognition to question answering. Large Language Models (L LMs) have been shown to be effective on a variety of tasks\; however\, no LLM specialized for the financial domain has been reported in the literatu re. In this work\, we present BloombergGPT\, a 50 billion parameter langua ge model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg’s extensive data sources\, p erhaps the largest domain-specific dataset yet\, augmented with 345 billio n tokens from general-purpose datasets. We validate BloombergGPT on stand ard LLM benchmarks\, open financial benchmarks\, and a suite of internal b enchmarks that most accurately reflect our intended usage. Our mixed datas et training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general L LM benchmarks. Additionally\, we explain our modeling choices\, training p rocess\, and evaluation methodology.
\nBiography
\nMark Dredze is the John C Malone Professor of Computer Science at Johns Hopkins University and the Director of Research (Foundations of AI) for the JHU AI-X Foundry. He develops Artificial Intelligence Systems based on natural language processing and explores applications to public health and medicine.
\nProf. Dredze is affiliated with the Malone Center for Engineering in Healthcare\, the Center for Language and Speech Processing\, among others. He holds a joint appointment in the Biomedical Informatics & Data Science Section (BIDS)\, under the Department of Medicine (DOM)\, Division of General Internal Medicine (GIM) in the School of Medicine. He obtained his PhD from the University of Pennsylvania in 2009.
\n HTML> X-TAGS;LANGUAGE=en-US:2023\,Dredze\,September END:VEVENT BEGIN:VEVENT UID:ai1ec-23983@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nVisually rich documents (scanned or digital) remain i mportant for many consumer and business use cases. During this talk we wil l share recent work from our team in the Document Intelligence Lab of Adob e Research to understand\, create\, and interact with these documents. Fi rst\, we’ll share a series of work on building models to decompose and und erstand the structure of documents to support use cases around document an alysis and accessibility. Next\, we’ll explore document semantic understan ding for a project where we convert natural language contract clauses to c ode to support business automation. Finally\, we’ll discuss DocEdit\, a mo del and dataset that enables editing structured documents from natural lan guage. \nBIOS:\nRajiv Jain is a Senior Research Scientist in the Document Intelligence Lab in Adobe Research\, where his research focuses on underst anding the layout\, content\, and interaction with documents. Prior to joi ning Adobe\, Rajiv was a consultant at DARPA\, where he worked on the Medi a Forensics Program to secure digital imagery. He previously served for 10 years as a researcher for the Department of Defense where he worked on pr ojects around large scale systems\, computer vision\, and network security . He received his PhD in computer science from the University of Maryland\ , College Park working in the field of document image analysis and retriev al.\nChris Tensmeyer primarily focuses on multi-modal document layout and content understanding as a Research Scientist in the Document Intelligence Lab of Adobe Research. Since joining Adobe 5 years ago\, his work has di rectly impacted popular Adobe features such as mobile Acrobat Liquid Mode\ , PDF table extraction\, handwriting recognition\, and scanned document de tection. 
Other research interests include general Computer Vision and Dee p Learning. He received his PhD in Computer Science from Brigham Young Un iversity on the topic of Deep Learning for Document Image Analysis. DTSTART;TZID=America/New_York:20230922T120000 DTEND;TZID=America/New_York:20230922T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Rajiv Jain and Chris Tensmeyer (Adobe) “Document Intelligence at Ad obe Research” URL:https://www.clsp.jhu.edu/events/rajiv-jain-and-chris-tensmeyer-adobe-do cument-intelligence-at-adobe-research/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstr act
\nVisually rich document s (scanned or digital) remain important for many consumer and business use cases. During this talk we will sha re recent work from our team in the Document Intelligence Lab of Adobe Res earch to understand\, create\, and interact with these documents. First\, we’ll share a series of work on building models to decompose and understa nd the structure of documents to support use cases around document analysi s and accessibility. Next\, we’ll explore document semantic understanding for a project where we convert natural language contract clauses to code t o support business automation. Finally\, we’ll discuss DocEdit\, a model a nd dataset that enables editing structured documents from natural language .
\nBIOS:
\nRajiv Jain is a Senior Research Scientist in the Do cument Intelligence Lab in Adobe Research\, where his research focuses on understanding the layout\, content\, and interaction with documents. Prior to joining Adobe\, Rajiv was a consultant at DARPA\, where he worked on t he Media Forensics Program to secure digital imagery. He previously served for 10 years as a researcher for the Department of Defense where he worke d on projects around large scale systems\, computer vision\, and network s ecurity. He received his PhD in computer science from the University of Ma ryland\, College Park working in the field of document image analysis and retrieval.
\nChris Ten smeyer primarily focuses on multi-modal document layout and conte nt understanding as a Research Scientist in the Document Intelligence Lab of Adobe Research. Since joining Adobe 5 years ago\, his work has directl y impacted popular Adobe features such as mobile Acrobat Liquid Mode\, PDF table extraction\, handwriting recognition\, and scanned document detecti on. Other research interests include general Computer Vision and Deep Lea rning. He received his PhD in Computer Science from Brigham Young Univers ity on the topic of Deep Learning for Document Image Analysis.
\n X-TAGS;LANGUAGE=en-US:2023\,Jain and Tensmeyer\,September END:VEVENT BEGIN:VEVENT UID:ai1ec-23896@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nThe field of NLP is in the midst of a disruptive shif t\, fueled most recently by the advent of large language models (LLMs)\, w ith impacts on our methodologies\, funding and public perception. While th e core technologies and scope of real-world impact of our field may be cha nging (everything is different!)\, many of the same key challenges faced s ince the inception of our field remain (nothing has changed). In this talk I’ll describe recent work characterizing and tackling some of these chall enges\, notably: data-efficient domain adaptation and lifelong learning. I will also anchor discussion of cycles and shifts in the field by describi ng findings from a qualitative study of factors shaping the community over time\, including culture\, incentives\, and infrastructure. Through these complementary lenses into the past\, present and future\, I aim to inspir e shared hope\, excitement and discussion. \nBio\nEmma Strubell is the Raj Reddy Assistant Professor in the Language Technologies Institute in the S chool of Computer Science at Carnegie Mellon University\, and a Visiting S cientist at the Allen Institute for Artificial Intelligence. Previously sh e held research scientist roles at Google and FAIR after earning her docto ral degree in 2019 from the University of Massachusetts Amherst. Her resea rch lies at the intersection of natural language processing and machine le arning\, with a focus on providing pragmatic solutions to practitioners wh o wish to gain insights from natural language text via computation- and da ta-efficient AI. Her work has been recognized with a Madrona AI Impact Awa rd\, best paper awards at ACL and EMNLP\, and cited in news outlets includ ing the New York Times and Wall Street Journal. 
DTSTART;TZID=America/New_York:20230925T120000 DTEND;TZID=America/New_York:20230925T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Emma Strubell (Carnegie Mellon University) “Large Language Models: Everything’s Different and Nothing Has Changed” URL:https://www.clsp.jhu.edu/events/emma-strubell-carnegie-mellon-universit y/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstr act
\nThe field of NLP i s in the midst of a disruptive shift\, fueled most recently by the advent of large language models (LLMs)\, with impacts on our methodologies\, fund ing and public perception. While the core technologies and scope of real-w orld impact of our field may be changing (everything is different!)\, many of the same key challenges faced since the inception of our field remain (nothing has changed). In this talk I’ll describe recent work characterizi ng and tackling some of these challenges\, notably: data-efficient domain adaptation and lifelong learning. I will also anchor discussion of cycles and shifts in the field by describing findings from a qualitative study of factors shaping the community over time\, including culture\, incentives\ , and infrastructure. Through these complementary lenses into the past\, p resent and future\, I aim to inspire shared hope\, excitement and discussi on.
\nBio
\nEmma Strubell is the Raj Reddy Assistant Professor in the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University\, and a Visiting Scientist at the Allen Institute for Artificial Intelligence. Previously she held research scientist roles at Google and FAIR after earning her doctoral degree in 2019 from the University of Massachusetts Amherst. Her research lies at the intersection of natural language processing and machine learning\, with a focus on providing pragmatic solutions to practitioners who wish to gain insights from natural language text via computation- and data-efficient AI. Her work has been recognized with a Madrona AI Impact Award\, best paper awards at ACL and EMNLP\, and cited in news outlets including the New York Times and Wall Street Journal.
\n X-TAGS;LANGUAGE=en-US:2023\,September\,Strubell END:VEVENT BEGIN:VEVENT UID:ai1ec-24005@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nLarge-scale generative models such as GPT and DALL-E have revolutionized natural language processing and computer vision resear ch. These models not only generate high fidelity text or image outputs\, b ut also demonstrate impressive domain and task generalization capabilities . In contrast\, audio generative models are relatively primitive in scale and generalization.\nIn this talk\, I will start with a brief introduction on conventional neural speech generative models and discuss why they are unfit for scaling to Internet-scale data. Next\, by reviewing the latest l arge-scale generative models for text and image\, I will outline a few lin es of promising approaches to build scalable speech models. Last\, I will present Voicebox\, our latest work to advance this area. Voicebox is the m ost versatile generative model for speech. It is trained with a simple tas k — text conditioned speech infilling — on over 50K hours of multilingual speech with a powerful flow-matching objective. Through in-context learnin g\, Voicebox can perform monolingual/cross-lingual zero-shot TTS\, holisti c style conversion\, transient noise removal\, content editing\, and diver se sample generation. Moreover\, Voicebox achieves state-of-the-art perfor mance and excellent run-time efficiency.\nBiography\nWei-Ning Hsu is a res earch scientist at Meta Foundational AI Research (FAIR) and currently the lead of the audio generation team. His research focuses on self-supervised learning and generative models for speech and audio. His pioneering work includes HuBERT\, AV-HuBERT\, TextlessNLP\, data2vec\, wav2vec-U\, textles s speech translation\, and Voicebox. \nPrior to joining Meta\, Wei-Ning wo rked at MERL and Google Brain as a research intern. He received his Ph.D. and S.M. 
degrees in Electrical Engineering and Computer Science from Massa chusetts Institute of Technology in 2020 and 2018\, under the supervision of Dr. James Glass. He received his B.S. degree in Electrical Engineering from National Taiwan University in 2014\, under the supervision of Prof. L in-shan Lee and Prof. Hsuan-Tien Lin. DTSTART;TZID=America/New_York:20231009T120000 DTEND;TZID=America/New_York:20231009T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Wei-Ning Hsu (Meta Foundational AI Research) “Large Scale Universal Speech Generative Models” URL:https://www.clsp.jhu.edu/events/wei-ning-hsu-meta-foundational-ai-resea rch-large-scale-universal-speech-generative-models/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstr act
\nLarge-scale generative models such as GPT and DALL-E have revolutionized natural langu age processing and computer vision research. These models not only generat e high fidelity text or image outputs\, but also demonstrate impressive do main and task generalization capabilities. In contrast\, audio generative models are relatively primitive in scale and generalization.
\nIn this talk\, I will st art with a brief introduction on conventional neural speech generative mod els and discuss why they are unfit for scaling to Internet-scale data. Nex t\, by reviewing the latest large-scale generative models for text and ima ge\, I will outline a few lines of promising approaches to build scalable speech models. Last\, I will present Voicebox\, our latest work to advance this area. Voicebox is the most versatile generative model for speech. It is trained with a simple task — text conditioned speech infilling — on ov er 50K hours of multilingual speech with a powerful flow-matching objectiv e. Through in-context learning\, Voicebox can perform monolingual/cross-li ngual zero-shot TTS\, holistic style conversion\, transient noise removal\ , content editing\, and diverse sample generation. Moreover\, Voicebox ach ieves state-of-the-art performance and excellent run-time efficiency.
\nBiography
\nWei-Ning Hsu is a research scientist at Meta Founda tional AI Research (FAIR) and currently the lead of the audio generation t eam. His research focuses on self-supervised learning and generative model s for speech and audio. His pioneering work includes HuBERT\, AV-HuBERT\, TextlessNLP\, data2vec\, wav2vec-U\, textless speech translation\, and Voi cebox.
\nPri or to joining Meta\, Wei-Ning worked at MERL and Google Brain as a researc h intern. He received his Ph.D. and S.M. degrees in Electrical Engineering and Computer Science from Massachusetts Institute of Technology in 2020 a nd 2018\, under the supervision of Dr. James Glass. He received his B.S. d egree in Electrical Engineering from National Taiwan University in 2014\, under the supervision of Prof. Lin-shan Lee and Prof. Hsuan-Tien Lin.
\n X-TAGS;LANGUAGE=en-US:2023\,Hsu\,October END:VEVENT BEGIN:VEVENT UID:ai1ec-23902@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nPretrained language models (LMs) encode implicit repr esentations of knowledge in their parameters. Despite this observation\, o ur best methods for interpreting these representations yield few actionabl e insights on how to manipulate this parameter space for downstream benefi t. In this talk\, I will present work on methods that simulate machine rea soning by localizing and modifying parametric knowledge representations. F irst\, I will present a method for discovering knowledge-critical subnetwo rks within pretrained language models\, and show that these sparse computa tional subgraphs are responsible for the model’s ability to encode specifi c pieces of knowledge. Then\, I will present a new reasoning algorithm\, R ECKONING\, a bi-level optimisation procedure that dynamically encodes and reasons over new knowledge at test-time using the model’s existing learned knowledge representations as a scratchpad. Finally\, I will discuss next steps and challenges in using internal model mechanisms for reasoning.\n\n Bio\n\nAntoine Bosselut is an assistant professor in the School of Compute r and Communication Sciences at the École Polytechnique Fédéral de Lausann e (EPFL). He was a postdoctoral scholar at Stanford University and a Young Investigator at the Allen Institute for AI (AI2). He completed his PhD at the University of Washington and was a student researcher at Microsoft Re search. His research interests are in building systems that mix knowledge and language representations to solve problems in NLP\, specializing in co mmonsense representation and reasoning. DTSTART;TZID=America/New_York:20231013T120000 DTEND;TZID=America/New_York:20231013T131500 LOCATION:Hackerman Hall B17 @ 3400 N. 
Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Antoine Bosselut (EPFL) “From Mechanistic Interpretability to Mecha nistic Reasoning” URL:https://www.clsp.jhu.edu/events/antoine-bosselut-epfl/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstr act
\nRecent advances in speech technology make heavy use of pre-trained models that learn from large quantities of raw (untranscribed) speech\, using “self-supervised” (i.e. unsupervised) learning. These models learn to transform the acoustic input into a different representational format that makes supervised learning (for tasks such as transcription or even translation) much easier. However\, *what* and *how* speech-relevant information is encoded in these representations is not well understood. I will talk about some work at various stages of completion in which my group is analyzing the structure of these representations\, to gain a more systematic understanding of how word-level\, phonetic\, and speaker information is encoded.
\nBiography
\nSharon Goldwater is a Professor in the Institute for Language\, Cognition and Computation at the University of Edinburgh’s School of Informatics. She received her PhD in 2007 from Brown University and spent two years as a postdoctoral researcher at Stanford University before moving to Edinburgh. Her research interests include unsupervised and minimally-supervised learning for speech and language processing\, computer modelling of language acquisition in children\, and computational studies of language use. Her main focus within linguistics has been on the lower levels of structure including phonetics\, phonology\, and morphology.
\nProf. Goldwater has received awards including the 2016 Roger Needham Award from the British Computer Society for “distinguished research contribution in computer science by a UK-based researcher who has completed up to 10 years of post-doctoral research.” She has served on the editorial boards of several journals\, including Computational Linguistics\, Transactions of the Association for Computational Linguistics\, and the inaugural board of OPEN MIND: Advances in Cognitive Science. She was a program chair for the EACL 2014 Conference and chaired the EACL governing board from 2019-2020.
Abstract
\nAbstract
\nIn this talk\, I will present a simple extension of i mage-based Masked Autoencoders (MAE) to self-supervised representation lea rning from audio spectrograms. Following the Transformer encoder-decoder d esign in MAE\, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio\, feeding only the non-masked tokens through encoder layers. The decoder then re-orders and decodes the encoded context padded with mask tokens\, in order to reconstruct the input spectrogram. We find it beneficial to incorporate local window attention in the decoder\, as au dio spectrograms are highly correlated in local time and frequency bands. We then fine-tune the encoder with a lower masking ratio on target dataset s. Empirically\, Audio-MAE sets new state-of-the-art performance on six au dio and speech classification tasks\, outperforming other recent models th at use external supervised pre-training.
\nBio
\nFlorian Metze is a Research Scientist Manager at Meta AI in New York\, supporting a team of researchers and engineers working on multi-modal (image\, video\, audio\, text) content understanding for Meta’s Family of Apps (Instagram\, Threads\, Facebook\, WhatsApp). He was previously an Associate Research Professor at Carnegie Mellon University\, in the School of Computer Science’s Language Technologies Institute\, where he remains an Adjunct Professor. He is also a co-founder of Abridge\, a company working on extracting information from doctor-patient conversations. His work covers many areas of speech recognition and multi-media analysis with a focus on end-to-end deep learning. Currently\, he focuses on multi-modal processing of videos\, and using that information to recommend unconnected content. In the past\, he has worked on low-resource and multi-lingual speech processing\, speech recognition with articulatory features\, large-scale multi-media retrieval and summarization\, information extraction from medical interviews\, and recognition of personality or similar meta-data from speech.\n
For more information\, please see http://www.cs.cmu.edu/directory/fmetze
\n\n X-TAGS;LANGUAGE=en-US:2023\,Metze\,November END:VEVENT BEGIN:VEVENT UID:ai1ec-24163@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nThe almost unlimited multimedia content available on video-sharing websites has opened new challenges and opportunities for bui lding robust multimodal solutions. This seminar will describe our novel mu ltimodal architectures that (1) are robust to missing modalities\, (2) can identify noisy or less discriminative features\, and (3) can leverage unl abeled data. First\, we present a strategy that effectively combines auxil iary networks\, a transformer architecture\, and an optimized training mec hanism for handling missing features. This problem is relevant since it is expected that during inference the multimodal system will face cases with missing features due to noise or occlusion. We implement this approach fo r audiovisual emotion recognition achieving state-of-the-art performance. Second\, we present a multimodal framework for dealing with scenarios char acterized by noisy or less discriminative features. This situation is comm only observed in audiovisual automatic speech recognition (AV-ASR) with cl ean speech\, where the performance often drops compared to a speech-only s olution due to the variability of visual features. The proposed approach i s a deep learning solution with a gating layer that diminishes the effect of noisy or uninformative visual features\, keeping only useful informatio n. The approach improves\, or at least\, maintains performance when visual features are used. Third\, we discuss alternative strategies to leverage unlabeled multimodal data. A promising approach is to use multimodal prete xt tasks that are carefully designed to learn better representations for p redicting a given task\, leveraging the relationship between acoustic and facial features. 
Another approach is using multimodal ladder networks wher e intermediate representations are predicted across modalities using later al connections. These models offer principled solutions to increase the ge neralization and robustness of common speech-processing tasks when using m ultimodal architectures. \nBio\nCarlos Busso is a Professor at the Univers ity of Texas at Dallas’s Electrical and Computer Engineering Department\, where he is also the director of the Multimodal Signal Processing (MSP) La boratory. His research interest is in human-centered multimodal machine in telligence and application\, with a focus on the broad areas of affective computing\, multimodal human-machine interfaces\, in-vehicle active safety systems\, and machine learning methods for multimodal processing. He has worked on audio-visual emotion recognition\, analysis of emotional modulat ion in gestures and speech\, designing realistic human-like virtual charac ters\, and detection of driver distractions. He is a recipient of an NSF C AREER Award. In 2014\, he received the ICMI Ten-Year Technical Impact Awar d. In 2015\, his student received the third prize IEEE ITSS Best Dissertat ion Award (N. Li). He also received the Hewlett Packard Best Paper Award a t the IEEE ICME 2011 (with J. Jain)\, and the Best Paper Award at the AAAC ACII 2017 (with Yannakakis and Cowie). He received the Best of IEEE Trans actions on Affective Computing Paper Collection in 2021 (with R. Lotfian) and the Best Paper Award from IEEE Transactions on Affective Computing in 2022 (with Yannakakis and Cowie). He received the ACM ICMI Community Servi ce Award in 2023. In 2023\, he received the Distinguished Alumni Award in the Mid-Career/Academia category by the Signal and Image Processing Instit ute (SIPI) at the University of Southern California. He is currently servi ng as an associate editor of the IEEE Transactions on Affective Computing. He is an IEEE Fellow. 
He is a member of the ISCA\, and AAAC and a senior member of ACM. DTSTART;TZID=America/New_York:20231117T120000 DTEND;TZID=America/New_York:20231117T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Carlos Busso (University of Texas at Dallas) “Multimodal Machine Le arning for Human-Centric Tasks” URL:https://www.clsp.jhu.edu/events/carl-busso-university-of-texas-at-dalla s-multimodal-machine-learning-for-human-centric-tasks/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n
Abstract
\nThe almost unlimited multimedia content available on video-sharing websites has opened new challenges and opportun ities for building robust multimodal solutions. This seminar will describe our novel multimodal architectures that (1) are robust to missing modalit ies\, (2) can identify noisy or less discriminative features\, and (3) can leverage unlabeled data. First\, we present a strategy that effectively c ombines auxiliary networks\, a transformer architecture\, and an optimized training mechanism for handling missing features. This problem is relevan t since it is expected that during inference the multimodal system will fa ce cases with missing features due to noise or occlusion. We implement thi s approach for audiovisual emotion recognition achieving state-of-the-art performance. Second\, we present a multimodal framework for dealing with s cenarios characterized by noisy or less discriminative features. This situ ation is commonly observed in audiovisual automatic speech recognition (AV -ASR) with clean speech\, where the performance often drops compared to a speech-only solution due to the variability of visual features. The propos ed approach is a deep learning solution with a gating layer that diminishe s the effect of noisy or uninformative visual features\, keeping only usef ul information. The approach improves\, or at least\, maintains performanc e when visual features are used. Third\, we discuss alternative strategies to leverage unlabeled multimodal data. A promising approach is to use mul timodal pretext tasks that are carefully designed to learn better represen tations for predicting a given task\, leveraging the relationship between acoustic and facial features. Another approach is using multimodal ladder networks where intermediate representations are predicted across modalitie s using lateral connections. 
These models offer principled solutions to in crease the generalization and robustness of common speech-processing tasks when using multimodal architectures.
\nBio
\nCarlos Busso is a Professor at the University of Tex as at Dallas’s Electrical and Computer Engineering Department\, where he i s also the director of the Multimodal Signal Processing (MSP) Laboratory. His research interest is in human-centered multimodal machine intelligence and application\, with a focus on the broad areas of affective computing\ , multimodal human-machine interfaces\, in-vehicle active safety systems\, and machine learning methods for multimodal processing. He has worked on audio-visual emotion recognition\, analysis of emotional modulation in ges tures and speech\, designing realistic human-like virtual characters\, and detection of driver distractions. He is a recipient of an NSF CAREER Awar d. In 2014\, he received the ICMI Ten-Year Technical Impact Award. In 2015 \, his student received the third prize IEEE ITSS Best Dissertation Award (N. Li). He also received the Hewlett Packard Best Paper Award at the IEEE ICME 2011 (with J. Jain)\, and the Best Paper Award at the AAAC ACII 2017 (with Yannakakis and Cowie). He received the Best of IEEE Transactions on Affective Computing Paper Collection in 2021 (with R. Lotfian) and the Be st Paper Award from IEEE Transactions on Affective Computing in 2022 (with Yannakakis and Cowie). He received the ACM ICMI Community Service Award i n 2023. In 2023\, he received the Distinguished Alumni Award in the Mid-Ca reer/Academia category by the Signal and Image Processing Institute (SIPI) at the University of Southern California. He is currently serving as an a ssociate editor of the IEEE Transactions on Affective Computing. He is an IEEE Fellow. He is a member of the ISCA\, and AAAC and a senior member of ACM.
\n X-TAGS;LANGUAGE=en-US:2023\,Busso\,November END:VEVENT BEGIN:VEVENT UID:ai1ec-24167@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nPre-trained speech representation models have become ubiquitous in speech processing over the past few years. They have both i mproved the state of the art and made it feasible to learn task-specific m odels with very little labeled data. However\, it is not well understood what linguistic information is encoded in pre-trained models and how best to apply them to downstream tasks. In this talk I will describe recent wor k that begins to build an understanding of the layer-wise information lear ned by pre-trained speech models. We consider a number of popular pre-tra ined models and investigate the extent to which their layers encode spectr al\, phonetic\, and word-level information. The results of these analyses also suggest some ways to improve or simplify the application of pre-trai ned models for downstream tasks. Finally\, I will describe our efforts to benchmark model performance on a variety of spoken language understanding tasks\, in order to broaden our understanding of the capabilities of stat e-of-the-art models.\nThis talk is based in part on work presented in\nA. Pasad et al.\, “Comparative layer-wise analysis of self-supervised speech models\,”ICASSP 2023.\nA. Pasad et al.\, “What do self-supervised speech m odels know about words?\,” arXiv:2307.00162\, 2023.\nS. Shon et al.\, “SLU E Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Task s\,” ACL 2023.\nBio\nKaren Livescu is a Professor at TTI-Chicago. She comp leted her PhD at MIT in 2005. She is an ISCA Fellow and a recent IEEE Dist inguished Lecturer. She has served as a program chair/co-chair for ICLR\, Interspeech\, and ASRU\, and is an Associate Editor for TACL and IEEE T-P AMI. Her group’s work spans a variety of topics in spoken\, written\, and signed language processing. 
DTSTART;TZID=America/New_York:20231201T120000 DTEND;TZID=America/New_York:20231201T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Karen Livescu (Toyota Technological Institute at Chicago) “What Do Pre-Trained Speech Representation Models Know? Layer-Wise Analysis and Ben chmarking” URL:https://www.clsp.jhu.edu/events/karen-livescu-toyota-technological-inst itute-at-chicago/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstr act
\nPre-trained speech representation models have become ubiquitous in speech processing over the past few years. They have both i mproved the state of the art and made it feasible to learn task-specific m odels with very little labeled data. However\, it is not well understood what linguistic information is encoded in pre-trained models and how best to apply them to downstream tasks. In this talk I will describe recent wor k that begins to build an understanding of the layer-wise information lear ned by pre-trained speech models. We consider a number of popular pre-tra ined models and investigate the extent to which their layers encode spectr al\, phonetic\, and word-level information. The results of these analyses also suggest some ways to improve or simplify the application of pre-trai ned models for downstream tasks. Finally\, I will describe our efforts to benchmark model performance on a variety of spoken language understanding tasks\, in order to broaden our understanding of the capabilities of stat e-of-the-art models.
\nThis talk is based in part on work presented in
\nA. Pasad et al.\, “Comparative layer-wise analysis of self-supervised speech models\,” ICASSP 2023.
\nA. Pasad et al.\, “What do self-supervised speech models know about words?\,” arXiv:2307.00162\, 2023.
\nS. Shon et al.\, “SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks\,” ACL 2023.
\nBio
\nKaren Livescu is a Professor at TTI-Chicago. She completed he r PhD at MIT in 2005. She is an ISCA Fellow and a recent IEEE Distinguishe d Lecturer. She has served as a program chair/co-chair for ICLR\, Intersp eech\, and ASRU\, and is an Associate Editor for TACL and IEEE T-PAMI. He r group’s work spans a variety of topics in spoken\, written\, and signed language processing.
\n X-TAGS;LANGUAGE=en-US:2023\,December\,Livescu END:VEVENT BEGIN:VEVENT UID:ai1ec-24169@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nFoundation models\, including ChatGPT and its many variants\, have come into prominence in the natural language processing (NLP) community thanks to the ubiquity of text data readily available on the internet and the design of modern transformer architectures that can effectively learn from such data. However\, the development of a foundation model for sequential decision-making (e.g.\, reinforcement learning\, planning) is faced with additional challenges not present in NLP. In this talk\, we discuss some of these challenges with the hope of informing future investments that funding agencies and the academic community should engage in. The problem of transfer learning in the context of sequential decision-making is also discussed and constitutes one of the challenges that foundation models must address.\nBio\nAlvaro Velasquez is a program manager at the Defense Advanced Research Projects Agency (DARPA)\, where he currently leads programs on neuro-symbolic AI. Before that\, Alvaro oversaw the machine intelligence portfolio for the Information Directorate of the Air Force Research Laboratory (AFRL). Alvaro is a recipient of the distinguished paper award from AAAI\, best paper and patent awards from AFRL\, and the National Science Foundation Graduate Research Fellowship. He has authored over 70 papers and two patents and serves as Associate Editor of the IEEE Transactions on Artificial Intelligence. DTSTART;TZID=America/New_York:20231204T120000 DTEND;TZID=America/New_York:20231204T131500 LOCATION:Hackerman Hall B17 @ 3400 N.
Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Alvaro Velasquez (DARPA) “Foundation Models and the Transfer of Emb odied Autonomy” URL:https://www.clsp.jhu.edu/events/alvaro-velasquez/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstr act
\nFoundation models\, including ChatGPT and its many variants\, have come into prominence in the natural language processing (NLP) community thanks to the ubiquity of text data readily available on the internet and the design of modern transformer architectures that can effectively learn from such data. However\, the development of a foundation model for sequential decision-making (e.g.\, reinforcement learning\, planning) is faced with additional challenges not present in NLP. In this talk\, we discuss some of these challenges with the hope of informing future investments that funding agencies and the academic community should engage in. The problem of transfer learning in the context of sequential decision-making is also discussed and constitutes one of the challenges that foundation models must address.
\nBio
\nAlvaro Velasquez is a program manager at the Defense Advanced Research Projects Agency (DARPA)\, where he currently leads programs on neuro-symbolic AI. Before that\, Alvaro oversaw the machine intelligence portfolio for the Information Directorate of the Air Force Research Laboratory (AFRL). Alvaro is a recipient of the distinguished paper award from AAAI\, best paper and patent awards from AFRL\, and the National Science Foundation Graduate Research Fellowship. He has authored over 70 papers and two patents and serves as Associate Editor of the IEEE Transactions on Artificial Intelligence.
\n X-TAGS;LANGUAGE=en-US:2023\,December\,Velasquez END:VEVENT BEGIN:VEVENT UID:ai1ec-24239@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nNon-invasive neural interfaces have the potential to transform human-computer interaction by providing users with low friction\ , information rich\, always available inputs. Reality Labs at Meta is deve loping such an interface for the control of augmented reality devices base d on electromyographic (EMG) signals captured at the wrist. Speech and aud io technologies turn out to be especially well suited to unlocking the ful l potential of these signals and interactions and this talk will present s everal specific problems and the speech and audio approaches that have adv anced us towards this ultimate goal of effortless and joyful interfaces. W e will provide the necessary neuroscientific background to understand thes e signals\, describe automatic speech recognition-inspired interfaces gene rating text and beamforming-inspired interfaces for identifying individual neurons\, and then explain how they connect with egocentric machine intel ligence tasks that might reside on these devices.\nBiography\nMichael I Ma ndel is a Research Scientist in Reality Labs at Meta. Previously\, he was an Associate Professor of Computer and Information Science at Brooklyn Col lege and the CUNY Graduate Center working at the intersection of machine l earning\, signal processing\, and psychoacoustics. He earned his BSc in Co mputer Science from the Massachusetts Institute of Technology and his MS a nd PhD with distinction in Electrical Engineering from Columbia University as a Fu Foundation Presidential Scholar. He was an FQRNT Postdoctoral Res earch Fellow in the Machine Learning laboratory (LISA/MILA) at the Univers ité de Montréal\, an Algorithm Developer at Audience Inc\, and a Research Scientist in Computer Science and Engineering at the Ohio State University . 
His work has been supported by the National Science Foundation\, includi ng via a CAREER award\, the Alfred P. Sloan Foundation\, and Google\, Inc. DTSTART;TZID=America/New_York:20240129T120000 DTEND;TZID=America/New_York:20240129T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Michael I Mandel (Meta) “Speech and Audio Processing in Non-Invasiv e Brain-Computer Interfaces at Meta” URL:https://www.clsp.jhu.edu/events/michael-i-mandel-cuny/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstr act
\nNon-invasive neural interfaces ha ve the potential to transform human-computer interaction by providing user s with low friction\, information rich\, always available inputs. Reality Labs at Meta is developing such an interface for the control of augmented reality devices based on electromyographic (EMG) signals captured at the w rist. Speech and audio technologies turn out to be especially well suited to unlocking the full potential of these signals and interactions and this talk will present several specific problems and the speech and audio appr oaches that have advanced us towards this ultimate goal of effortless and joyful interfaces. We will provide the necessary neuroscientific backgroun d to understand these signals\, describe automatic speech recognition-insp ired interfaces generating text and beamforming-inspired interfaces for id entifying individual neurons\, and then explain how they connect with egoc entric machine intelligence tasks that might reside on these devices.
\nBiography
\nMichael I Mandel is a Research Sci entist in Reality Labs at Meta. Previously\, he was an Associate Professor of Computer and Information Science at Brooklyn College and the CUNY Grad uate Center working at the intersection of machine learning\, signal proce ssing\, and psychoacoustics. He earned his BSc in Computer Science from th e Massachusetts Institute of Technology and his MS and PhD with distinctio n in Electrical Engineering from Columbia University as a Fu Foundation Pr esidential Scholar. He was an FQRNT Postdoctoral Research Fellow in the Ma chine Learning laboratory (LISA/MILA) at the Université de Montréal\, an A lgorithm Developer at Audience Inc\, and a Research Scientist in Computer Science and Engineering at the Ohio State University. His work has been su pported by the National Science Foundation\, including via a CAREER award\ , the Alfred P. Sloan Foundation\, and Google\, Inc.
\n X-TAGS;LANGUAGE=en-US:2024\,January\,Mandel END:VEVENT BEGIN:VEVENT UID:ai1ec-24241@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nOur research focuses on improving speech processing algorithms\, such as automatic speech recognition (ASR)\, speaker identification\, and depression detection\, under challenging conditions such as limited data (for example\, children’s or clinical speech)\, mismatched conditions (for example\, training on read speech while recognizing conversational speech)\, and noisy speech\, using a hybrid data-driven and knowledge-based approach. This approach requires understanding of both machine learning approaches and of the human speech production and perception systems. I will summarize in this talk our work on children’s ASR using self-supervised models\, detecting depression from speech signals using novel speaker disentanglement techniques\, and automating scoring of children’s reading tasks with both ASR and innovative NLP algorithms.\nBiography\nAbeer Alwan received her Ph.D. in Electrical Engineering and Computer Science from MIT in 1992. Since then\, she has been with the ECE department at UCLA where she is now a Full Professor and directs the Speech Processing and Auditory Perception Laboratory. She is the recipient of the NSF Research Initiation and Career Awards\, NIH FIRST Award\, UCLA-TRW Excellence in Teaching Award\, Okawa Foundation Award in Telecommunication\, and the Engineer’s Council Educator Award. She is a Fellow of the Acoustical Society of America\, IEEE\, and International Speech Communication Assoc. (ISCA).
She was a Fellow at the Radcliffe Institute\, Harvard University\, co-Editor in Chief of Speech Communication\, Associate Editor of JASA and IEEE TSALP\, a Distinguished Lecturer of ISCA\, and a member of the IEEE Signal Processing Board of Governors\, and she is currently on the advisory board of ISCA and the UCLA-Amazon Science Hub for Humanity and AI. DTSTART;TZID=America/New_York:20240202T120000 DTEND;TZID=America/New_York:20240202T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Abeer Alwan (UCLA) “Dealing with Limited Speech Data and Variability: Three case studies” URL:https://www.clsp.jhu.edu/events/abeer-alwan-ucla/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nOur research focuses on improving speech processing algorithms\, such as automatic speech recognition (ASR)\, speaker identification\, and depression detection\, under challenging conditions such as limited data (for example\, children’s or clinical speech)\, mismatched conditions (for example\, training on read speech while recognizing conversational speech)\, and noisy speech\, using a hybrid data-driven and knowledge-based approach. This approach requires understanding of both machine learning approaches and of the human speech production and perception systems. I will summarize in this talk our work on children’s ASR using self-supervised models\, detecting depression from speech signals using novel speaker disentanglement techniques\, and automating scoring of children’s reading tasks with both ASR and innovative NLP algorithms.
\nBiography
\nAbeer Alwan received her Ph.D. in Electrical Engineering and Computer Science from MIT in 1992. Since then\, she has been with the ECE department at UCLA where she is now a Full Professor and directs the Speech Processing and Auditory Perception Laboratory. She is the recipient of the NSF Research Initiation and Career Awards\, NIH FIRST Award\, UCLA-TRW Excellence in Teaching Award\, Okawa Foundation Award in Telecommunication\, and the Engineer’s Council Educator Award. She is a Fellow of the Acoustical Society of America\, IEEE\, and International Speech Communication Assoc. (ISCA). She was a Fellow at the Radcliffe Institute\, Harvard University\, co-Editor in Chief of Speech Communication\, Associate Editor of JASA and IEEE TSALP\, a Distinguished Lecturer of ISCA\, and a member of the IEEE Signal Processing Board of Governors\, and she is currently on the advisory board of ISCA and the UCLA-Amazon Science Hub for Humanity and AI.
\n X-TAGS;LANGUAGE=en-US:2024\,Alwan\,February END:VEVENT BEGIN:VEVENT UID:ai1ec-24423@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nThere is an enormous data gap between how AI systems and children learn language: The best LLMs now learn language from text wi th a word count in the trillions\, whereas it would take a child roughly 1 00K years to reach those numbers through speech (Frank\, 2023\, “Bridging the data gap”). There is also a clear generalization gap: whereas machines struggle with systematic generalization\, people excel. For instance\, on ce a child learns how to “skip\,” they immediately know how to “skip twice ” or “skip around the room with their hands up” due to their compositional skills. In this talk\, I’ll describe two case studies in addressing these gaps:\n1) The data gap: We train deep neural networks from scratch (using DINO\, CLIP\, etc.)\, not on large-scale data from the web\, but through the eyes and ears of a single child. Using head-mounted video recordings f rom a child (61 hours of video slices over 19 months)\, we show how deep n eural networks can acquire many word-referent mappings\, generalize to nov el visual referents\, and achieve multi-modal alignment. Our results demon strate how today’s AI models are capable of learning key aspects of childr en’s early knowledge from realistic input.\n2) The generalization gap: Can neural networks capture human-like systematic generalization? We address a 35-year-old debate catalyzed by Fodor and Pylyshyn’s classic article\, w hich argued that standard neural networks are not viable models of the min d because they lack systematic compositionality — the algebraic ability to understand and produce novel combinations from known components. 
We’ll show how neural networks can achieve human-like systematic generalization when trained through meta-learning for compositionality (MLC)\, a new method for optimizing the compositional skills of neural networks through practice. With MLC\, a neural network can match human performance and solve several machine learning benchmarks.\nGiven this work\, we’ll discuss the paths forward for building machines that learn\, generalize\, and interact in more human-like ways based on more natural input.\nRelated articles:\nVong\, W. K.\, Wang\, W.\, Orhan\, A. E.\, and Lake\, B. M. (2024). Grounded language acquisition through the eyes and ears of a single child. Science\, 383.\nOrhan\, A. E.\, and Lake\, B. M. (in press). Learning high-level visual representations from a child’s perspective without strong inductive biases. Nature Machine Intelligence.\nLake\, B. M. and Baroni\, M. (2023). Human-like systematic generalization through a meta-learning neural network. Nature\, 623\, 115-121.\nBiography\nBrenden M. Lake is an Assistant Professor of Psychology and Data Science at New York University. He received his M.S. and B.S. in Symbolic Systems from Stanford University in 2009\, and his Ph.D. in Cognitive Science from MIT in 2014. He was a postdoctoral Data Science Fellow at NYU from 2014-2017. Brenden is a recipient of the Robert J. Glushko Prize for Outstanding Doctoral Dissertation in Cognitive Science\, he is an MIT Technology Review Innovator Under 35\, and his research was selected by Scientific American as one of the 10 most important advances of 2016. Brenden’s research focuses on computational problems that are easier for people than they are for machines\, such as learning new concepts\, creating new concepts\, learning-to-learn\, and asking questions. DTSTART;TZID=America/New_York:20240212T120000 DTEND;TZID=America/New_York:20240212T131500 LOCATION:Hackerman Hall B17 @ 3400 N.
Charles Street SEQUENCE:0 SUMMARY:Brenden Lake (New York University) “Towards More Human-Like Learning in Machines: Bridging the Data and Generalization Gaps” URL:https://www.clsp.jhu.edu/events/brendan-lake-new-york-university-towards-more-human-like-learning-in-machines-bridging-the-data-and-generalization-gaps/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nThere is an enormous data gap between how AI systems and children learn language: The best LLMs now learn language from text with a word count in the trillions\, whereas it would take a child roughly 100K years to reach those numbers through speec h (Frank\, 2023\, “Bridging the data gap”). There is also a clear generali zation gap: whereas machines struggle with systematic generalization\, peo ple excel. For instance\, once a child learns how to “skip\,” they immedia tely know how to “skip twice” or “skip around the room with their hands up ” due to their compositional skills. In this talk\, I’ll describe two case studies in addressing these gaps:
\n1) The data gap: We train deep neural networks from scratch (using DINO\, CLIP\, etc.)\, not on large-scale data from the web\, but through the eyes and ears of a single child. Using head-mounted video recordings from a child (61 hours of video slices over 19 months)\, we show how deep neural networks can acquire many word-referent mappings\, generalize to novel visual referents\, and achieve multi-modal alignment. Our results demonstrate how today’s AI models are capable of learning key aspects of children’s early knowledge from realistic input.
\n2) The generalization gap: Can neural networks capture human-like systematic generalization? We address a 35-year-old debate catalyzed by Fodor and Pylyshyn’s classic article\, which argued that standard neural networks are not viable models of the mind because they lack systematic compositionality\, the algebraic ability to understand and produce novel combinations from known components. We’ll show how neural networks can achieve human-like systematic generalization when trained through meta-learning for compositionality (MLC)\, a new method for optimizing the compositional skills of neural networks through practice. With MLC\, a neural network can match human performance and solve several machine learning benchmarks.
\nGiven this work\, we’ll discuss the paths forward for building machines that learn\, generalize\, and interact in more human-like ways based on more natural input.
\nRelated articles:
\nVong\, W. K.\, Wang\, W.\, Orhan\, A. E.\, and Lake\, B. M. (2024). Grounded language acquisition through the eyes and ears of a single child. Science\, 383.
\nOrhan\, A. E.\, and Lake\, B. M. (in press). Learning high-level visual representations from a child’s perspective without strong inductive biases. Nature Machine Intelligence.
\nLake\, B. M. and Baroni\, M. (2023). Human-like systematic generalization through a meta-learning neural network. Nature\, 623\, 115-121.
\nBiography
\nBrenden M. Lake is an Assistant Professor of Psychology and Data Science at New York University. He received his M.S. and B.S. in Symbolic Systems from Stanford University in 2009\, and his Ph.D. in Cognitive Science from MIT in 2014. He was a postdoctoral Data Science Fellow at NYU from 2014-2017. Brenden is a recipient of the Robert J. Glushko Prize for Outstanding Doctoral Dissertation in Cognitive Science\, he is an MIT Technology Review Innovator Under 35\, and his research was selected by Scientific American as one of the 10 most important advances of 2016. Brenden’s research focuses on computational problems that are easier for people than they are for machines\, such as learning new concepts\, creating new concepts\, learning-to-learn\, and asking questions.\n
\\nAbstract
\nLarge language models like ChatGPT have shown extraordinary abilities for writing. While impressive at first glance\, large language models aren’t perfect and often make mistakes humans would not make. The main architecture behind ChatGPT mostly doesn’t differ from early neural networks\, and as a consequence\, carries some of the same limitations. My work revolves around the use of neural networks like ChatGPT mixed with symbolic methods from early AI and how these two families of methods can combine to create more robust AI. I talk about some of the neurosymbolic methods I used for applications in story generation and understanding\, with the goal of eventually creating AI that can play Dungeons & Dragons. I also discuss pain points that I found for improving accessible communication and show how large language models can supplement such communication.\n
Biography
\nAbstract
\nI discuss the application of Foundation Models in Astronomy through the collaborative efforts of the UniverseTBD consortium with a mission to democratize Science for everyone. One of our key objectives is to overcome the limitations of general-purpose Foundation Models\, such as producing limited information in specialized fields. To this end\, we have developed the first specialized large language model for Astronomy\, AstroLLaMA-1. This model\, enhanced by exposure to domain-specific literature from the NASA Astrophysics Data System and ArXiv\, demonstrates improved text completion and embedding capabilities over existing GPT models. I further discuss the potential of LLMs in generating complex scientific hypotheses and extracting meaningful insights from astronomy literature. Our findings\, validated by human experts\, demonstrate the LLM capability in informed scientific critique and uncover intriguing patterns in the embedding space\, highlighting the potential of LLMs to augment scientific inquiry. I will also discuss preliminary work with the multi-modal model AstroLLaVA\, which allows us to interact with astronomical images via natural language. Through the work of UniverseTBD\, we aim to explore how artificial intelligence can assist human intelligence in Astronomy and\, more broadly\, Science.
\nBiography
\nIoana Ciucă\, who goes by Jo\, is an interdisciplinary Jubilee Joint Fellow at the Australian National University\, working across the School of Computing and the Research School of Astronomy & Astrophysics. Before joining ANU\, Jo finished her PhD in Astrophysics at University College London in the United Kingdom\, where she worked at the intersection of Astronomy and Machine Learning to understand the formation and evolution history of our Galaxy\, the Milky Way. Jo is now focusing on utilizing foundation models that benefit researchers everywhere\, working alongside the UniverseTBD team of more than 30 astronomers\, engineers\, ML practitioners and enthusiasts worldwide.\n
\n X-TAGS;LANGUAGE=en-US:2024\,Ciuca\,February END:VEVENT BEGIN:VEVENT UID:ai1ec-24459@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION: DTSTART;TZID=America/New_York:20240301T120000 DTEND;TZID=America/New_York:20240301T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Mohit Iyyer “Improving\, Evaluating and Detecting Long-Form LLM-Generated Text” URL:https://www.clsp.jhu.edu/events/mohit-iyyer-improving-evaluating-and-detecting-long-form-llm-generated-text/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2024\,Iyyer\,March END:VEVENT BEGIN:VEVENT UID:ai1ec-24465@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nLarge Language Models (LLMs) have demonstrated remarkable capabilities across various domains. However\, it is still very challenging to build highly-reliable applications with LLMs that support specialized use cases. LLMs trained on web data often excel at capturing general language patterns\, but they could struggle to support specialized domains and personalized user needs. Moreover\, LLMs can produce errors that are deceptively plausible\, making them potentially dangerous for high-trust scenarios. In this talk\, I will discuss some of our recent efforts in addressing these challenges with data-efficient tuning methods and a novel factuality evaluation framework. Specifically\, my talk will focus on building multilingual applications\, one crucial use case often characterized by limited tuning and evaluation data.\nBio\nXinyi (Cindy) Wang is a research scientist at Google DeepMind working on Large Language Models (LLMs) and their application to generative question-answering. She has worked on multilingual instruction-tuning for Gemini and multilingual generative models used in Google search.
Before Google DeepMind\, Cindy Wang obtained her PhD degree in Language Technologies at Carnegie Mellon University. During her PhD\, she mainly worked on developing data-efficient natural language processing (NLP) systems. She has made several contributions in data selection\, data representation\, and model adaptation for multilingual NLP. DTSTART;TZID=America/New_York:20240308T120000 DTEND;TZID=America/New_York:20240308T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Cindy Wang (Google DeepMind) “Building Data-Efficient and Reliable Applications with Large Language Models” URL:https://www.clsp.jhu.edu/events/cindy-wang-google-deepmind-building-data-efficient-and-reliable-applications-with-large-language-models/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n
Abstract
\nLarge Language Models (LLMs) have demonstrated remarkable capabilities across various domains. However\, it is still very challenging to build highly-reliable applications with LLMs that support specialized use cases. LLMs trained on web data often excel at capturing general language patterns\, but they could struggle to support specialized domains and personalized user needs. Moreover\, LLMs can produce errors that are deceptively plausible\, making them potentially dangerous for high-trust scenarios. In this talk\, I will discuss some of our recent efforts in addressing these challenges with data-efficient tuning methods and a novel factuality evaluation framework. Specifically\, my talk will focus on building multilingual applications\, one crucial use case often characterized by limited tuning and evaluation data.
\nBio
\nXinyi (Cindy) Wang is a research scientist at Google DeepMind working on Large Language Models (LLMs) and their application to generative question-answering. She has worked on multilingual instruction-tuning for Gemini and multilingual generative models used in Google search. Before Google DeepMind\, Cindy Wang obtained her PhD degree in Language Technologies at Carnegie Mellon University. During her PhD\, she mainly worked on developing data-efficient natural language processing (NLP) systems. She has made several contributions in data selection\, data representation\, and model adaptation for multilingual NLP.
\n X-TAGS;LANGUAGE=en-US:2024\,March\,Wang END:VEVENT BEGIN:VEVENT UID:ai1ec-24481@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nNatural language provides an intuitive and powerful interface to access knowledge at scale. Modern language systems draw information from two rich knowledge sources: (1) information stored in their parameters during massive pretraining and (2) documents retrieved at inference time. Yet\, we are far from building systems that can reliably provide information from such knowledge sources. In this talk\, I will discuss paths for more robust systems. In the first part of the talk\, I will present a module for scaling retrieval-based knowledge augmentation. We learn a compressor that maps retrieved documents into textual summaries prior to in-context integration. This not only reduces the computational costs but also filters irrelevant or incorrect information. In the second half of the talk\, I will discuss the challenges of updating knowledge stored in model parameters and propose a method to prevent models from reciting outdated information by identifying facts that are prone to rapid change. I will conclude my talk by proposing an interactive system that can elicit information from users when needed.\nBiography\nEunsol Choi is an assistant professor in the Computer Science department at the University of Texas at Austin. Prior to UT\, she spent a year at Google AI as a visiting researcher. Her research area spans natural language processing and machine learning. She is particularly interested in interpreting and reasoning about text in a dynamic real world context. She is a recipient of a Facebook research fellowship\, Google faculty research award\, Sony faculty award\, and an outstanding paper award at EMNLP. She received a Ph.D.
in computer science and engineering from the University of Washington and a B.A. in mathematics and computer science from Cornell University. DTSTART;TZID=America/New_York:20240315T120000 DTEND;TZID=America/New_York:20240315T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Eunsol Choi (University of Texas at Austin) “Knowledge-Rich Language Systems in a Dynamic World” URL:https://www.clsp.jhu.edu/events/eunsol-choi-university-of-texas-at-austin-knowledge-rich-language-systems-in-a-dynamic-world/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nNatural language provides an intuitive and powerful interface to access knowledge at scale. Modern language systems draw information from two rich knowledge sources: (1) information stored in their parameters during massive pretraining and (2) documents retrieved at inference time. Yet\, we are far from building systems that can reliably provide information from such knowledge sources. In this talk\, I will discuss paths for more robust systems. In the first part of the talk\, I will present a module for scaling retrieval-based knowledge augmentation. We learn a compressor that maps retrieved documents into textual summaries prior to in-context integration. This not only reduces the computational costs but also filters irrelevant or incorrect information. In the second half of the talk\, I will discuss the challenges of updating knowledge stored in model parameters and propose a method to prevent models from reciting outdated information by identifying facts that are prone to rapid change. I will conclude my talk by proposing an interactive system that can elicit information from users when needed.
\nBiography
\nEunsol Choi is an assistant professor in the Computer Science department at the University of Texas at Austin. Prior to UT\, she spent a year at Google AI as a visiting researcher. Her research area spans natural language processing and machine learning. She is particularly interested in interpreting and reasoning about text in a dynamic real world context. She is a recipient of a Facebook research fellowship\, Google faculty research award\, Sony faculty award\, and an outstanding paper award at EMNLP. She received a Ph.D. in computer science and engineering from the University of Washington and a B.A. in mathematics and computer science from Cornell University.
\n\n X-TAGS;LANGUAGE=en-US:2024\,Choi\,March END:VEVENT BEGIN:VEVENT UID:ai1ec-24489@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION: DTSTART;TZID=America/New_York:20240329T120000 DTEND;TZID=America/New_York:20240329T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Soumi Maiti URL:https://www.clsp.jhu.edu/events/soumi-maiti/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2024\,Maiti\,March END:VEVENT BEGIN:VEVENT UID:ai1ec-24491@www.clsp.jhu.edu DTSTAMP:20240319T040034Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION: DTSTART;TZID=America/New_York:20240401T120000 DTEND;TZID=America/New_York:20240401T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Yuan Gong URL:https://www.clsp.jhu.edu/events/yuan-gong/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2024\,April\,Gong END:VEVENT END:VCALENDAR