BEGIN:VCALENDAR VERSION:2.0 PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9// CALSCALE:GREGORIAN METHOD:PUBLISH X-FROM-URL:https://www.clsp.jhu.edu X-WR-TIMEZONE:America/New_York BEGIN:VTIMEZONE TZID:America/New_York X-LIC-LOCATION:America/New_York BEGIN:STANDARD DTSTART:20231105T020000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 RDATE:20241103T020000 TZNAME:EST END:STANDARD BEGIN:DAYLIGHT DTSTART:20240310T020000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 RDATE:20250309T020000 TZNAME:EDT END:DAYLIGHT END:VTIMEZONE BEGIN:VEVENT UID:ai1ec-20987@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:
Abstract
\nWhile there is a vast amou nt of text written about nearly any topic\, this is often difficult for so meone unfamiliar with a specific field to understand. Automated text simpl ification aims to reduce the complexity of a document\, making it more com prehensible to a broader audience. Much of the research in this field has traditionally focused on simplification sub-tasks\, such as lexical\, synt actic\, or sentence-level simplification. However\, current systems strugg le to consistently produce high-quality simplifications. Phrase-based mode ls tend to make too many poor transformations\; on the other hand\, recent neural models\, while producing grammatical output\, often do not make al l needed changes to the original text. In this thesis\, I discuss novel ap proaches for improving lexical and sentence-level simplification systems. Regarding sentence simplification models\, after noting that encouraging d iversity at inference time leads to significant improvements\, I take a cl oser look at the idea of diversity and perform an exhaustive comparison of diverse decoding techniques on other generation tasks. I also discuss the limitations in the framing of current simplification tasks\, which preven t these models from yet being practically useful. Thus\, I also propose a retrieval-based reformulation of the problem. Specifically\, starting with a document\, I identify concepts critical to understanding its content\, and then retrieve documents relevant for each concept\, re-ranking them ba sed on the desired complexity level.
\nBiography
\nI’m a research scientist at the HLTCOE at Johns Hopkins University. My primary research interests are in language generati on\, diverse and constrained decoding\, and information retrieval. During my PhD I focused mainly on the task of text simplification\, and now am wo rking on formulating structured prediction problems as end-to-end generati on tasks. I received my PhD in July 2021 from the University of Pennsylvan ia with Chris Callison-Burch and Marianna Apidianaki.
\nDTSTART;TZID=America/New_York:20211022T120000 DTEND;TZID=America/New_York:20211022T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Reno Kriz (HLTCOE – JHU) “Towards a Practically Useful Text Simplif ication System” URL:https://www.clsp.jhu.edu/events/reno-kriz-hltcoe-jhu-towards-a-practica lly-useful-text-simplification-system/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2021\,Kriz\,October END:VEVENT BEGIN:VEVENT UID:ai1ec-21023@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:
Abstract
\nSpeech data is notoriously difficult to work with due to a variety of codecs\, length s of recordings\, and meta-data formats. We present Lhotse\, a speech data representation library that draws upon lessons learned from Kaldi speech recognition toolkit and brings its concepts into the modern deep learning ecosystem. Lhotse provides a common JSON description format with correspon ding Python classes and data preparation recipes for over 30 popular speec h corpora. Various datasets can be easily combined together and re-purpose d for different tasks. The library handles multi-channel recordings\, long recordings\, local and cloud storage\, lazy and on-the-fly operations amo ngst other features. We introduce Cut and CutSet concepts\, which simplify common data wrangling tasks for audio and help incorporate acoustic conte xt of speech utterances. Finally\, we show how Lhotse leverages PyTorch da ta API abstractions and adopts them to handle speech data for deep learnin g.
\nBiography
\nPiotr Zelasko is an a ssistant research scientist in the Center for Language and Speech Processi ng (CLSP) who specializes in automatic speech recognition (ASR) and spoken language understanding (SLU). His current research focuses on applying mu ltilingual and crosslingual speech recognition systems to categorize the p honetic inventory of a previously unknown language and on improving defens es against adversarial attacks on both speaker identification and automati c speech recognition systems. He is also addressing the question of how to structure a spontaneous conversation into high-level semantic units such as dialog acts or topics. Finally\, he is working on Lhotse + K2\, the nex t-generation speech processing research software ecosystem. Before joining Johns Hopkins\, Zelasko worked as a machine learning consultant for Avaya (2017-2019)\, and as a machine learning engineer for Techmo (2015-2017). Zelasko received his PhD (2019) in electronics engineering\, as well as hi s master’s (2014) and undergraduate degrees (2013) in acoustic engineering from AGH University of Science and Technology in Kraków\, Poland.
DTSTART;TZID=America/New_York:20211029T120000 DTEND;TZID=America/New_York:20211029T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore MD 21218 SEQUENCE:0 SUMMARY:Piotr Zelasko (CLSP at JHU) “Lhotse: a speech data representation l ibrary for the modern deep learning ecosystem” URL:https://www.clsp.jhu.edu/events/piotr-zelasko-clsp-at-jhu-lhotse-a-spee ch-data-representation-library-for-the-modern-deep-learning-ecosystem/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2021\,October\,Zelasko END:VEVENT BEGIN:VEVENT UID:ai1ec-21259@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nNatural language processin g has been revolutionized by neural networks\, which perform impressively well in applications such as machine translation and question answering. D espite their success\, neural networks still have some substantial shortco mings: Their internal workings are poorly understood\, and they are notori ously brittle\, failing on example types that are rare in their training d ata. In this talk\, I will use the unifying thread of hierarchical syntact ic structure to discuss approaches for addressing these shortcomings. Firs t\, I will argue for a new evaluation paradigm based on targeted\, hypothe sis-driven tests that better illuminate what models have learned\; using t his paradigm\, I will show that even state-of-the-art models sometimes fai l to recognize the hierarchical structure of language (e.g.\, to conclude that “The book on the table is blue” implies “The table is blue.”) Second\ , I will show how these behavioral failings can be explained through analy sis of models’ inductive biases and internal representations\, focusing on the puzzle of how neural networks represent discrete symbolic structure i n continuous vector space. I will close by showing how insights from these analyses can be used to make models more robust through approaches based on meta-learning\, structured architectures\, and data augmentation.
\nBiography
\nTom McCoy is a PhD candidate in the Department of Cognitive Science at Johns Hopkins University. As an undergr aduate\, he studied computational linguistics at Yale. His research combin es natural language processing\, cognitive science\, and machine learning to study how we can achieve robust generalization in models of language\, as this remains one of the main areas where current AI systems fall short. In particular\, he focuses on inductive biases and representations of lin guistic structure\, since these are two of the major components that deter mine how learners generalize to novel types of input.
DTSTART;TZID=America/New_York:20220131T120000 DTEND;TZID=America/New_York:20220131T131500 LOCATION:Ames Hall 234 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Tom McCoy (Johns Hopkins University) “Opening the Black Box of Deep Learning: Representations\, Inductive Biases\, and Robustness” URL:https://www.clsp.jhu.edu/events/tom-mccoy-johns-hopkins-university-open ing-the-black-box-of-deep-learning-representations-inductive-biases-and-ro bustness/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,January\,McCoy END:VEVENT BEGIN:VEVENT UID:ai1ec-21267@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nIn this talk\, I present a multipronged strategy for zero-shot cross-lingual Information Extraction\ , that is the construction of an IE model for some target language\, given existing annotations exclusively in some other language. This work is par t of the JHU team’s effort under the IARPA BETTER program. I explore data augmentation techniques including data projection and self-training\, and how different pretrained encoders impact them. We find through extensive e xperiments and extension of techniques that a combination of approaches\, both new and old\, leads to better performance than any one cross-lingual strategy in particular.
\nBiography
\nAbstract
\nSocial media allows resear chers to track societal and cultural changes over time based on language a nalysis tools. Many of these tools rely on statistical algorithms which ne ed to be tuned to specific types of language. Recent studies have question ed the robustness of longitudinal analyses based on statistical methods du e to issues of temporal bias and semantic shift. To what extent are change s in semantics over time affecting the reliability of longitudinal analyse s? We examine this question through a case study: understanding shifts in mental health during the course of the COVID-19 pandemic. We demonstrate t hat a recently-introduced method for measuring semantic shift may be used to proactively identify failure points of language-based models and improv e predictive generalization over time. Ultimately\, we find that these ana lyses are critical to producing accurate longitudinal studies of social me dia.
DTSTART;TZID=America/New_York:20220207T120000 DTEND;TZID=America/New_York:20220207T131500 LOCATION:In Person or Virtual Option @ https://wse.zoom.us/j/96735183473 @ 234 Ames Hall\, 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Student Seminar – Keith Harrigian “The Problem of Semantic Shift in Longitudinal Monitoring of Social Media: A Case Study on Mental Health d uring the COVID-19 Pandemic” URL:https://www.clsp.jhu.edu/events/student-seminar-keith-harrigian-the-pro blem-of-semantic-shift-in-longitudinal-monitoring-of-social-media-a-case-s tudy-on-mental-health-during-the-covid-19-pandemic/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,February\,Harrigian END:VEVENT BEGIN:VEVENT UID:ai1ec-21275@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Student Seminars CONTACT: DESCRIPTION:Abstract
\n\n\n\n\nAutomatic discovery of phon e or word-like units is one of the core objectives in zero-resource speech processing. Recent attempts employ contrastive predictive coding (CPC)\, where the model learns representations by predicting the next frame given past context. However\, CPC only looks at the audio signal’s structure at the frame level. The speech structure exists beyond frame-level\, i.e.\, a t phone level or even higher. We propose a segmental contrastive predictiv e coding (SCPC) framework to learn from the signal structure at both the f rame and phone levels.\n\n\nSCPC is a hierarchical model with three stages trained in an end-to-end m anner. In the first stage\, the model predicts future feature frames and e xtracts frame-level representation from the raw waveform. In the second st age\, a differentiable boundary detector finds variable-length segments. I n the last stage\, the model predicts future segments to learn segment rep resentations. Experiments show that our model outperforms existing phone a nd word segmentation methods on TIMIT and Buckeye datasets.
Abstract
\nAs humans\, our understand
ing of language is grounded in a rich mental model about “how the world wo
rks” – that we learn through perception and interaction. We use this under
standing to reason beyond what we literally observe or read\, imagining ho
w situations might unfold in the world. Machines today struggle at this ki
nd of reasoning\, which limits how they can communicate with humans.
In my talk\, I will discuss th
ree lines of work to bridge this gap between machines and humans. I will f
irst discuss how we might measure grounded understanding. I will introduce
a suite of approaches for constructing benchmarks\, using machines in the
loop to filter out spurious biases. Next\, I will introduce PIGLeT: a mod
el that learns physical commonsense understanding by interacting with the
world through simulation\, using this knowledge to ground language. From a
n English-language description of an event\, PIGLeT can anticipate how the
world state might change – outperforming text-only models that are orders
of magnitude larger. Finally\, I will introduce MERLOT\, which learns abo
ut situations in the world by watching millions of YouTube videos with tra
nscribed speech. Through training objectives inspired by the developmental
psychology idea of multimodal reentry\, MERLOT learns to fuse language\,
vision\, and sound together into powerful representations. Together\, these directions suggest a pa
th forward for building machines that learn language rooted in the world.<
/p>\n
Biography
\nRowan Zellers is a final year P hD candidate at the University of Washington in Computer Science & Enginee ring\, advised by Yejin Choi and Ali Farhadi. His research focuses on enab ling machines to understand language\, vision\, sound\, and the world beyo nd these modalities. He has been recognized through an NSF Graduate Fellow ship and a NeurIPS 2021 outstanding paper award. His work has appeared in several media outlets\, including Wired\, the Washington Post\, and the Ne w York Times. In the past\, he graduated from Harvey Mudd College with a B .S. in Computer Science & Mathematics\, and has interned at the Allen Inst itute for AI.
DTSTART;TZID=America/New_York:20220214T120000 DTEND;TZID=America/New_York:20220214T131500 LOCATION:Ames Hall 234 - Presented Virtually Via Zoom https://wse.zoom.us/j /96735183473 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Rowan Zellers (University of Washington) ” Grounding Language by Se eing\, Hearing\, and Interacting” URL:https://www.clsp.jhu.edu/events/rowan-zellers-university-of-washington- grounding-language-by-seeing-hearing-and-interacting/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,February\,Zellers END:VEVENT BEGIN:VEVENT UID:ai1ec-21280@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nAs AI-driven lan guage interfaces (such as chat-bots) become more integrated into our lives \, they need to become more versatile and reliable in their communication with human users. How can we make progress toward building more “general” models that are capable of understanding a broader spectrum of language co mmands\, given practical constraints such as the limited availability of l abeled data?
\nIn this talk\, I will describe my research toward addressing this question along two dimensions of generality. First I will discuss progress in “breadth” — models that address a wider variety of tasks and abilities\, drawing inspiration from existing statistical le arning techniques such as multi-task learning. In particular\, I will show case a system that works well on several QA benchmarks\, resulting in stat e-of-the-art results on 10 benchmarks. Furthermore\, I will show its exten sion to tasks beyond QA (such as text generation or classification) that c an be “defined” via natural language. In the second part\, I will focus o n progress in “depth” — models that can handle complex inputs such as comp ositional questions. I will introduce Text Modular Networks\, a general fr amework that casts problem-solving as natural language communication among simpler “modules.” Applying this framework to compositional questions by leveraging discrete optimization and existing non-compositional closed-box QA models results in a model with strong empirical performance on multipl e complex QA benchmarks while providing human-readable reasoning.
\nI will conclude with future research directions toward broader N LP systems by addressing the limitations of the presented ideas and other missing elements needed to move toward more general-purpose interactive la nguage understanding systems.
\nBiography
\nDaniel Khashabi is a postdoctoral researcher at the Al len Institute for Artificial Intelligence (AI2)\, Seattle. Previously\, he completed his Ph.D. in Computer and Information Sciences at the Universit y of Pennsylvania in 2019. His interests lie at the intersection of artifi cial intelligence and natural language processing\, with a vision toward m ore general systems through unified algorithms and theories.
DTSTART;TZID=America/New_York:20220218T120000 DTEND;TZID=America/New_York:20220218T131500 LOCATION:Ames Hall 234 - Presented Virtually Via Zoom https://wse.zoom.us/j /96735183473 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Daniel Khashabi (Allen Institute for Artificial Intelligence) “The Quest Toward Generality in Natural Language Understanding” URL:https://www.clsp.jhu.edu/events/daniel-khashabi-allen-institute-for-art ificial-intelligence/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,February\,Khashabi END:VEVENT BEGIN:VEVENT UID:ai1ec-21487@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nAbstract
\nSince it is increasingly h arder to opt out from interacting with AI technology\, people demand that AI is capable of maintaining contracts such that it supports agency and ov ersight of people who are required to use it or who are affected by it. To help those people create a mental model about how to interact with AI sys tems\, I extend the underlying models to self-explain—predict the label/an swer and explain this prediction. In this talk\, I will present how to gen erate (1) free-text explanations given in plain English that immediately t ell users the gist of the reasoning\, and (2) contrastive explanations tha t help users understand how they could change the text to get another labe l.
\nBiography
\nAna Marasović is a postdocto ral researcher at the Allen Institute for AI (AI2) and the Paul G. Allen S chool of Computer Science & Engineering at University of Washington. Her r esearch interests broadly lie in the fields of natural language processing \, explainable AI\, and vision-and-language learning. Her projects are mot ivated by a unified goal: improve interaction and control of the NLP syste ms to help people make these systems do what they want with the confidence that they’re getting exactly what they need. Prior to joining AI2\, Ana o btained her PhD from Heidelberg University.
\nHow to pronounce my name: the first name is Ana like in Spanish\, i.e.\, with a long “a” like in “water”\; regarding the last name: “mara” as in actress mara wilso n + “so” + “veetch”.
DTSTART;TZID=America/New_York:20220228T120000 DTEND;TZID=America/New_York:20220228T131500 LOCATION:Ames Hall 234 - Presented Virtually Via Zoom https://wse.zoom.us/j /96735183473 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Ana Marasović (Allen Institute for AI & University of Washington) “ Self-Explaining for Intuitive Interaction with AI” URL:https://www.clsp.jhu.edu/events/ana-marasovic-allen-institute-for-ai-un iversity-of-washington-self-explaining-for-intuitive-interaction-with-ai/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,February\,Marasovic END:VEVENT BEGIN:VEVENT UID:ai1ec-21494@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Student Seminars CONTACT: DESCRIPTION:Abstract
\nAdversarial atta cks deceive neural network systems by adding carefully crafted perturbatio ns to benign signals. Being almost imperceptible to humans\, these attacks pose a severe security threat to the state-of-the-art speech and speaker recognition systems\, making it vital to propose countermeasures against t hem. In this talk\, we focus on 1) classification of a given adversarial a ttack into attack algorithm type\, threat model type\, and signal-to-adver sarial-noise ratios\, 2) developing a novel speech denoising solution to f urther improve the classification performance.
\nO ur proposed approach uses an x-vector network as a signature extractor to get embeddings\, which we call signatures. These signatures contain inform ation about the attack and can help classify different attack algorithms\, threat models\, and signal-to-adversarial-noise ratios. We demonstrate th e transferability of such signatures to other tasks. In particular\, a sig nature extractor trained to classify attacks against speaker identificatio n can also be used to classify attacks against speaker verification and sp eech recognition. We also show that signatures can be used to detect unkno wn attacks i.e. attacks not included during training. Lastly\, we propose to improve the signature extractor by making the job of the signature ext ractor easier by removing the clean signal from the adversarial example (w hich consists of clean signal+perturbation). We train our signature extrac tor using adversarial perturbation. At inference time\, we use a time-doma in denoiser to obtain adversarial perturbation from adversarial examples. Using our improved approach\, we show that common attacks in the literatur e (Fast Gradient Sign Method (FGSM)\, Projected Gradient Descent (PGD)\, C arlini-Wagner (CW) ) can be classified with accuracy as high as 96%. We al so detect unknown attacks with an equal error rate (EER) of about 9%\, whi ch is very promising.
DTSTART;TZID=America/New_York:20220304T120000 DTEND;TZID=America/New_York:20220304T131500 LOCATION:Ames Hall 234 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Student Seminar – Sonal Joshi “Classify and Detect Adversarial Atta cks Against Speaker and Speech Recognition Systems” URL:https://www.clsp.jhu.edu/events/student-seminar-sonal-joshi/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,Joshi\,March END:VEVENT BEGIN:VEVENT UID:ai1ec-21615@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Student Seminars CONTACT: DESCRIPTION:Abstract
\nDTSTART;TZID=America/New_York:20220311T120000 DTEND;TZID=America/New_York:20220311T131500 LOCATION:Virtual Seminar SEQUENCE:0 SUMMARY:Student Seminar – Anton Belyy “Systems for Human-AI Cooperation on Collecting Semantic Annotations” URL:https://www.clsp.jhu.edu/events/student-seminar-anton-belyy-systems-for -human-ai-cooperation-on-collecting-semantic-annotations/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,Belyy\,March END:VEVENT BEGIN:VEVENT UID:ai1ec-21621@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:
Abstract
\nSystems that support expre ssive\, situated natural language interactions are essential for expanding access to complex computing systems\, such as robots and databases\, to n on-experts. Reasoning and learning in such natural language interactions i s a challenging open problem. For example\, resolving sentence meaning req uires reasoning not only about word meaning\, but also about the interacti on context\, including the history of the interaction and the situated env ironment. In addition\, the sequential dynamics that arise between user an d system in and across interactions make learning from static data\, i.e.\ , supervised data\, both challenging and ineffective. However\, these same interaction dynamics result in ample opportunities for learning from impl icit and explicit feedback that arises naturally in the interaction. This lays the foundation for systems that continually learn\, improve\, and ada pt their language use through interaction\, without additional annotation effort. In this talk\, I will focus on these challenges and opportunities. First\, I will describe our work on modeling dependencies between languag e meaning and interaction context when mapping natural language in interac tion to executable code. In the second part of the talk\, I will describe our work on language understanding and generation in collaborative interac tions\, focusing on continual learning from explicit and implicit user fee dback.
\nBiography
\nAlane Suhr is a PhD Cand idate in the Department of Computer Science at Cornell University\, advis ed by Yoav Artzi. Her research spans natural language processing\, machine learning\, and computer vision\, with a focus on building systems that pa rticipate and continually learn in situated natural language interactions with human users. Alane’s work has been recognized by paper awards at ACL and NAACL\, and has been supported by fellowships and grants\, including a n NSF Graduate Research Fellowship\, a Facebook PhD Fellowship\, and resea rch awards from AI2\, ParlAI\, and AWS. Alane has also co-organized multip le workshops and tutorials appearing at NeurIPS\, EMNLP\, NAACL\, and ACL. Previously\, Alane received a BS in Computer Science and Engineering as a n Eminence Fellow at the Ohio State University.
DTSTART;TZID=America/New_York:20220314T120000 DTEND;TZID=America/New_York:20220314T131500 LOCATION:Virtual Seminar SEQUENCE:0 SUMMARY:Alane Suhr (Cornell University) “Reasoning and Learning in Interact ive Natural Language Systems” URL:https://www.clsp.jhu.edu/events/alane-suhr-cornell-university-reasoning -and-learning-in-interactive-natural-language-systems/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,March\,Suhr END:VEVENT BEGIN:VEVENT UID:ai1ec-21616@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Student Seminars CONTACT: DESCRIPTION:Abstract
\nSocial media allows resear chers to track societal and cultural changes over time based on language a nalysis tools. Many of these tools rely on statistical algorithms which ne ed to be tuned to specific types of language. Recent studies have shown th e absence of appropriate tuning\, specifically in the presence of semantic shift\, can hinder robustness of the underlying methods. However\, little is known about the practical effect this sensitivity may have on downstre am longitudinal analyses. We explore this gap in the literature through a timely case study: understanding shifts in depression during the course of the COVID-19 pandemic. We find that inclusion of only a small number of s emantically-unstable features can promote significant changes in longitudi nal estimates of our target outcome. At the same time\, we demonstrate tha t a recently-introduced method for measuring semantic shift may be used to proactively identify failure points of language-based models and\, in tur n\, improve predictive generalization.
DTSTART;TZID=America/New_York:20220318T120000 DTEND;TZID=America/New_York:20220318T131500 LOCATION:Ames Hall 234 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Student Seminar – Keith Harrigian “The Problem of Semantic Shift in Longitudinal Monitoring of Social Media” URL:https://www.clsp.jhu.edu/events/student-seminar-keith-harrigian-the-pro blem-of-semantic-shift-in-longitudinal-monitoring-of-social-media/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,Harrigian\,March END:VEVENT BEGIN:VEVENT UID:ai1ec-21497@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nWhile the “deep learning t sunami” continues to define the state of the art in speech and language pr ocessing\, finite-state transducer grammars developed by linguists and eng ineers are still widely used in industrial\, highly-multilingual settings\ , particularly for symbolic\, “front-end” speech applications. In this tal k\, I will first briefly review the current state of the OpenFst and OpenG rm finite-state transducer libraries. I then review two “late-breaking” al gorithms found in these libraries. The first is a heuristic but highly-eff ective general-purpose optimization routine for weighted transducers. The second is an algorithm for computing the single shortest string of non-det erministic weighted acceptors which lack certain properties required by cl assic shortest-path algorithms. I will then illustrate how the OpenGrm too ls can be used to induce a finite-state string-to-string transduction mode l known as a pair n-gram model. This model has been applied to grapheme-to -phoneme conversion\, loanword detection\, abbreviation expansion\, and ba ck-transliteration\, among other tasks.
\nBiography
\nKyle Gorman is an assistant professor of linguistics at the Gradu ate Center\, City University of New York\, and director of the master’s pr ogram in computational linguistics\; he is also a software engineer in the speech and language algorithms group at Google. With Richard Sproat\, he is the coauthor of Finite-State Text Processing (Morgan & Claypool\ , 2021) and the creator of Pynini\, a finite-state text processing library for Python. He has also published on statistical methods for comparing co mputational models\, text normalization\, grapheme-to-phoneme conversion\, and morphological analysis\, as well as many topics in linguistic theory.
DTSTART;TZID=America/New_York:20220401T120000 DTEND;TZID=America/New_York:20220401T131500 LOCATION:Ames Hall 234 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Kyle Gorman (City University of New York) ” Weighted Finite-State T ransducers: The Later Years” URL:https://www.clsp.jhu.edu/events/kyle-gorman-city-university-of-new-york -weighted-finite-state-transducers-the-later-years/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,Gorman\,March END:VEVENT BEGIN:VEVENT UID:ai1ec-22374@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nIn recent years\, the fiel d of Natural Language Processing has seen a profusion of tasks\, datasets\ , and systems that facilitate reasoning about real-world situations throug h language (e.g.\, RTE\, MNLI\, COMET). Such systems might\, for example\, be trained to consider a situation where “somebody dropped a glass on the floor\,” and conclude it is likely that “the glass shattered” as a result . In this talk\, I will discuss three pieces of work that revisit assumpti ons made by or about these systems. In the first work\, I develop a Defeas ible Inference task\, which enables a system to recognize when a prior ass umption it has made may no longer be true in light of new evidence it rece ives. The second work I will discuss revisits partial-input baselines\, wh ich have highlighted issues of spurious correlations in natural language r easoning datasets and led to unfavorable assumptions about models’ reasoni ng abilities. In particular\, I will discuss experiments that show models may still learn to reason in the presence of spurious dataset artifacts. F inally\, I will touch on work analyzing harmful assumptions made by reason ing models in the form of social stereotypes\, particularly in the case of free-form generative reasoning models.
\nBiography
\nRachel Rudinger is an Assistant Professor in the Department of Co mputer Science at the University of Maryland\, College Park. She holds joi nt appointments in the Department of Linguistics and the Institute for Adv anced Computer Studies (UMIACS). In 2019\, Rachel completed her Ph.D. in C omputer Science at Johns Hopkins University in the Center for Language and Speech Processing. From 2019-2020\, she was a Young Investigator at the A llen Institute for AI in Seattle\, and a visiting researcher at the Univer sity of Washington. Her research interests include computational semantics \, common-sense reasoning\, and issues of social bias and fairness in NLP.
DTSTART;TZID=America/New_York:20220916T120000 DTEND;TZID=America/New_York:20220916T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Rachel Rudinger (University of Maryland\, College Park) “Not So Fas t!: Revisiting Assumptions in (and about) Natural Language Reasoning” URL:https://www.clsp.jhu.edu/events/rachel-rudinger-university-of-maryland- college-park-not-so-fast-revisiting-assumptions-in-and-about-natural-langu age-reasoning/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,Rudinger\,September END:VEVENT BEGIN:VEVENT UID:ai1ec-22375@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nI will present our work on data augmentation using style transfer as a way to im prove domain adaptation in sequence labeling tasks. The target domain is s ocial media data\, and the task is named entity recognition (NER). The pre mise is that we can transform the labelled out of domain data into somethi ng that stylistically is more closely related to the target data. Then we can train a model on a combination of the generated data and the smaller a mount of in domain data to improve NER prediction performance. I will show recent empirical results on these efforts.
\nIf time allows\, I will also give an overview of other research projects I’m currently leading at RiTUAL (Research in Text Understanding and Analysis of Language) lab. The common thread among all these research problems is t he scarcity of labeled data.
\nBiography
\nThamar Solorio is a Professor of Com puter Science at the University of Houston (UH). She holds graduate degree s in Computer Science from the Instituto Nacional de Astrofísica\, Óptica y Electrónica\, in Puebla\, Mexico. Her research interests include informa tion extraction from social media data\, enabling technology for code-swit ched data\, stylistic modeling of text\, and more recently multimodal appr oaches for online content understanding. She is the director and founder o f the RiTUAL Lab at UH. She is the recipient of an NSF CAREER award for he r work on authorship attribution\, and recipient of the 2014 Emerging Lead er ABIE Award in Honor of Denice Denton. She is currently serving a second term as an elected board member of the North American Chapter of the Asso ciation of Computational Linguistics and was PC co-chair for NAACL 2019. S he recently joined the team of Editors in Chief for the ACL Rolling Review (ARR) system. Her research is currently funded by the NSF and by ADOBE. p> DTSTART;TZID=America/New_York:20220923T120000 DTEND;TZID=America/New_York:20220923T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Thamar Solorio (University of Houston) “Style Transfer for Data Aug mentation in Sequence Labeling Tasks” URL:https://www.clsp.jhu.edu/events/thamar-solorio-university-of-houston-st yle-transfer-for-data-augmentation-in-sequence-labeling-tasks/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,September\,Solorio END:VEVENT BEGIN:VEVENT UID:ai1ec-22380@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:
Abstract
\nThe availability of large multilingual pre-trained language models has opened up exciting pathways f or developing NLP technologies for languages with scarce resources. In thi s talk I will advocate for the need to go beyond the most common languages in multilingual evaluation\, and on the challenges of handling new\, unse en-during-training languages and varieties. I will also share some of my e xperiences with working with indigenous and other endangered language comm unities and activists.
\nBiography
\nAntonios Anastasopoulos is an As sistant Professor in Computer Science at George Mason University. In 2019\ , Antonis received his PhD in Computer Science from the University of Notr e Dame and then worked as a postdoctoral researcher at the Language Techno logies Institute at Carnegie Mellon University. His research interests rev olve around computational linguistics and natural language processing with a focus on low-resource settings\, endangered languages\, and cross-lingu al learning.
\nDTSTART;TZID=America/New_York:20220930T120000 DTEND;TZID=America/New_York:20220930T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Antonios Anastasopoulos (George Mason University) “NLP Beyond the T op-100 Languages” URL:https://www.clsp.jhu.edu/events/antonis-anastasopoulos-george-mason-uni versity/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,Anastasopoulos\,September END:VEVENT BEGIN:VEVENT UID:ai1ec-22423@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION: DTSTART;TZID=America/New_York:20221007T120000 DTEND;TZID=America/New_York:20221007T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Ariya Rastrow (Amazon) URL:https://www.clsp.jhu.edu/events/ariya-rastrow-amazon-2/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,October\,Rastrow END:VEVENT BEGIN:VEVENT UID:ai1ec-22394@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:
Abstract
\nModel robustness and spurious correlations have received increasing atten tion in the NLP community\, both in methods and evaluation. The term “spur ious correlation” is overloaded though and can refer to any undesirable sh ortcuts learned by the model\, as judged by domain experts.
\nWhen designing mitigation algorithms\, we oft en (implicitly) assume that a spurious feature is irrelevant for predictio n. However\, many features in NLP (e.g. word overlap and negation) are not spurious in the sense that the background is spurious for classifying obj ects in an image. In contrast\, they carry important information that’s ne eded to make predictions by humans. In this talk\, we argue that it is mor e productive to characterize features in terms of their necessity and suff iciency for prediction. We then discuss the implications of this categoriz ation in representation\, learning\, and evaluation.
\nBiogr aphy
\nHe He is an Assistant Professor in the Department of Computer Science and the Center for Data Science at New York University. She obtained her PhD in Computer Science at the University of Maryland\, C ollege Park. Before joining NYU\, she spent a year at AWS AI and was a pos t-doc at Stanford University before that. She is interested in building ro bust and trustworthy NLP systems in human-centered settings. Her recent re search focus includes robust language understanding\, collaborative text g eneration\, and understanding capabilities and issues of large language mo dels.
\n DTSTART;TZID=America/New_York:20221014T120000 DTEND;TZID=America/New_York:20221014T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:He He (New York University) “What We Talk about When We Talk about Spurious Correlations in NLP” URL:https://www.clsp.jhu.edu/events/he-he-new-york-university/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,He\,October END:VEVENT BEGIN:VEVENT UID:ai1ec-22395@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nAbstract
\nModern learning architectures for natural language processing have been very suc cessful in incorporating a huge amount of texts into their parameters. How ever\, by and large\, such models store and use knowledge in distributed a nd decentralized ways. This proves unreliable and makes the models ill-sui ted for knowledge-intensive tasks that require reasoning over factual info rmation in linguistic expressions. In this talk\, I will give a few examp les of exploring alternative architectures to tackle those challenges. In particular\, we can improve the performance of such (language) models by r epresenting\, storing and accessing knowledge in a dedicated memory compon ent.
\nThis talk is based on several joint works with Yury Zemlyanskiy (Google Research)\, Michiel de Jong (USC and Google Research)\, William Cohen (Google Research and CMU) and our other collabo rators in Google Research.
\nBiography
\nFei is a research scientist at Google Research. Before that\, he was a Profess or of Computer Science at University of Southern California. His primary r esearch interests are machine learning and its application to various AI p roblems: speech and language processing\, computer vision\, robotics and r ecently weather forecast and climate modeling. He has a PhD (2007) from Computer and Information Science from U. of Pennsylvania and B.Sc and M.Sc in Biomedical Engineering from Southeast University (Nanjing\, China).
DTSTART;TZID=America/New_York:20221024T120000 DTEND;TZID=America/New_York:20221024T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Fei Sha (University of Southern California) “Extracting Information from Text into Memory for Knowledge-Intensive Tasks” URL:https://www.clsp.jhu.edu/events/fei-sha-university-of-southern-californ ia/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,October\,Sha END:VEVENT BEGIN:VEVENT UID:ai1ec-22403@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nVoice conversion (VC) is a significant aspect of artificial intelligence. It is the study of how to convert one’s voice to sound like that of another without changing the lin guistic content. Voice conversion belongs to a general technical field of speech synthesis\, which converts text to speech or changes the properties of speech\, for example\, voice identity\, emotion\, and accents. Voice c onversion involves multiple speech processing techniques\, such as speech analysis\, spectral conversion\, prosody conversion\, speaker characteriza tion\, and vocoding. With the recent advances in theory and practice\, we are now able to produce human-like voice quality with high speaker similar ity. In this talk\, Dr. Sisman will present the recent advances in voice c onversion and discuss their promise and limitations. Dr. Sisman will also provide a summary of the available resources for expressive voice conversi on research.
\nBiography
\nDr. Berrak Sisman (Member\, IEEE) received the Ph.D. degree in electrical and computer engin eering from National University of Singapore in 2020\, fully funded by A*S TAR Graduate Academy under Singapore International Graduate Award (SINGA). She is currently working as a tenure-track Assistant Professor at the Eri k Jonsson School Department of Electrical and Computer Engineering at Univ ersity of Texas at Dallas\, United States. Prior to joining UT Dallas\, sh e was a faculty member at Singapore University of Technology and Design (2 020-2022). She was a Postdoctoral Research Fellow at the National Universi ty of Singapore (2019-2020). She was an exchange doctoral student at the U niversity of Edinburgh and a visiting scholar at The Centre for Speech Tec hnology Research (CSTR)\, University of Edinburgh (2019). She was a visiti ng researcher at RIKEN Advanced Intelligence Project in Japan (2018). Her research is focused on machine learning\, signal processing\, emotion\, sp eech synthesis and voice conversion.
\nDr. Sisman has served as the Area Chair at INTERSPEECH 2021\, INTERSPEECH 2022\, IEEE SLT 2022 and as t he Publication Chair at ICASSP 2022. She has been elected as a member of t he IEEE Speech and Language Processing Technical Committee (SLTC) in the a rea of Speech Synthesis for the term from January 2022 to December 2024. S he plays leadership roles in conference organizations and active in techni cal committees. She has served as the General Coordinator of the Student A dvisory Committee (SAC) of International Speech Communication Association (ISCA).
DTSTART;TZID=America/New_York:20221104T120000 DTEND;TZID=America/New_York:20221104T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Berrak Sisman (University of Texas at Dallas) “Speech Synthesis and Voice Conversion: Machine Learning can Mimic Anyone’s Voice” URL:https://www.clsp.jhu.edu/events/berrak-sisman-university-of-texas-at-da llas/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,November\,Sisman END:VEVENT BEGIN:VEVENT UID:ai1ec-22408@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nAbstract
\nDriven by the goal of erad icating language barriers on a global scale\, machine translation has soli dified itself as a key focus of artificial intelligence research today. Ho wever\, such efforts have coalesced around a small subset of languages\, l eaving behind the vast majority of mostly low-resource languages. What doe s it take to break the 200 language barrier while ensuring safe\, high-qua lity results\, all while keeping ethical considerations in mind? In this t alk\, I introduce No Language Left Behind\, an initiative to break languag e barriers for low-resource languages. In No Language Left Behind\, we too k on the low-resource language translation challenge by first contextualiz ing the need for translation support through exploratory interviews with n ative speakers. Then\, we created datasets and models aimed at narrowing t he performance gap between low and high-resource languages. We proposed mu ltiple architectural and training improvements to counteract overfitting w hile training on thousands of tasks. Critically\, we evaluated the perform ance of over 40\,000 different translation directions using a human-transl ated benchmark\, Flores-200\, and combined human evaluation with a novel t oxicity benchmark covering all languages in Flores-200 to assess translati on safety. Our model achieves an improvement of 44% BLEU relative to the p revious state-of-the-art\, laying important groundwork towards realizing a universal translation system in an open-source manner.
\nBi ography
\nAngela is a research scientis t at Meta AI Research in New York\, focusing on supporting efforts in spee ch and language research. Recent projects include No Language Left Behind (https://ai.facebook.com/r esearch/no-language-left-behind/) and Universal Speech Translation for Unwritten Languages (https://ai.faceb ook.com/blog/ai-translation-hokkien/). Before translation\, Angela pre viously focused on research in on-device models for NLP and computer visio n and text generation.
\nDTSTART;TZID=America/New_York:20221118T120000 DTEND;TZID=America/New_York:20221118T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Angela Fan (Meta AI Research) “No Language Left Behind: Scaling Hu man-Centered Machine Translation” URL:https://www.clsp.jhu.edu/events/angela-fan-facebook/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,Fan\,November END:VEVENT BEGIN:VEVENT UID:ai1ec-22417@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:
Abstract
\nOne of the keys to success in machine learning applications is to improve each user’s personal exper ience via personalized models. A personalized model can be a more resource -efficient solution than a general-purpose model\, too\, because it focuse s on a particular sub-problem\, for which a smaller model architecture can be good enough. However\, training a personalized model requires data fro m the particular test-time user\, which are not always available due to th eir private nature and technical challenges. Furthermore\, such data tend to be unlabeled as they can be collected only during the test time\, once after the system is deployed to user devices. One could rely on the genera lization power of a generic model\, but such a model can be too computatio nally/spatially complex for real-time processing in a resource-constrained device. In this talk\, I will present som e techniques to circumvent the lack of labeled personal data in the contex t of speech enhancement. Our machine learning models will require zero or few data samples from the test-time users\, while they can still achieve t he personalization goal. To this end\, we will investigate modularized spe ech enhancement models as well as the potential of self-supervised learnin g for personalized speech enhancement. Because our research achieves the p ersonalization goal in a data- and resource-efficient way\, it is a step t owards a more available and affordable AI for society.
\nBio graphy
\nMinje Kim is an associate professor in the Dept. of Intellig ent Systems Engineering at Indiana University\, where he leads his researc h group\, Signals and AI Group in Engineering (SAIGE). He is also an Amazo n Visiting Academic\, consulting for Amazon Lab126. At IU\, he is affiliat ed with various programs and labs such as Data Science\, Cognitive Science \, Dept. of Statistics\, and Center for Machine Learning. He earned his Ph .D. in the Dept. of Computer Science at the University of Illinois at Urba na-Champaign. Before joining UIUC\, He worked as a researcher at ETRI\, a national lab in Korea\, from 2006 to 2011. Before then\, he received his M aster’s and Bachelor’s degrees in the Dept. of Computer Science and Engine ering at POSTECH (Summa Cum Laude) and in the Division of Information and Computer Engineering at Ajou University (w ith honor) in 2006 and 2004\, respectively. He is a recipient of various a wards including NSF Career Award (2021)\, IU Trustees Teaching Award (2021 )\, IEEE SPS Best Paper Award (2020)\, and Google and Starkey’s grants for outstanding student papers in ICASSP 2013 and 2014\, respectively. He is an IEEE Senior Member and also a member of the IEEE Audio and Acoustic Sig nal Processing Technical Committee (2018-2023). He is serving as an Associ ate Editor for EURASIP Journal of Audio\, Speech\, and Music Processing\, and as a Consulting Associate Editor for IEEE Open Journal of Signal Proce ssing. He is also a reviewer\, program committee member\, or area chair fo r the major machine learning and signal processing. He filed more than 50 patent applications as an inventor.
DTSTART;TZID=America/New_York:20221202T120000 DTEND;TZID=America/New_York:20221202T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Minje Kim (Indiana University) “Personalized Speech Enhancement: Da ta- and Resource-Efficient Machine Learning” URL:https://www.clsp.jhu.edu/events/minje-kim-indiana-university/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,December\,Kim END:VEVENT BEGIN:VEVENT UID:ai1ec-22422@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nZipf’s law is commonly glo ssed by the aphorism “infrequent words are frequent\,” but in practice\, i t has often meant that there are three types of words: frequent\, infreque nt\, and out-of-vocabulary (OOV). Speech recognition solved the problem of frequent words in 1970 (with dynamic time warping). Hidden Markov models worked well for moderately infrequent words\, but the problem of OOV word s was not solved until sequence-to-sequence neural nets de-reified the con cept of a word. Many other social phenomena follow power-law distribution s. The number of native speakers of the N’th most spoken language\, for e xample\, is 1.44 billion over N to the 1.09. In languages with sufficient data\, we have shown that monolingual pre-training outperforms multilingu al pre-training. In less-frequent languages\, multilingual knowledge tran sfer can significantly reduce phone error rates. In languages with no tra ining data\, unsupervised ASR methods can be proven to converge\, as long as the eigenvalues of the language model are sufficiently well separated t o be measurable. Other systems of social categorization may follow similar power-law distributions. Disability\, for example\, can cause speech pat terns that were never seen in the training database\, but not all disabili ties need do so. The inability of speech technology to work for people wi th even common disabilities is probably caused by a lack of data\, and can probably be solved by finding better modes of interaction between technol ogy researchers and the communities served by technology.
\nBiography
\nMark Hasegawa-Johnson is a William L. Everitt F aculty Fellow of Electrical and Computer Engineering at the University of Illinois in Urbana-Champaign. He has published research in speech product ion and perception\, source separation\, voice conversion\, and low-resour ce automatic speech recognition.
DTSTART;TZID=America/New_York:20221209T120000 DTEND;TZID=America/New_York:20221209T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Mark Hasegawa-Johnson (University of Illinois Urbana-Champaign) “Zi pf’s Law Suggests a Three-Pronged Approach to Inclusive Speech Recognition ” URL:https://www.clsp.jhu.edu/events/mark-hasegawa-johnson-university-of-ill inois-urbana-champaign/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,December\,Hasegawa-Johnson END:VEVENT BEGIN:VEVENT UID:ai1ec-23900@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Student Seminars CONTACT: DESCRIPTION: DTSTART;TZID=America/New_York:20231002T120000 DTEND;TZID=America/New_York:20231002T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:CLSP Student Seminar – Anna Favaro URL:https://www.clsp.jhu.edu/events/clsp-student-seminar-anna-favaro/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2023\,Favaro\,October END:VEVENT BEGIN:VEVENT UID:ai1ec-24115@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Student Seminars CONTACT: DESCRIPTION:Abstract
\nOur goal is to use AI to a utomatically find tax minimization strategies\, an approach which we call “Shelter Check.” It would come in two variants. Existing-Authority Shelter Check would aim to find whether existing tax law authorities can be combi ned to create tax minimization strategies\, so the IRS or Congress can shu t them down. New-Authority Shelter Check would automate checking whether a new tax law authority – like proposed legislation or a draft court decisi on – would combine with existing authorities to create a new tax minimizat ion strategy. We had initially had high hopes for GPT-* large language mod els for implementing Shelter Check\, but our tests have showed that they d o very poorly at basic legal reasoning and handling legal text. So we are now creating a benchmark and training data for LLM’s handling legal text\, hoping to spur improvements.
DTSTART;TZID=America/New_York:20231006T120000 DTEND;TZID=America/New_York:20231006T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:CLSP Student Seminar – Andrew Blair-Stanek “Shelter Check and GPT-4 ’s Bad Legal Performance” URL:https://www.clsp.jhu.edu/events/clsp-student-seminar-andrew-blair-stane k-shelter-check-and-gpt-4s-bad-legal-performance/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2023\,Blair-Stanek\,October END:VEVENT BEGIN:VEVENT UID:ai1ec-24005@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nLarge-scale generative models such as GPT and DALL-E have r evolutionized natural language processing and computer vision research. Th ese models not only generate high fidelity text or image outputs\, but als o demonstrate impressive domain and task generalization capabilities. In c ontrast\, audio generative models are relatively primitive in scale and ge neralization.
\nIn this talk\, I will start with a brief introduction on conventional n eural speech generative models and discuss why they are unfit for scaling to Internet-scale data. Next\, by reviewing the latest large-scale generat ive models for text and image\, I will outline a few lines of promising ap proaches to build scalable speech models. Last\, I will present Voicebox\, our latest work to advance this area. Voicebox is the most versatile gene rative model for speech. It is trained with a simple task — text condition ed speech infilling — on over 50K hours of multilingual speech with a powe rful flow-matching objective. Through in-context learning\, Voicebox can p erform monolingual/cross-lingual zero-shot TTS\, holistic style conversion \, transient noise removal\, content editing\, and diverse sample generati on. Moreover\, Voicebox achieves state-of-the-art performance and excellen t run-time efficiency.
\nBiography
\nWei-Ning Hsu is a resear ch scientist at Meta Foundational AI Research (FAIR) and currently the lea d of the audio generation team. His research focuses on self-supervised le arning and generative models for speech and audio. His pioneering work inc ludes HuBERT\, AV-HuBERT\, TextlessNLP\, data2vec\, wav2vec-U\, textless s peech translation\, and Voicebox.
\nPrior to joining Meta\, Wei-Ning worked at MERL an d Google Brain as a research intern. He received his Ph.D. and S.M. degree s in Electrical Engineering and Computer Science from Massachusetts Instit ute of Technology in 2020 and 2018\, under the supervision of Dr. James Gl ass. He received his B.S. degree in Electrical Engineering from National T aiwan University in 2014\, under the supervision of Prof. Lin-shan Lee and Prof. Hsuan-Tien Lin.
DTSTART;TZID=America/New_York:20231009T120000 DTEND;TZID=America/New_York:20231009T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Wei-Ning Hsu (Meta Foundational AI Research) “Large Scale Universal Speech Generative Models” URL:https://www.clsp.jhu.edu/events/wei-ning-hsu-meta-foundational-ai-resea rch-large-scale-universal-speech-generative-models/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2023\,Hsu\,October END:VEVENT BEGIN:VEVENT UID:ai1ec-23902@www.clsp.jhu.edu DTSTAMP:20240328T100844Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nAbstract
\nRecent advances in speech technology make heavy use of pre-trained models that learn from large quan tities of raw (untranscribed) speech\, using “self-supervised” (ie unsuper vised) learning. These models learn to transform the acoustic input into a different representational format that makes supervised learning (for tas ks such as transcription or even translation) much easier. However\, *what * and *how* speech-relevant information is encoded in these representation s is not well understood. I will talk about some work at various stages of completion in which my group is analyzing the structure of these represen tations\, to gain a more systematic understanding of how word-level\, phon etic\, and speaker information is encoded.
\nBiography
\nSharon Goldwater is a Professor in the Institute for Language\ , Cognition and Computation at the University of Edinburgh’s School of Inf ormatics. She received her PhD in 2007 from Brown University and spent two years as a postdoctoral researcher at Stanford University before moving t o Edinburgh. Her research interests include unsupervised and minimally-sup ervised learning for speech and language processing\, computer modelling o f language acquisition in children\, and computational studies of language use. Her main focus within linguistics has been on the lower levels of s tructure including phonetics\, phonology\, and morphology.
DTSTART;TZID=America/New_York:20231027T120000 DTEND;TZID=America/New_York:20231027T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Sharon Goldwater (University of Edinburgh) “Analyzing Representatio ns of Self-Supervised Speech Models” URL:https://www.clsp.jhu.edu/events/sharon-goldwater-university-of-edinburg h/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2023\,Goldwater\,October END:VEVENT END:VCALENDAR Prof. Goldwater has received awards incl uding the 2016 Roger Needham Award from the British Computer Society for “ distinguished research contribution in computer science by a UK-based rese archer who has completed up to 10 years of post-doctoral research.” She ha s served on the editorial boards of several journals\, including Computati onal Linguistics\, Transactions of the Association for Computational Lingu istics\, and the inaugural board of OPEN MIND: Advances in Cognitive Scien ce. She was a program chair for the EACL 2014 Conference and chaired the E ACL governing board from 2019-2020.