Abstract
While GPT models have shown impressive performance on summarization and open-ended text generation, it’s important to assess their abilities on more constrained text generation tasks that require significant and diverse rewritings. In this talk, I will discuss the challenges of evaluating systems that are highly competitive and perform close to humans on two such tasks: (i) paraphrase generation and (ii) text simplification. To address these challenges, we introduce an interactive Rank-and-Rate evaluation framework. Our results show that GPT-3.5 has made a major step up from fine-tuned T5 in paraphrase generation, but still lacks the diversity and creativity of humans who spontaneously produce large quantities of paraphrases.
Additionally, we demonstrate that GPT-3.5 performs similarly to a single human in text simplification, which makes it difficult for existing automatic evaluation metrics to distinguish between the two. To overcome this shortcoming, we propose LENS, a learnable evaluation metric that outperforms SARI, BERTScore, and other existing methods in both automatic evaluation and minimum risk decoding for text generation.
Biography
Wei Xu is an assistant professor in the School of Interactive Computing at the Georgia Institute of Technology, where she is also affiliated with the new NSF AI CARING Institute and the Machine Learning Center. She received her Ph.D. in Computer Science from New York University and her B.S. and M.S. from Tsinghua University. Xu’s research interests are in natural language processing, machine learning, and social media, with a focus on text generation, stylistics, robustness and controllability of machine learning models, and reading and writing assistive technology. She is a recipient of the NSF CAREER Award, the CrowdFlower AI for Everyone Award, the Criteo Faculty Research Award, and the Best Paper Award at COLING’18. She has also received funding from DARPA and IARPA. She is an elected member of the NAACL executive board and regularly serves as a senior area chair for AI/NLP conferences.
Abstract
Our goal is to use AI to automatically find tax minimization strategies, an approach we call “Shelter Check.” It would come in two variants. Existing-Authority Shelter Check would aim to find whether existing tax law authorities can be combined to create tax minimization strategies, so that the IRS or Congress can shut them down. New-Authority Shelter Check would automate checking whether a new tax law authority, such as proposed legislation or a draft court decision, would combine with existing authorities to create a new tax minimization strategy. We initially had high hopes for GPT-* large language models for implementing Shelter Check, but our tests have shown that they perform very poorly at basic legal reasoning and at handling legal text. So we are now creating a benchmark and training data for LLMs handling legal text, in the hope of spurring improvements.
Abstract
Large-scale generative models such as GPT and DALL-E have revolutionized natural language processing and computer vision research. These models not only generate high-fidelity text or image outputs, but also demonstrate impressive domain and task generalization capabilities. In contrast, audio generative models are relatively primitive in scale and generalization.
In this talk, I will start with a brief introduction to conventional neural speech generative models and discuss why they are unfit for scaling to Internet-scale data. Next, by reviewing the latest large-scale generative models for text and images, I will outline a few promising approaches to building scalable speech models. Finally, I will present Voicebox, our latest work to advance this area. Voicebox is the most versatile generative model for speech. It is trained on a simple task, text-conditioned speech infilling, using over 50K hours of multilingual speech and a powerful flow-matching objective. Through in-context learning, Voicebox can perform monolingual and cross-lingual zero-shot TTS, holistic style conversion, transient noise removal, content editing, and diverse sample generation. Moreover, Voicebox achieves state-of-the-art performance and excellent run-time efficiency.
Biography
Wei-Ning Hsu is a research scientist at Meta Fundamental AI Research (FAIR) and currently the lead of the audio generation team. His research focuses on self-supervised learning and generative models for speech and audio. His pioneering work includes HuBERT, AV-HuBERT, TextlessNLP, data2vec, wav2vec-U, textless speech translation, and Voicebox.
Prior to joining Meta, Wei-Ning worked at MERL and Google Brain as a research intern. He received his Ph.D. and S.M. degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2020 and 2018, respectively, under the supervision of Dr. James Glass. He received his B.S. degree in Electrical Engineering from National Taiwan University in 2014, under the supervision of Prof. Lin-shan Lee and Prof. Hsuan-Tien Lin.
Abstract
Recent advances in speech technology make heavy use of pre-trained models that learn from large quantities of raw (untranscribed) speech, using “self-supervised” (i.e., unsupervised) learning. These models learn to transform the acoustic input into a different representational format that makes supervised learning (for tasks such as transcription or even translation) much easier. However, *what* speech-relevant information is encoded in these representations, and *how*, is not well understood. I will talk about work at various stages of completion in which my group is analyzing the structure of these representations, to gain a more systematic understanding of how word-level, phonetic, and speaker information is encoded.
Biography
Sharon Goldwater is a Professor in the Institute for Language, Cognition and Computation at the University of Edinburgh’s School of Informatics. She received her PhD in 2007 from Brown University and spent two years as a postdoctoral researcher at Stanford University before moving to Edinburgh. Her research interests include unsupervised and minimally-supervised learning for speech and language processing, computer modelling of language acquisition in children, and computational studies of language use. Her main focus within linguistics has been on the lower levels of structure, including phonetics, phonology, and morphology.
Prof. Goldwater has received awards including the 2016 Roger Needham Award from the British Computer Society for “distinguished research contribution in computer science by a UK-based researcher who has completed up to 10 years of post-doctoral research.” She has served on the editorial boards of several journals, including Computational Linguistics, Transactions of the Association for Computational Linguistics, and the inaugural board of OPEN MIND: Advances in Cognitive Science. She was a program chair for the EACL 2014 Conference and chaired the EACL governing board from 2019 to 2020.