Seminars

Dec
9
Fri
Mark Hasegawa-Johnson (University of Illinois Urbana-Champaign) “Zipf’s Law Suggests a Three-Pronged Approach to Inclusive Speech Recognition” @ Hackerman Hall B17
Dec 9 @ 12:00 pm – 1:15 pm

Abstract

Zipf’s law is commonly glossed by the aphorism “infrequent words are frequent,” but in practice, it has often meant that there are three types of words: frequent, infrequent, and out-of-vocabulary (OOV). Speech recognition solved the problem of frequent words in 1970 (with dynamic time warping).  Hidden Markov models worked well for moderately infrequent words, but the problem of OOV words was not solved until sequence-to-sequence neural nets de-reified the concept of a word.  Many other social phenomena follow power-law distributions.  The number of native speakers of the N’th most spoken language, for example, is 1.44 billion over N to the 1.09.  In languages with sufficient data, we have shown that monolingual pre-training outperforms multilingual pre-training.  In less-frequent languages, multilingual knowledge transfer can significantly reduce phone error rates.  In languages with no training data, unsupervised ASR methods can be proven to converge, as long as the eigenvalues of the language model are sufficiently well separated to be measurable. Other systems of social categorization may follow similar power-law distributions.  Disability, for example, can cause speech patterns that were never seen in the training database, but not all disabilities need do so.  The inability of speech technology to work for people with even common disabilities is probably caused by a lack of data, and can probably be solved by finding better modes of interaction between technology researchers and the communities served by technology.

Biography

Mark Hasegawa-Johnson is a William L. Everitt Faculty Fellow of Electrical and Computer Engineering at the University of Illinois in Urbana-Champaign.  He has published research in speech production and perception, source separation, voice conversion, and low-resource automatic speech recognition.

Sep
1
Fri
Lei Li (Carnegie Mellon University) “Empowering Responsible Use of Large Language Models” @ Hackerman Hall B17
Sep 1 @ 12:00 pm – 1:15 pm

Abstract

Large language models (LLMs) have demonstrated incredible power, but they also possess vulnerabilities that can lead to misuse and potential attacks. In this presentation, we will address two fundamental questions regarding the responsible utilization of LLMs: (1) How can we accurately identify AI-generated text? (2) What measures can safeguard the intellectual property of LLMs? We will introduce two recent watermarking techniques designed for text and models, respectively. Our discussion will encompass the theoretical underpinnings that ensure the correctness of watermark detection, along with robustness against evasion attacks. Furthermore, we will showcase empirical evidence validating their effectiveness. These findings establish a solid technical groundwork for policymakers, legal professionals, and generative AI practitioners alike.

Biography

Lei Li is an Assistant Professor in Language Technology Institute at Carnegie Mellon University. He received Ph.D. from Carnegie Mellon University School of Computer Science. He is a recipient of ACL 2021 Best Paper Award, CCF Young Elite Award in 2019, CCF distinguished speaker in 2017, Wu Wen-tsün AI prize in 2017, and 2012 ACM SIGKDD dissertation award (runner-up), and is recognized as Notable Area Chair of ICLR 2023. Previously, he was a faculty member at UC Santa Barbara. Prior to that,  he founded ByteDance AI Lab in 2016 and led its research in NLP, ML, Robotics, and Drug Discovery. He launched ByteDance’s machine translation system VolcTrans and AI writing system Xiaomingbot, serving one billion users.

Center for Language and Speech Processing