Seminars

John Hansen (University of Texas at Dallas) “Challenges and Advancements in Speaker Diarization & Recognition for Naturalistic Data Streams” @ Hackerman Hall B17
Fri, Mar 3 @ 12:00 pm – 1:15 pm

Abstract

Speech communication represents a core domain for education, team problem solving, social engagement, and business interactions. The ability of speech technology to extract layers of knowledge and assess engagement represents the next generation of advanced speech solutions. Today, the emergence of big data, machine learning, and voice-enabled speech systems has created the need for effective voice capture and automatic speech/speaker recognition. The ability to employ speech and language technology to assess human-to-human interactions opens new research paradigms with profound impact on how we understand human interaction. In this talk, we will focus on big-data naturalistic audio processing relating to (i) child learning spaces and (ii) the NASA APOLLO lunar missions. ML-based technology advancements include automatic audio diarization, speech recognition, and speaker recognition. Child-teacher assessment of conversational interactions is explored, including keyword and “WH-word” (e.g., who, what) analysis. Diarization solutions are applied both to classroom/learning-space child speech and to the massive APOLLO data. CRSS-UTDallas is expanding our original Apollo-11 corpus, a massive multi-track audio processing challenge that will make 150,000 hours of Apollo mission data available to science communities: (i) speech/language technology, (ii) STEM/science and team-based researchers, and (iii) education/historical/archiving specialists. Our goal is to provide resources that allow researchers to better understand how people work and learn collaboratively; for Apollo, that meant accomplishing one of mankind’s greatest scientific and technological challenges of the last century.
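
As a toy illustration of the kind of conversational-assessment statistic described above, the sketch below counts “WH-word” usage per speaker in a diarized, transcribed session. This is a hypothetical example, not CRSS’s actual pipeline; the segment format is an assumption.

```python
from collections import Counter

# WH-words of interest (e.g., who, what), per the abstract.
WH_WORDS = {"who", "what", "when", "where", "why", "which", "how"}

def wh_counts(segments):
    """segments: iterable of (speaker_label, transcript_text) pairs,
    e.g. the output of a diarization + ASR pipeline (format assumed)."""
    counts = {}
    for speaker, text in segments:
        words = (w.strip(".,?!") for w in text.lower().split())
        counts.setdefault(speaker, Counter()).update(
            w for w in words if w in WH_WORDS
        )
    return counts

# Hypothetical child-teacher exchange:
segments = [
    ("teacher", "What do you see in the picture?"),
    ("child", "A dog! Why is he running?"),
]
print(wh_counts(segments))
# {'teacher': Counter({'what': 1}), 'child': Counter({'why': 1})}
```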

Biography

John H.L. Hansen received his Ph.D. and M.S. degrees from the Georgia Institute of Technology and his B.S.E.E. from Rutgers University. He joined the University of Texas at Dallas (UTDallas) in 2005, where he currently serves as Associate Dean for Research, Professor of ECE, and Distinguished University Chair in Telecommunications Engineering, and directs the Center for Robust Speech Systems (CRSS). He is an ISCA Fellow and IEEE Fellow, and has served as Member and TC-Chair of the IEEE Signal Processing Society Speech & Language Processing Technical Committee (SLTC) and as Technical Advisor to the U.S. Delegate for NATO (IST/TG-01). He served as ISCA President (2017-21), continues to serve on the ISCA Board (2015-23) as Treasurer, has supervised 99 PhD/MS thesis candidates (EE, CE, BME, TE, CS, Ling., Cog. Sci., Spch. Sci., Hear. Sci.), and received the 2020 UT-Dallas Provost’s Award for Graduate PhD Research Mentoring. He is author/co-author of 865 journal/conference papers, including 14 textbooks, in the field of speech/language/hearing processing and technology, among them the coauthored textbook Discrete-Time Processing of Speech Signals (IEEE Press, 2000) and the lead-authored report “The Impact of Speech Under ‘Stress’ on Military Speech Technology” (NATO RTO-TR-10, 2000). He served as Organizer and Chair/Co-Chair/Technical Chair for ISCA INTERSPEECH-2022, IEEE ICASSP-2010, IEEE SLT-2014, and ISCA INTERSPEECH-2002, and as Technical Chair for IEEE ICASSP-2024. He received the 2022 IEEE Signal Processing Society Leo Beranek Meritorious Service Award.


Denise DiPersio (Linguistic Data Consortium, University of Pennsylvania) “Data and Ethics: Where Does the Twain Meet?” @ Hackerman Hall B17
Fri, Mar 10 @ 12:00 pm – 1:15 pm

Abstract

As data-based technologies proliferate, it is increasingly important for researchers to be aware of their work’s wider impact. Concerns like navigating the IRB and figuring out copyright and licensing issues are still key, but the current shift in focus toward matters like inclusivity, fairness, and transparency, and their impact on the research and development life cycle, has added complexity to the research task. In this talk, we will take a broad look at the various ways ethics intersects with natural language processing, machine learning, and artificial intelligence research, and discuss strategies and resources for managing these concerns within the broader research framework.

Biography

Denise is responsible for the overall operation of LDC’s External Relations group which includes intellectual property management, licensing, regulatory matters, publications, membership and communications. Before joining LDC, she practiced law for over 20 years in the areas of international trade, intellectual property and commercial litigation. She has an A.B. in Political Science from Bryn Mawr College and a Juris Doctor degree from the University of Miami School of Law.

Alessandra Cervone (Amazon) “Controllable Text Generation for Creative Applications” @ Hackerman Hall B17
Fri, Mar 17 @ 12:00 pm – 1:15 pm

Abstract

Recent advances in large pretrained language models have unlocked exciting new applications of natural language generation for creative tasks, such as lyrics or humour generation. In this talk we will present recent work by our team at Alexa AI and discuss current challenges: (1) Pun understanding and generation: we release new datasets for pun understanding and for the novel task of context-situated pun generation, and demonstrate the value of our annotations for pun classification and generation tasks. (2) Song lyric generation: we design a hierarchical lyric generation framework that enables us to generate pleasantly singable lyrics without training on melody-lyric aligned data, and show that our approach is competitive with strong baselines supervised on parallel data. (3) Create with Alexa: a multimodal story creation experience recently launched on Alexa devices, which leverages story text generation models in tandem with story visualization and background music generation models to produce multimodal stories for kids.

Biography

Alessandra Cervone is an Applied Scientist in the Natural Understanding team at Amazon Alexa AI. Alessandra holds an MSc in Speech and Language Processing from University of Edinburgh and a PhD in CS from University of Trento (Italy). During her PhD, Alessandra worked on computational models of coherence in open-domain dialogue advised by Giuseppe Riccardi. In the first year of the PhD, she was the team leader of one of the teams selected to compete in the first edition of the Alexa Prize. More recently, her research interests have been focused on natural language generation and its evaluation, in particular in the context of creative AI applications.

Student Seminar – Desh Raj @ Hackerman Hall B17
Mon, Mar 27 @ 12:00 pm – 1:15 pm

Emily Prud’hommeaux (Boston College) “Endangered or Just Under-Resourced? Evaluating ASR Quality and Utility When Data is Scarce” @ Hackerman Hall B17
Fri, Mar 31 @ 12:00 pm – 1:15 pm

Abstract

Despite many recent advances in automatic speech recognition (ASR), linguists and language communities engaged in language documentation projects continue to face the obstacle of the “transcription bottleneck”. Researchers in NLP typically do not distinguish between widely spoken languages that currently happen to have few training resources and endangered languages that will never have abundant data. As a result, we often fail to thoroughly explore when ASR is helpful for language documentation, what architectures work best for the sorts of languages that are in need of documentation, and how data can be collected and organized to produce optimal results. In this talk I describe several projects that attempt to bridge the gap between the promise of ASR for language documentation and the reality of using this technology in real-world settings.

Biography

Emily Prud’hommeaux is the Gianinno Family Sesquicentennial Assistant Professor in the Department of Computer Science at Boston College. She received her BA (Harvard) and MA (University of California, Los Angeles) in Linguistics, and her PhD in Computer Science and Engineering (OHSU/OGI). Her research area is natural language processing in low-resource settings, with a particular focus on endangered languages and the language of individuals with conditions impacting communication and cognition.

Student Seminar – Samik Sadhu (JHU) “Importance of Different Temporal Modulations of Speech: A Tale of Two Perspectives” @ Hackerman Hall B17
Mon, Apr 3 @ 12:00 pm – 1:15 pm

Abstract

How important are different temporal speech modulations for speech recognition? We answer this question from two complementary perspectives. First, we quantify the amount of phonetic information in the modulation spectrum of speech by computing the mutual information between temporal modulations and frame-wise phoneme labels. From the other perspective, we ask which speech modulations an automatic speech recognition (ASR) system prefers for its operation: data-driven weights are learned over the modulation spectrum and optimized for an end-to-end ASR task. Both methods agree that speech information is mostly contained in slow modulations. Maximum mutual information occurs around 3-6 Hz, which also happens to be the range of modulations most preferred by the ASR. In addition, we show that incorporating this knowledge into ASR systems significantly reduces their dependency on the amount of training data.
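
The first perspective can be illustrated with a short sketch: estimate the mutual information between each modulation-spectrum bin and frame-wise phoneme labels, then locate the peak bin. This is only a toy illustration with random stand-in data, not the speaker’s implementation; real inputs would be modulation energies and forced-alignment phoneme labels.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)

# Stand-in data: one row per speech frame.
#   mod_spectrum: (n_frames, n_bins) modulation energies, bins spanning e.g. 0.5-16 Hz
#   phonemes:     (n_frames,) frame-wise phoneme labels from a forced alignment
n_frames, n_bins = 5000, 16
mod_spectrum = rng.random((n_frames, n_bins))
phonemes = rng.integers(0, 40, size=n_frames)

# Mutual information between each modulation bin and the phoneme labels (in nats).
mi = mutual_info_classif(mod_spectrum, phonemes, random_state=0)
peak = int(np.argmax(mi))
print(f"MI peaks at bin {peak}; with real data the peak falls around 3-6 Hz")
```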


JHU CLSP APSA Roundtable on Learning How to Play with the Machines @ Hackerman Hall B17
Fri, Apr 7 @ 12:00 pm – 1:15 pm

Learning How to Play With The Machines: Taking Stock of Where the Collaboration Between Computational and Social Science Stands


Speakers: Jeff Gill, Ernesto Calvo, Hale Sirin, and Antonios Anastasopoulos

Student Seminar – Ruizhe Huang @ Hackerman Hall B17
Mon, Apr 10 @ 12:00 pm – 1:15 pm

Larry Heck (Georgia Institute of Technology) “The AVA Digital Human: Improving Conversational Interactions through Visually Situated Context” @ Hackerman Hall B17
Fri, Apr 14 @ 12:00 pm – 1:15 pm

Abstract

Advances in open-domain large language models (LLMs), starting with BERT and more recently GPT-4, PaLM, and LLaMA, have facilitated dramatic improvements in conversational systems. These improvements include an unprecedented breadth of conversational interactions between humans and machines while maintaining, and sometimes surpassing, the accuracy of systems trained specifically for known, closed domains. However, many applications still require higher levels of accuracy than pre-trained LLMs can provide, and many studies are underway to close this gap. Broadly speaking, these methods assume the pre-trained models are fixed (due to cost and time) and instead look to various augmentation methods, including prompting strategies and model adaptation/fine-tuning.

One augmentation strategy leverages the context of the conversation: who the participants are and what is known about them (personal context), what was just said (dialogue context), where the conversation is taking place (geo context), what time of day and season it is (time context), and so on. A powerful form of context is the shared visual setting of the conversation between the human(s) and the machine. The shared visual scene may come from a device (phone, smart glasses) or be represented on a screen (browser, maps, etc.). The elements in the visual context can be exploited by grounding the natural language conversational interaction in them, thereby changing the priors of certain concepts and increasing the accuracy of the system. In this talk, I will present some of my historical work in this area as well as my recent work in the AI Virtual Assistant (AVA) Lab at Georgia Tech.
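
A minimal sketch of the general idea (not the AVA Lab’s actual system): assemble an LLM prompt that includes elements of the shared visual scene, so that on-screen entities receive higher prior weight in the model’s response. The function names and prompt format here are illustrative assumptions.

```python
def build_grounded_prompt(user_utterance, visual_elements, dialogue_history):
    """Fold the shared visual scene and dialogue history into the prompt so the
    model's priors shift toward on-screen entities (format is an assumption)."""
    scene = ", ".join(visual_elements) if visual_elements else "nothing"
    history = "\n".join(dialogue_history)
    return (
        f"Visible on screen: {scene}\n"
        f"Conversation so far:\n{history}\n"
        f"User: {user_utterance}\n"
        "Assistant (answer with respect to what is visible):"
    )

prompt = build_grounded_prompt(
    user_utterance="How long would it take to get there?",
    visual_elements=["map of Atlanta", "pin: Georgia Tech", "route: 25 min by car"],
    dialogue_history=["User: Show me Georgia Tech.",
                      "Assistant: Here it is on the map."],
)
# response = call_llm(prompt)  # call_llm is a hypothetical stand-in for an LLM API
print(prompt)
```

With the map elements in the prompt, "there" can resolve to the pinned location rather than to an arbitrary prior interpretation.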

Biography

Dr. Larry Heck is a Professor with a joint appointment in the School of Electrical and Computer Engineering and the School of Interactive Computing at the Georgia Institute of Technology. He holds the Rhesa S. Farmer Distinguished Chair of Advanced Computing Concepts and is a Georgia Research Alliance Eminent Scholar. He received the BSEE from Texas Tech University (1986) and the MSEE and PhD in EE from the Georgia Institute of Technology (1989, 1991). He is a Fellow of the IEEE, was inducted into the Academy of Distinguished Engineering Alumni at Georgia Tech, and received the Distinguished Engineer Award from the Texas Tech University Whitacre College of Engineering. He was a Senior Research Engineer with SRI (1992-98), Vice President of R&D at Nuance (1998-2005), Vice President of Search and Advertising Sciences at Yahoo! (2005-2009), Chief Scientist of Microsoft Speech products and Distinguished Engineer in Microsoft Research (2009-2014), Principal Scientist with Google Research (2014-2017), and CEO of Viv Labs and SVP at Samsung (2017-2021).


Paco Guzman (Meta AI) “Building a Universal Translation System to Break Down Language Barriers” @ Hackerman Hall B17
Mon, Apr 17 @ 12:00 pm – 1:15 pm

Abstract

Machine Translation has the ultimate goal of eliminating language barriers. However, the area has focused mainly on a few languages, leaving many low-resource languages without support. In this talk, I will discuss the challenges of bringing translation support for 200 written languages and beyond.
First, I will talk about the No Language Left Behind project, where we took on this challenge by first contextualizing the need for low-resource language translation support through exploratory interviews with native speakers. Then, we created datasets and models aimed at narrowing the performance gap between low- and high-resource languages. We proposed multiple architectural and training improvements to counteract overfitting while training on thousands of language pairs/tasks, and we evaluated performance on over 40,000 different translation directions.
Afterwards, I will discuss the challenges of pushing translation performance beyond text for languages, like Hokkien, that do not have a standard written form.

Our models achieve state-of-the-art performance and lay important groundwork towards realizing a universal translation system. At the same time, we continue to make open-source contributions so that everyone can keep advancing research for the languages they care about.
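
One standard trick for counteracting overfitting when training across thousands of language pairs with wildly unequal corpus sizes is temperature-based sampling, which upsamples low-resource pairs. The sketch below illustrates the idea under stated assumptions; it is not the exact NLLB recipe, and the corpus sizes are made up.

```python
import numpy as np

def sampling_probs(pair_sizes, temperature=5.0):
    """pair_sizes: dict of language pair -> corpus size (sentences).
    Returns sampling probabilities proportional to (size_i / total)^(1/T),
    so T > 1 flattens the distribution and upsamples low-resource pairs."""
    names = list(pair_sizes)
    sizes = np.array([pair_sizes[n] for n in names], dtype=float)
    p = sizes / sizes.sum()          # raw data proportions
    p = p ** (1.0 / temperature)     # temperature-flattened weights
    p /= p.sum()                     # renormalize to a distribution
    return dict(zip(names, p.round(3)))

# Made-up corpus sizes for three directions:
print(sampling_probs({"eng-fra": 40_000_000, "eng-hau": 500_000, "eng-kin": 50_000}))
# raw shares ~ {0.986, 0.012, 0.001}; sampled shares ~ {0.60, 0.25, 0.16}
```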

Biography

Paco is a Research Scientist Manager supporting translation teams at Meta AI (FAIR). He works in the field of machine translation, with a focus on low-resource translation (e.g., NLLB, FLORES) and the aim of breaking language barriers. He joined Meta in 2016. His research has been published in top-tier NLP venues such as ACL and EMNLP. He was co-chair of Research at AMTA (2020-2022) and has organized several research competitions focused on low-resource translation and data filtering. Paco obtained his PhD from ITESM in Mexico, was a visiting scholar at the LTI-CMU from 2008 to 2009, and participated in DARPA’s GALE evaluation program. He was a post-doc and scientist at the Qatar Computing Research Institute in Qatar from 2012 to 2016.

Center for Language and Speech Processing