Seminars

Mar
3
Fri
John Hansen (University of Texas at Dallas) “Challenges and Advancements in Speaker Diarization & Recognition for Naturalistic Data Streams” @ Hackerman Hall B17
Mar 3 @ 12:00 pm – 1:15 pm

Abstract

Speech communications represents a core domain for education, team problem solving, social engagement, and business interactions. The ability for Speech Technology to extract layers of knowledge and assess engagement content represents the next generation of advanced speech solutions. Today, the emergence of BIG DATA, Machine Learning, as well as voice enabled speech systems have required the need for effective voice capture and automatic speech/speaker recognition. The ability to employ speech and language technology to assess human-to-human interactions offers new research paradigms having profound impact on assessing human interaction. In this talk, we will focus on big data naturalistic audio processing relating to (i) child learning spaces, and (ii) the NASA APOLLO lunar missions. ML based technology advancements include automatic audio diarization, speech recognition, and speaker recognition. Child-Teacher based assessment of conversational interactions are explored, including keyword and “WH-word” (e.g., who, what, etc.). Diarization processing solutions are applied to both classroom/learning space child speech, as well as massive APOLLO data. CRSS-UTDallas is expanding our original Apollo-11 corpus, resulting in a massive multi-track audio processing challenge to make available 150,000hrs of Apollo mission data to be shared with science communities: (i) speech/language technology, (ii) STEM/science and team-based researchers, and (iii) education/historical/archiving specialists. Our goals here are to provide resources which allow to better understand how people work/learn collaboratively together. For Apollo, to accomplish one of mankind’s greatest scientific/technological challenges in the last century.

Biography

John H.L. Hansen, received Ph.D. & M.S. degrees from Georgia Institute of Technology, and B.S.E.E. from Rutgers Univ. He joined Univ. of Texas at Dallas (UTDallas) in 2005, where he currently serves as Associate Dean for Research, Prof. of ECE, Distinguished Univ. Chair in Telecom. Engineering, and directs Center for Robust Speech Systems (CRSS). He is an ISCA Fellow, IEEE Fellow, and has served as Member and TC-Chair of IEEE Signal Proc. Society, Speech & Language Proc. Tech. Comm.(SLTC), and Technical Advisor to U.S. Delegate for NATO (IST/TG-01). He served as ISCA President (2017-21), continues to serve on ISCA Board (2015-23) as Treasurer, has supervised 99 PhD/MS thesis candidates (EE,CE,BME,TE,CS,Ling.,Cog.Sci.,Spch.Sci.,Hear.Sci), was recipient of 2020 UT-Dallas Provost’s Award for Grad. PhD Research Mentoring; author/co-author of 865 journal/conference papers including 14 textbooks in the field of speech/language/hearing processing & technology including coauthor of textbook Discrete-Time Processing of Speech Signals, (IEEE Press, 2000), and lead author of the report “The Impact of Speech Under ‘Stress’ on Military Speech Technology,” (NATO RTO-TR-10, 2000). He served as Organizer, Chair/Co-Chair/Tech.Chair for ISCA INTERSPEECH-2022, IEEE ICASSP-2010, IEEE SLT-2014, ISCA INTERSPEECH-2002, and Tech. Chair for IEEE ICASSP-2024. He received the 2022 IEEE Signal Processing Society Leo Beranek MERITORIOUS SERVICE Award.

 

Mar
10
Fri
Denise DiPersio (Linguistic Data Consortium, University of Pennsylvania) “Data and Ethics: Where Does the Twain Meet?” @ Hackerman Hall B17
Mar 10 @ 12:00 pm – 1:15 pm

Abstract

As data-based technologies proliferate, it is increasingly important for researchers to be aware of their work’s wider impact. Concerns like navigating the IRB and figuring out copyright and licensing issues are still key, but the current focus shift to matters like inclusivity, fairness, and transparency and their impact on the research/development life cycle have added complexity to the research task. In this talk, we will take a broad look at the various ways ethics intersects with natural language processing, machine learning, and artificial intelligence research and discuss strategies and resources for managing these concerns within the broader research framework.

Biography

Denise is responsible for the overall operation of LDC’s External Relations group which includes intellectual property management, licensing, regulatory matters, publications, membership and communications. Before joining LDC, she practiced law for over 20 years in the areas of international trade, intellectual property and commercial litigation. She has an A.B. in Political Science from Bryn Mawr College and a Juris Doctor degree from the University of Miami School of Law.

Mar
17
Fri
Alessandra Cervone (Amazon) “Controllable Text Generation for Creative Applications @ Hackerman Hall B17
Mar 17 @ 12:00 pm – 1:15 pm

Abstract

Recent advances in large pretrained language models have unlocked new exciting applications for Natural Language Generation for creative tasks, such as lyrics or humour generation. In this talk we will discuss recent works by our team at Alexa AI and discuss current challenges: (1) Pun understanding and generation: We release new datasets for pun understanding and the novel task of context-situated pun generation, and demonstrate the value of our annotations for pun classification and generation tasks. (2) Song lyric generation: we design a hierarchical lyric generation framework that enables us to generate pleasantly-singable lyrics without training on melody-lyric aligned data, and show that our approach is competitive with strong baselines supervised on parallel data. (3) Create with Alexa: a multimodal story creation experience recently launched on Alexa devices, which leverages story text generation models in tandem with story visualization and background music generation models to produce multimodal stories for kids.

Biography

Alessandra Cervone is an Applied Scientist in the Natural Understanding team at Amazon Alexa AI. Alessandra holds an MSc in Speech and Language Processing from University of Edinburgh and a PhD in CS from University of Trento (Italy). During her PhD, Alessandra worked on computational models of coherence in open-domain dialogue advised by Giuseppe Riccardi. In the first year of the PhD, she was the team leader of one of the teams selected to compete in the first edition of the Alexa Prize. More recently, her research interests have been focused on natural language generation and its evaluation, in particular in the context of creative AI applications.

Mar
27
Mon
Student Seminar – Desh Raj @ Hackerman Hall B17
Mar 27 @ 12:00 pm – 1:15 pm
Mar
31
Fri
Emily Prud’hommeaux (Boston College) “Endangered or Just Under-Resourced? Evaluating ASR Quality and Utility When Data is Scarce” @ Hackerman Hall B17
Mar 31 @ 12:00 pm – 1:15 pm

Abstract

Despite many recent advances in automatic speech recognition (ASR), linguists and language communities engaged in language documentation projects continue to face the obstacle of the “transcription bottleneck”. Researchers in NLP typically do not distinguish between widely spoken languages that currently happen to have few training resources and endangered languages that will never have abundant data. As a result, we often fail to thoroughly explore when ASR is helpful for language documentation, what architectures work best for the sorts of languages that are in need of documentation, and how data can be collected and organized to produce optimal results. In this talk I describe several projects that attempt to bridge the gap between the promise of ASR for language documentation and the reality of using this technology in real-world settings.

Biography

Emily Prud’hommeaux is the Gianinno Family Sesquicentennial Assistant Professor in the Department of Computer Science at Boston College. She received her BA (Harvard) and MA (University of California, Los Angeles) in Linguistics, and her PhD in Computer Science and Engineering (OHSU/OGI). Her research area is natural language processing in low-resource settings, with a particular focus on endangered languages and the language of individuals with conditions impacting communication and cognition.

Center for Language and Speech Processing