John Hansen (University of Texas at Dallas) “Challenges and Advancements in Speaker Diarization & Recognition for Naturalistic Data Streams” @ Hackerman Hall B17
Mar 3 @ 12:00 pm – 1:15 pm


Speech communications represents a core domain for education, team problem solving, social engagement, and business interactions. The ability for Speech Technology to extract layers of knowledge and assess engagement content represents the next generation of advanced speech solutions. Today, the emergence of BIG DATA, Machine Learning, as well as voice enabled speech systems have required the need for effective voice capture and automatic speech/speaker recognition. The ability to employ speech and language technology to assess human-to-human interactions offers new research paradigms having profound impact on assessing human interaction. In this talk, we will focus on big data naturalistic audio processing relating to (i) child learning spaces, and (ii) the NASA APOLLO lunar missions. ML based technology advancements include automatic audio diarization, speech recognition, and speaker recognition. Child-Teacher based assessment of conversational interactions are explored, including keyword and “WH-word” (e.g., who, what, etc.). Diarization processing solutions are applied to both classroom/learning space child speech, as well as massive APOLLO data. CRSS-UTDallas is expanding our original Apollo-11 corpus, resulting in a massive multi-track audio processing challenge to make available 150,000hrs of Apollo mission data to be shared with science communities: (i) speech/language technology, (ii) STEM/science and team-based researchers, and (iii) education/historical/archiving specialists. Our goals here are to provide resources which allow to better understand how people work/learn collaboratively together. For Apollo, to accomplish one of mankind’s greatest scientific/technological challenges in the last century.


John H.L. Hansen, received Ph.D. & M.S. degrees from Georgia Institute of Technology, and B.S.E.E. from Rutgers Univ. He joined Univ. of Texas at Dallas (UTDallas) in 2005, where he currently serves as Associate Dean for Research, Prof. of ECE, Distinguished Univ. Chair in Telecom. Engineering, and directs Center for Robust Speech Systems (CRSS). He is an ISCA Fellow, IEEE Fellow, and has served as Member and TC-Chair of IEEE Signal Proc. Society, Speech & Language Proc. Tech. Comm.(SLTC), and Technical Advisor to U.S. Delegate for NATO (IST/TG-01). He served as ISCA President (2017-21), continues to serve on ISCA Board (2015-23) as Treasurer, has supervised 99 PhD/MS thesis candidates (EE,CE,BME,TE,CS,Ling.,Cog.Sci.,Spch.Sci.,Hear.Sci), was recipient of 2020 UT-Dallas Provost’s Award for Grad. PhD Research Mentoring; author/co-author of 865 journal/conference papers including 14 textbooks in the field of speech/language/hearing processing & technology including coauthor of textbook Discrete-Time Processing of Speech Signals, (IEEE Press, 2000), and lead author of the report “The Impact of Speech Under ‘Stress’ on Military Speech Technology,” (NATO RTO-TR-10, 2000). He served as Organizer, Chair/Co-Chair/Tech.Chair for ISCA INTERSPEECH-2022, IEEE ICASSP-2010, IEEE SLT-2014, ISCA INTERSPEECH-2002, and Tech. Chair for IEEE ICASSP-2024. He received the 2022 IEEE Signal Processing Society Leo Beranek MERITORIOUS SERVICE Award.


Paco Guzman (Meta AI) “Building a Universal Translation System to Break Down Language Barriers” @ Hackerman Hall B17
Apr 17 @ 12:00 pm – 1:15 pm


Machine Translation has the ultimate goal of eliminating language barriers. However, the area has focused mainly on a few languages, leaving many low-resource languages without support. In this talk, I will discuss the challenges of bringing translation support for 200 written languages and beyond.
First, I talk about the No Language Left Behind Project, where we took on this challenge by first contextualizing the need for low-resource language translation support through exploratory interviews with native speakers. Then, we created datasets and models aimed at narrowing the performance gap between low and high-resource languages. We proposed multiple architectural and training improvements to counteract over-fitting while training on thousands of language-pairs/tasks. We evaluated the performance of over 40,000 different translation directions.
Afterwards, I’ll discuss the challenges of pushing translation performance beyond text for languages that don’t have written standards like Hokkien.

Our models achieve state-of-the-art performance and lay important groundwork towards realizing a universal translation system. At the same time, we keep making open-source contributions for everyone to keep advancing the research for the languages they care about.


Paco is Research Scientist Manager supporting translation teams in Meta AI (FAIR). He works in the field of machine translation with a focus on low-resource translation (e.g. NLLB, FLORES) and the aim to break language barriers. He joined Meta in 2016. His research has been published in top-tier NLP venues like ACL, EMNLP. He was the co-chair of the Research director at AMTA (2020-2022). He has ave organized several research competitions focused on low-resource translation and data filtering. Paco obtained his PhD from the ITESM in Mexico, was a visiting scholar at the LTI-CMU from 2008-2009, and participated in DARPA’s GALE evaluation program. Paco was a post-doc and scientist at Qatar Computing Research Institute in Qatar in 2012-2016

Center for Language and Speech Processing