John Hansen (University of Texas at Dallas) “Challenges and Advancements in Speaker Diarization & Recognition for Naturalistic Data Streams”
3400 N. Charles Street
Speech communication represents a core domain for education, team problem solving, social engagement, and business interactions. The ability of speech technology to extract layers of knowledge and assess engagement content represents the next generation of advanced speech solutions. Today, the emergence of big data, machine learning, and voice-enabled speech systems has created a need for effective voice capture and automatic speech/speaker recognition. The ability to employ speech and language technology to assess human-to-human interactions offers new research paradigms with a profound impact on assessing human interaction. In this talk, we will focus on big data naturalistic audio processing relating to (i) child learning spaces, and (ii) the NASA APOLLO lunar missions. ML-based technology advancements include automatic audio diarization, speech recognition, and speaker recognition. Child-teacher assessment of conversational interactions is explored, including keywords and “WH-words” (e.g., who, what). Diarization processing solutions are applied to both classroom/learning space child speech and the massive APOLLO data. CRSS-UTDallas is expanding our original Apollo-11 corpus, resulting in a massive multi-track audio processing challenge to make 150,000 hrs of Apollo mission data available to be shared with science communities: (i) speech/language technology, (ii) STEM/science and team-based researchers, and (iii) education/historical/archiving specialists. Our goals here are to provide resources that allow a better understanding of how people work and learn collaboratively and, for Apollo, how they accomplished one of mankind’s greatest scientific/technological challenges of the last century.
John H.L. Hansen received Ph.D. & M.S. degrees from Georgia Institute of Technology, and a B.S.E.E. from Rutgers Univ. He joined Univ. of Texas at Dallas (UTDallas) in 2005, where he currently serves as Associate Dean for Research, Prof. of ECE, Distinguished Univ. Chair in Telecom. Engineering, and directs the Center for Robust Speech Systems (CRSS). He is an ISCA Fellow, IEEE Fellow, and has served as Member and TC-Chair of IEEE Signal Proc. Society, Speech & Language Proc. Tech. Comm. (SLTC), and Technical Advisor to U.S. Delegate for NATO (IST/TG-01). He served as ISCA President (2017-21) and continues to serve on the ISCA Board (2015-23) as Treasurer. He has supervised 99 PhD/MS thesis candidates (EE, CE, BME, TE, CS, Ling., Cog. Sci., Spch. Sci., Hear. Sci.) and was recipient of the 2020 UT-Dallas Provost’s Award for Grad. PhD Research Mentoring. He is author/co-author of 865 journal/conference papers, including 14 textbooks, in the field of speech/language/hearing processing & technology; he is coauthor of the textbook Discrete-Time Processing of Speech Signals (IEEE Press, 2000) and lead author of the report “The Impact of Speech Under ‘Stress’ on Military Speech Technology” (NATO RTO-TR-10, 2000). He served as Organizer, Chair/Co-Chair/Tech. Chair for ISCA INTERSPEECH-2022, IEEE ICASSP-2010, IEEE SLT-2014, ISCA INTERSPEECH-2002, and Tech. Chair for IEEE ICASSP-2024. He received the 2022 IEEE Signal Processing Society Leo Beranek Meritorious Service Award.