Seminars

Berrak Sisman (University of Texas at Dallas) “Speech Synthesis and Voice Conversion: Machine Learning can Mimic Anyone’s Voice” @ Hackerman Hall B17
Fri, Nov 4 @ 12:00 pm – 1:15 pm

Abstract

Voice conversion (VC) is a significant aspect of artificial intelligence: the study of how to convert one speaker’s voice to sound like that of another without changing the linguistic content. Voice conversion belongs to the general field of speech synthesis, which converts text to speech or changes properties of speech such as voice identity, emotion, and accent. It involves multiple speech processing techniques, including speech analysis, spectral conversion, prosody conversion, speaker characterization, and vocoding. With recent advances in theory and practice, we can now produce human-like voice quality with high speaker similarity. In this talk, Dr. Sisman will present recent advances in voice conversion, discuss their promise and limitations, and summarize the resources available for expressive voice conversion research.
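
A minimal sketch of the pipeline stages named above (speech analysis, spectral conversion, vocoding), assuming the librosa and soundfile Python libraries; the identity mapping below is a placeholder for a trained source-to-target conversion model, not an actual VC system:

    # Skeleton of a voice conversion pipeline: analysis -> conversion -> vocoding.
    # The convert_spectrum() placeholder stands in for a trained mapping model.
    import librosa
    import soundfile as sf

    def convert_spectrum(mel):
        # A real system would apply a model trained to map source-speaker
        # spectra (and prosody) toward the target speaker.
        return mel

    # Analysis: waveform -> mel spectrogram
    y, sr = librosa.load("source.wav", sr=16000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=80)

    # Spectral conversion (identity placeholder here)
    mel_converted = convert_spectrum(mel)

    # Vocoding: mel spectrogram -> waveform (Griffin-Lim here; neural
    # vocoders give far higher quality in practice)
    y_out = librosa.feature.inverse.mel_to_audio(mel_converted, sr=sr)
    sf.write("converted.wav", y_out, sr)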

Biography

Dr. Berrak Sisman (Member, IEEE) received the Ph.D. degree in electrical and computer engineering from the National University of Singapore in 2020, fully funded by the A*STAR Graduate Academy under the Singapore International Graduate Award (SINGA). She is currently a tenure-track Assistant Professor in the Department of Electrical and Computer Engineering, Erik Jonsson School of Engineering and Computer Science, at the University of Texas at Dallas, United States. Prior to joining UT Dallas, she was a faculty member at the Singapore University of Technology and Design (2020-2022) and a Postdoctoral Research Fellow at the National University of Singapore (2019-2020). She was an exchange doctoral student at the University of Edinburgh and a visiting scholar at The Centre for Speech Technology Research (CSTR), University of Edinburgh (2019), as well as a visiting researcher at the RIKEN Advanced Intelligence Project in Japan (2018). Her research focuses on machine learning, signal processing, emotion, speech synthesis, and voice conversion.

Dr. Sisman has served as an Area Chair at INTERSPEECH 2021, INTERSPEECH 2022, and IEEE SLT 2022, and as Publication Chair at ICASSP 2022. She was elected to the IEEE Speech and Language Processing Technical Committee (SLTC) in the area of Speech Synthesis for the term January 2022 to December 2024. She plays leadership roles in conference organization and is active in technical committees, having served as General Coordinator of the Student Advisory Committee (SAC) of the International Speech Communication Association (ISCA).

John Hansen (University of Texas at Dallas) “Challenges and Advancements in Speaker Diarization & Recognition for Naturalistic Data Streams” @ Hackerman Hall B17
Fri, Mar 3 @ 12:00 pm – 1:15 pm

Abstract

Speech communication is a core domain for education, team problem solving, social engagement, and business interactions. The ability of speech technology to extract layers of knowledge and assess engagement represents the next generation of advanced speech solutions. Today, the emergence of big data, machine learning, and voice-enabled speech systems has driven the need for effective voice capture and automatic speech/speaker recognition. The ability to employ speech and language technology to assess human-to-human interactions offers new research paradigms with profound impact on assessing human interaction. In this talk, we will focus on big-data naturalistic audio processing in two settings: (i) child learning spaces and (ii) the NASA Apollo lunar missions. Machine-learning-based technology advancements include automatic audio diarization, speech recognition, and speaker recognition. Child-teacher assessment of conversational interactions is explored, including counts of keywords and “WH-words” (e.g., who, what, etc.). Diarization solutions are applied both to classroom/learning-space child speech and to the massive Apollo data. CRSS-UTDallas is expanding our original Apollo-11 corpus, a massive multi-track audio processing challenge that will make 150,000 hours of Apollo mission data available to (i) the speech/language technology community, (ii) STEM/science and team-based researchers, and (iii) education/historical/archiving specialists. Our goal is to provide resources that allow researchers to better understand how people work and learn collaboratively; for Apollo, how teams accomplished one of mankind’s greatest scientific and technological challenges of the last century.
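
As a small, self-contained illustration of the keyword and “WH-word” counts mentioned above, the sketch below tallies WH-words per speaker over a toy diarized transcript; the transcript format and word list are illustrative assumptions, not the CRSS-UTDallas tooling:

    # Toy WH-word counting over a diarized (speaker, utterance) transcript.
    from collections import Counter

    WH_WORDS = {"who", "what", "when", "where", "why", "which", "how"}

    transcript = [
        ("teacher", "What do you see in this picture?"),
        ("child", "A dog! Why is he running?"),
        ("teacher", "Good question. Where do you think he is going?"),
    ]

    counts = Counter()
    for speaker, utterance in transcript:
        for word in utterance.lower().split():
            if word.strip("?!.,") in WH_WORDS:
                counts[speaker] += 1

    print(counts)  # Counter({'teacher': 2, 'child': 1})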

Biography

John H.L. Hansen received his Ph.D. and M.S. degrees from the Georgia Institute of Technology and his B.S.E.E. from Rutgers University. He joined the University of Texas at Dallas (UTDallas) in 2005, where he currently serves as Associate Dean for Research, Professor of ECE, and Distinguished University Chair in Telecommunications Engineering, and directs the Center for Robust Speech Systems (CRSS). He is an ISCA Fellow and IEEE Fellow, has served as Member and TC-Chair of the IEEE Signal Processing Society Speech and Language Processing Technical Committee (SLTC), and has been Technical Advisor to the U.S. Delegate for NATO (IST/TG-01). He served as ISCA President (2017-21) and continues to serve on the ISCA Board (2015-23) as Treasurer. He has supervised 99 Ph.D./M.S. thesis candidates (EE, CE, BME, TE, CS, Linguistics, Cognitive Science, Speech Science, Hearing Science) and received the 2020 UT-Dallas Provost’s Award for Graduate Ph.D. Research Mentoring. He is author/co-author of 865 journal and conference papers, including 14 textbooks in the field of speech/language/hearing processing and technology, among them the coauthored textbook Discrete-Time Processing of Speech Signals (IEEE Press, 2000) and the lead-authored report “The Impact of Speech Under ‘Stress’ on Military Speech Technology” (NATO RTO-TR-10, 2000). He served as Organizer, Chair/Co-Chair/Technical Chair for ISCA INTERSPEECH-2022, IEEE ICASSP-2010, IEEE SLT-2014, and ISCA INTERSPEECH-2002, and as Technical Chair for IEEE ICASSP-2024. He received the 2022 IEEE Signal Processing Society Leo Beranek Meritorious Service Award.

Denise DiPersio (Linguistic Data Consortium, University of Pennsylvania) “Data and Ethics: Where Does the Twain Meet?” @ Hackerman Hall B17
Fri, Mar 10 @ 12:00 pm – 1:15 pm

Abstract

As data-based technologies proliferate, it is increasingly important for researchers to be aware of their work’s wider impact. Concerns like navigating the IRB and figuring out copyright and licensing issues remain key, but the current shift in focus toward matters like inclusivity, fairness, and transparency, and their impact on the research and development life cycle, has added complexity to the research task. In this talk, we will take a broad look at the various ways ethics intersects with natural language processing, machine learning, and artificial intelligence research, and discuss strategies and resources for managing these concerns within the broader research framework.

Biography

Denise is responsible for the overall operation of LDC’s External Relations group, which includes intellectual property management, licensing, regulatory matters, publications, membership, and communications. Before joining LDC, she practiced law for over 20 years in the areas of international trade, intellectual property, and commercial litigation. She holds an A.B. in Political Science from Bryn Mawr College and a Juris Doctor from the University of Miami School of Law.

Alessandra Cervone (Amazon) “Controllable Text Generation for Creative Applications” @ Hackerman Hall B17
Fri, Mar 17 @ 12:00 pm – 1:15 pm

Abstract

Recent advances in large pretrained language models have unlocked exciting new applications of natural language generation for creative tasks, such as lyric or humour generation. In this talk we will present recent work by our team at Alexa AI and discuss current challenges: (1) Pun understanding and generation: we release new datasets for pun understanding and for the novel task of context-situated pun generation, and demonstrate the value of our annotations for pun classification and generation. (2) Song lyric generation: we design a hierarchical lyric generation framework that produces pleasantly singable lyrics without training on melody-lyric aligned data, and show that our approach is competitive with strong baselines supervised on parallel data. (3) Create with Alexa: a multimodal story creation experience recently launched on Alexa devices, which leverages story text generation models in tandem with story visualization and background music generation models to produce multimodal stories for kids.
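
As a generic sketch of the generate-and-filter approach to controllable generation (an illustration only, not the Alexa AI team’s models), the following uses an off-the-shelf GPT-2 via the Hugging Face transformers library to propose candidate lines and keep only those ending in a crude rhyme:

    # Generate-and-filter: sample candidate continuations, keep those that
    # satisfy a constraint (here, a very rough rhyme on the final word).
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    def crude_rhyme(a, b):
        # Extremely rough rhyme heuristic: shared two-character suffix.
        return len(a) >= 2 and a != b and a[-2:] == b[-2:]

    prompt = "The stars are shining in the night"
    target = "night"  # the next line should rhyme with this word

    candidates = generator(prompt + ",", max_new_tokens=12,
                           num_return_sequences=8, do_sample=True)

    for c in candidates:
        line = c["generated_text"][len(prompt) + 1:].strip()
        words = line.rstrip(".,!?").split()
        if words and crude_rhyme(words[-1].lower(), target):
            print(line)  # keep only (roughly) rhyming candidates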

Biography

Alessandra Cervone is an Applied Scientist in the Natural Understanding team at Amazon Alexa AI. Alessandra holds an MSc in Speech and Language Processing from the University of Edinburgh and a PhD in Computer Science from the University of Trento (Italy). During her PhD, Alessandra worked on computational models of coherence in open-domain dialogue, advised by Giuseppe Riccardi. In the first year of her PhD, she was the leader of one of the teams selected to compete in the first edition of the Alexa Prize. More recently, her research interests have focused on natural language generation and its evaluation, in particular in the context of creative AI applications.

Student Seminar – Desh Raj @ Hackerman Hall B17
Mon, Mar 27 @ 12:00 pm – 1:15 pm

Emily Prud’hommeaux (Boston College) “Endangered or Just Under-Resourced? Evaluating ASR Quality and Utility When Data is Scarce” @ Hackerman Hall B17
Fri, Mar 31 @ 12:00 pm – 1:15 pm

Abstract

Despite many recent advances in automatic speech recognition (ASR), linguists and language communities engaged in language documentation projects continue to face the obstacle of the “transcription bottleneck”. Researchers in NLP typically do not distinguish between widely spoken languages that currently happen to have few training resources and endangered languages that will never have abundant data. As a result, we often fail to thoroughly explore when ASR is helpful for language documentation, what architectures work best for the sorts of languages that are in need of documentation, and how data can be collected and organized to produce optimal results. In this talk I describe several projects that attempt to bridge the gap between the promise of ASR for language documentation and the reality of using this technology in real-world settings.
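
Word error rate (WER) is the standard yardstick behind the “ASR quality” question in the title; below is a minimal sketch of its computation via word-level edit distance (a generic illustration, not the speaker’s evaluation code):

    # Word error rate: Levenshtein distance over words, divided by the
    # number of reference words.
    def wer(reference, hypothesis):
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j] = edit distance between ref[:i] and hyp[:j]
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i
        for j in range(len(hyp) + 1):
            dp[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
        return dp[-1][-1] / len(ref)

    print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1/6 ≈ 0.167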

Biography

Emily Prud’hommeaux is the Gianinno Family Sesquicentennial Assistant Professor in the Department of Computer Science at Boston College. She received her BA (Harvard) and MA (University of California, Los Angeles) in Linguistics, and her PhD in Computer Science and Engineering (OHSU/OGI). Her research area is natural language processing in low-resource settings, with a particular focus on endangered languages and the language of individuals with conditions impacting communication and cognition.

Center for Language and Speech Processing