Abstract
Speech can easily be manipulated through techniques such as text-to-speech synthesis, voice conversion, replay, tampering, adversarial attacks, and more. However, when the manipulation is applied to only a minor portion of an audio recording, the remaining real segments can have a dominant influence on human listeners and make machine detection extremely challenging. Therefore, there is an urgent need to explore this scenario, in which synthetic speech is embedded within otherwise real audio. The primary objective of this webinar is to review research efforts aimed at defending against such partially fake audio, with a focus on relevant databases, explainable analyses, and three core tasks (spoof detection, localization, and diarization).
Bio
Lin Zhang received the M.S. degree from Tianjin University, Tianjin, China, in 2020, and the Ph.D. degree from the Graduate University for Advanced Studies / National Institute of Informatics, Tokyo, Japan, in 2024. She is currently a Postdoctoral Fellow at the Center for Language and Speech Processing, Johns Hopkins University, USA. She has also visited and/or worked at Brno University of Technology and Duke Kunshan University. Her research interests include speech security and privacy, speech production, and machine learning.
Abstract
As customer expectations rise and legacy automation reaches its limits, Comcast is redefining digital care through an agentic AI transformation. This talk chronicles our evolution from scripted bots to modular, autonomous systems of intelligent agents—each capable of collaboration, context-awareness, and empathy. We’ll share our key learnings from real-world experiments in the last few years and present our vision of scaling trustworthy AI for the next era of customer care.
Ferhan Ture Bio
As a Fellow at Comcast, Ferhan Ture focuses on how AI can be applied to improve product experiences as well as internal processes. His technical specialization is in natural language processing (NLP) and AI agents. Ferhan is part of the AI Technologies group, which is collectively responsible for building the backend systems, core algorithms, and machine learning models that power various Comcast-affiliated products and services. These include customer-facing products such as X1 and the Xfinity Assistant. Before joining Comcast in 2015, Ferhan graduated from the PhD program of the Department of Computer Science at the University of Maryland in 2013, where he defended his thesis on Machine Translation (MT) and Cross-Language Information Retrieval (CLIR). He is originally from Turkiye but has called Washington DC home for almost 20 years.
Sima Taheri Bio
Sima Taheri is a Senior Research Manager at Comcast’s AI Technologies org. She contributes to the cross-functional effort to re-architect Xfinity Assistant with GenAI and multi-agent techniques, leading several initiatives that improve customer experience end-to-end. She collaborates with colleagues, including Ferhan Ture, on the research and evaluation track, building rubric-based judging, observability, and impact metrics that guide product decisions. Sima earned her Ph.D. in Computer Science from the University of Maryland, College Park, under the supervision of Professor Rama Chellappa. Before the rise of LLMs, she spent years in computer vision across multiple companies, tackling real-world perception and multimedia analytics problems. As LLMs matured, she expanded into NLP and conversational AI, applying that vision mindset to productionizing chatbot systems at scale. She lives in Virginia with her husband and their two kids.
Abstract
The Relative Transfer Function (RTF) is defined as the ratio of the Acoustic Transfer Functions (ATFs) relating a source to a pair of microphones after propagation in the enclosure. Numerous studies have shown that beamformers using RTFs as steering vectors significantly outperform counterparts that account only for the direct path, which has led to a plethora of methods aimed at improving RTF estimation accuracy. In this talk, we focus on a beamformer that optimizes the Minimum Variance Distortionless Response (MVDR) criterion. Since RTF estimation degrades in noisy, highly reverberant environments, we propose leveraging prior knowledge of the acoustic enclosure to infer a low-dimensional manifold of plausible RTFs. Specifically, we harness a Graph Convolutional Network (GCN) to infer the acoustic manifold, thereby making RTF identification more robust. The model is trained and tested using real acoustic responses from the MIRaGe database recorded at Bar-Ilan University. This database contains multichannel room impulse responses measured from a high-resolution cube-shaped grid to multiple microphone arrays. This high-resolution measurement facilitates inference of the RTF manifold within a defined Region of Interest (ROI). The inferred RTFs are then employed as steering vectors of the MVDR beamformer. Experiments demonstrate improved RTF estimates and, consequently, better beamformer performance, leading to enhanced sound quality and improved speech intelligibility under challenging acoustic conditions. Project page, including audio demonstrations and a link to the code: https://peerrtf.github.io/
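For readers new to the terminology, the following is a minimal sketch of how an RTF vector can serve as the steering vector of an MVDR beamformer in a single frequency bin. It is not the GCN-based method presented in the talk: the ATF values and the noise covariance are made-up toy numbers, the function names `rtf_from_atfs` and `mvdr_weights` are hypothetical, and NumPy is assumed to be available. The weights follow the textbook MVDR form w = Φ⁻¹h / (hᴴΦ⁻¹h), where Φ is the noise covariance and h is the RTF vector.

```python
import numpy as np


def rtf_from_atfs(atfs, ref=0):
    """Return the RTF vector: every microphone's ATF divided by the reference ATF."""
    atfs = np.asarray(atfs, dtype=complex)
    return atfs / atfs[ref]


def mvdr_weights(noise_cov, steering):
    """Textbook MVDR weights for one frequency bin: w = Phi^{-1} h / (h^H Phi^{-1} h)."""
    phi_inv_h = np.linalg.solve(noise_cov, steering)
    return phi_inv_h / (steering.conj() @ phi_inv_h)


# Toy ATFs for three microphones in one frequency bin (illustrative values only).
atfs = np.array([0.9 + 0.1j, 0.5 - 0.4j, 0.2 + 0.7j])
h = rtf_from_atfs(atfs)  # the reference entry becomes exactly 1

# Toy noise covariance estimated from 200 random complex snapshots.
rng = np.random.default_rng(0)
noise = rng.standard_normal((3, 200)) + 1j * rng.standard_normal((3, 200))
phi = noise @ noise.conj().T / noise.shape[1]

w = mvdr_weights(phi, h)

# Distortionless constraint: the beamformer response toward h should equal 1.
response = w.conj() @ h
# Residual noise power at the beamformer output, w^H Phi w.
noise_power = float(np.real(w.conj() @ phi @ w))
print(abs(response - 1.0), noise_power)
```

In this sketch the quality of the beamformer depends entirely on `h`, which is why a more accurate RTF estimate is expected to translate into better output.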
Bio
Sharon Gannot is a Full Professor and Vice Dean in the Faculty of Engineering at Bar-Ilan University, where he heads the Data Science Program. He received the B.Sc. (summa cum laude) from the Technion and the M.Sc. (cum laude) and Ph.D. from Tel-Aviv University, followed by a postdoctoral fellowship at KU Leuven. His research focuses on statistical signal processing and machine learning for speech and audio, and he has authored more than 350 peer-reviewed publications on these topics. Among his editorial roles, he is Editor-in-Chief of Speech Communication, serves on the Senior Editorial Board of IEEE Signal Processing Magazine, is an Associate Editor for the IEEE-SPS Education Center, and has served as Senior Area Chair for IEEE/ACM TASLP (2013–2017; 2020–2025). Among his leadership roles, he chaired the IEEE-SPS Audio and Acoustic Signal Processing Technical Committee (2017–2018) and has led the SPS Data Science Initiative since 2022; he also served as General Co-Chair of IWAENC 2010, WASPAA 2013, and Interspeech 2024. His recognitions include 13 best-paper awards, BIU teaching and research prizes, the 2018 Rector Innovation Award, the 2022 EURASIP Group Technical Achievement Award, and IEEE Fellow status.