Seminars

“Scaling Artificial Intelligence for Multi-Tumor Early Detection with More Reports, Fewer Masks” – Zongwei Zhou (JHU) @ Hackerman Hall B17
Fri, Nov 7 @ 12:00 pm – 1:15 pm

Abstract

Early tumor detection saves lives. Each year, more than 300 million computed tomography (CT) scans are performed worldwide, offering a vast opportunity for effective cancer screening. However, detecting small or early-stage tumors on these CT scans remains challenging, even for experts. Artificial intelligence (AI) models can assist by highlighting suspicious regions, but training such models typically requires extensive tumor masks: detailed, voxel-wise outlines of tumors manually drawn by radiologists. Drawing these masks is costly, requiring years of effort and millions of dollars. In contrast, nearly every CT scan in clinical practice is already accompanied by a medical report describing the tumor’s size, number, appearance, and sometimes pathology results, information that is rich, abundant, and often underutilized for AI training. This talk will introduce ways of training AI to segment tumors that match their descriptions in medical reports. This approach scales AI training with large collections of readily available medical reports, substantially reducing the need for manually drawn tumor masks.

Bio

Zongwei Zhou is an Assistant Research Professor in the Department of Computer Science at Johns Hopkins University and a member of the Malone Center for Engineering in Healthcare. His research focuses on medical computer vision, language, and graphics for cancer detection and diagnosis. He is best known for developing UNet++, a widely adopted segmentation architecture cited nearly 16,000 times since its publication in 2019. He currently serves as PI on an NIH–NIBIB R01 grant ($2.8M, top 1.0 percentile). His work has earned multiple honors, including the AMIA Doctoral Dissertation Award, the Elsevier–MedIA Best Paper Award, and the MICCAI Young Scientist Award. Dr. Zhou also received the President’s Award for Innovation, the highest honor for graduate students at Arizona State University, and has been recognized among the Top 2% of Scientists Worldwide every year since 2022.

Slides

SPS Webinar: Minor Manipulations, Major Threat: An Overview of Partially Fake Speech – Lin Zhang (JHU) @ Hackerman Hall B17
Mon, Nov 10 @ 12:00 pm – 1:15 pm

Abstract

Speech can easily be manipulated through techniques such as text-to-speech synthesis, voice conversion, replay, tampering, and adversarial attacks. When the manipulation is applied to only a minor portion of an audio recording, however, the remaining real segments can have a dominant influence on human listeners and make machine detection extremely challenging. There is therefore an urgent need to explore this scenario, in which synthetic speech is embedded within otherwise real audio. The primary objective of this webinar is to review research efforts aimed at defending against such partially fake audio, with a focus on relevant databases, explainable analyses, and three core tasks: spoof detection, localization, and diarization.

Bio

Lin Zhang received the M.S. degree from Tianjin University, Tianjin, China, in 2020, and the Ph.D. degree from the Graduate University for Advanced Studies / National Institute of Informatics, Tokyo, Japan, in 2024. She is currently a Postdoctoral Fellow at the Center for Language and Speech Processing, Johns Hopkins University, USA. She has also visited and/or worked at Brno University of Technology and Duke Kunshan University. Her research interests include speech security and privacy, speech production, and machine learning.

peerRTF: Robust MVDR Beamforming Using Graph Convolutional Network – Sharon Gannot (Bar-Ilan University, Israel) @ Hackerman Hall B17
Mon, Nov 24 @ 12:00 pm – 1:15 pm

Abstract

The Relative Transfer Function (RTF) is defined as the ratio of the Acoustic Transfer Functions (ATFs) relating a source to a pair of microphones after propagation in the enclosure. Numerous studies have shown that beamformers using RTFs as steering vectors significantly outperform counterparts that account only for the direct path, which has led to a plethora of methods aimed at improving RTF estimation accuracy. In this talk, we focus on a beamformer that optimizes the Minimum Variance Distortionless Response (MVDR) criterion. Since RTF estimation degrades in noisy, highly reverberant environments, we propose leveraging prior knowledge of the acoustic enclosure to infer a low-dimensional manifold of plausible RTFs. Specifically, we harness a Graph Convolutional Network (GCN) to infer the acoustic manifold, thereby making RTF identification more robust. The model is trained and tested using real acoustic responses from the MIRaGe database recorded at Bar-Ilan University. This database contains multichannel room impulse responses measured from a high-resolution, cube-shaped grid of source positions to multiple microphone arrays. This high-resolution measurement facilitates inference of the RTF manifold within a defined Region of Interest (ROI). The inferred RTFs are then employed as steering vectors of the MVDR beamformer. Experiments demonstrate improved RTF estimates and, consequently, better beamformer performance, leading to enhanced sound quality and improved speech intelligibility under challenging acoustic conditions. Project page, including an audio demonstration and a link to the code: https://peerrtf.github.io/
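As background for the abstract above: given a noise spatial covariance matrix R and an RTF steering vector d, the classical MVDR weights have the closed form w = R⁻¹d / (dᴴR⁻¹d), which passes the target (as observed at the reference microphone) undistorted while minimizing output noise power. The numpy sketch below illustrates only this textbook formula; it is not the speaker's peerRTF code, and the GCN-based RTF manifold inference is omitted entirely.

```python
import numpy as np

def mvdr_weights(R_noise, rtf):
    """MVDR beamformer weights for one frequency bin.

    R_noise : (M, M) Hermitian noise spatial covariance matrix
    rtf     : (M,) relative transfer function w.r.t. a reference microphone
    Returns w such that w^H rtf = 1 (the distortionless constraint).
    """
    Rinv_d = np.linalg.solve(R_noise, rtf)          # R^{-1} d
    return Rinv_d / (np.conj(rtf) @ Rinv_d)          # normalize by d^H R^{-1} d

# Toy example: 4 microphones, spatially white noise, random RTF
M = 4
rng = np.random.default_rng(0)
R = np.eye(M, dtype=complex)                         # white-noise covariance
d = rng.standard_normal(M) + 1j * rng.standard_normal(M)
d /= d[0]                                            # RTF normalized to reference mic
w = mvdr_weights(R, d)
# A target signal arriving with transfer d passes through unchanged:
print(np.allclose(np.conj(w) @ d, 1.0))              # → True
```

With white noise (R proportional to the identity), MVDR reduces to a matched filter on d; the GCN-inferred RTFs described in the talk would replace the random d here with manifold-constrained estimates.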

Bio

Sharon Gannot is a Full Professor and Vice Dean in the Faculty of Engineering at Bar-Ilan University, where he heads the Data Science Program. He received the B.Sc. (summa cum laude) from the Technion and the M.Sc. (cum laude) and Ph.D. from Tel-Aviv University, followed by a postdoctoral fellowship at KU Leuven. His research focuses on statistical signal processing and machine learning for speech and audio, and he has authored more than 350 peer-reviewed publications on these topics. Among his editorial roles, he is Editor-in-Chief of Speech Communication, serves on the Senior Editorial Board of IEEE Signal Processing Magazine, is an Associate Editor for the IEEE-SPS Education Center, and has served as Senior Area Chair for IEEE/ACM TASLP (2013–2017; 2020–2025). Among his leadership roles, he chaired the IEEE-SPS Audio and Acoustic Signal Processing Technical Committee (2017–2018) and has led the SPS Data Science Initiative since 2022; he also served as General Co-Chair of IWAENC 2010, WASPAA 2013, and Interspeech 2024. His recognitions include 13 best-paper awards, BIU teaching and research prizes, the 2018 Rector Innovation Award, the 2022 EURASIP Group Technical Achievement Award, and elevation to IEEE Fellow.

Center for Language and Speech Processing