Magdalena Rybicka (Student Seminar) “End-to-End Neural Speaker Diarization with Non-Autoregressive Attractors”

When:
September 9, 2024 @ 12:00 pm – 1:15 pm
Where:
Hackerman Hall B17
3400 N. Charles Street
Baltimore
MD 21218
Cost:
Free

Abstract

Despite many recent developments in speaker diarization, making it robust and effective in real-life scenarios remains a challenge and an active area of research. End-to-end neural speaker diarization (EEND) systems are considered the next stepping stone in pursuing high-performance diarization. Subsequently, EEND with encoder-decoder-based attractors (EEND-EDA) made it possible to handle recordings with a flexible number of speakers, thanks to an LSTM-based EDA module. In this talk, I will describe our work on the EEND with Non-Autoregressive Attractors (EEND-NAA) approach, along with recent further improvements to the EEND-NAA architecture, which can handle recordings containing speech from a variable and unknown number of speakers. Our proposed system uses a clustering approach but follows the EEND-EDA framework and end-to-end pipeline, replacing the autoregressive LSTM-based backend with non-autoregressive attractor estimation. Our proposal makes the process of attractor generation explainable, whereas the LSTM-based approach is more obscure.

Center for Language and Speech Processing