Alicia Lozano-Diez (Universidad Autónoma de Madrid) “Text-Dependent Speaker Verification: BUT system for the SdSV Challenge 2020”

March 12, 2021 @ 12:00 pm – 1:15 pm
via Zoom


Speaker verification (SV) is particularly challenging when audio recordings are short in duration. To address this, the Short-duration Speaker Verification (SdSV) Challenge was organized in 2020, providing a framework for evaluating automatic systems in this context.

In this talk, I will focus on the text-dependent SV task, in which the system must verify both the speaker identity and the phrase contained in the audio recording. In particular, I will describe the Brno University of Technology (BUT) system submitted to the text-dependent task of the challenge, which achieved the best performance among the participants’ submissions. We explored techniques that have been successful in text-independent SV and applied them to the text-dependent scenario: we combined x-vector-based systems with i-vectors trained on concatenated MFCC and bottleneck features, which proved effective for the text-dependent task. We also proposed a phrase-dependent PLDA backend for scoring, combined with a simple phrase recognizer.
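To make the phrase-dependent backend idea concrete, the sketch below shows one common way such a backend can be organized: a separate two-covariance PLDA model per phrase, selected by a phrase label when scoring a trial. This is an illustrative assumption, not BUT’s actual implementation; the phrase ids, toy parameters, and the `score_trial` helper are hypothetical, and in a real system the phrase label would come from the phrase recognizer.

```python
import numpy as np

def mvn_logpdf(x, mean, cov):
    # Log-density of a multivariate Gaussian, via slogdet and a linear solve.
    d = x.size
    _, logdet = np.linalg.slogdet(cov)
    diff = x - mean
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def plda_llr(x_enroll, x_test, mu, B, W):
    # Two-covariance PLDA trial score:
    # log p(pair | same speaker) - log p(pair | different speakers).
    # B: between-speaker covariance, W: within-speaker covariance.
    d = mu.size
    stacked = np.concatenate([x_enroll, x_test])
    m = np.concatenate([mu, mu])
    T = B + W                                # total covariance of one embedding
    Z = np.zeros((d, d))
    cov_same = np.block([[T, B], [B, T]])    # shared speaker factor couples the pair
    cov_diff = np.block([[T, Z], [Z, T]])    # independent speakers: no coupling
    return mvn_logpdf(stacked, m, cov_same) - mvn_logpdf(stacked, m, cov_diff)

# Phrase-dependent backend: one PLDA model (mu, B, W) per enrolled phrase.
# The phrase ids and parameters here are toy placeholders.
backends = {
    "phrase_01": (np.zeros(2), np.eye(2), 0.1 * np.eye(2)),
    "phrase_02": (np.ones(2), np.eye(2), 0.1 * np.eye(2)),
}

def score_trial(phrase_id, x_enroll, x_test):
    # In a full system the phrase_id would be produced by a phrase
    # recognizer; here it is simply given.
    mu, B, W = backends[phrase_id]
    return plda_llr(x_enroll, x_test, mu, B, W)
```

With these toy parameters, an enrollment/test pair lying together near the phrase mean yields a positive log-likelihood ratio, while two distant embeddings yield a strongly negative one, which is the behavior a PLDA verification score should show.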


Dr. Alicia Lozano-Diez received a double degree in Computer Science Engineering and Mathematics in 2012, a Master in Research and Innovation in ICT in 2013, and her Ph.D. in 2018, all from Universidad Autónoma de Madrid (UAM), Spain. During her Ph.D., she interned with the Speech@FIT group at Brno University of Technology (BUT, Brno, Czech Republic) and at SRI International (STAR Lab, California, USA). In 2019, she joined Speech@FIT (BUT) with an individual H2020 Marie Curie fellowship for the ETE SPEAKER project, working on speaker recognition and diarization. She is currently back at the Audias research group at UAM as an assistant professor. Her research interests center on language and speaker recognition and diarization with deep neural networks, and she has participated in several international conferences and technology evaluations in the field.

Center for Language and Speech Processing