Tuomas Virtanen (Tampere University) “Computational Analysis of Sound Scenes”

November 9, 2020 @ 12:00 pm – 1:15 pm
via Zoom


Computational analysis of sound scenes has emerged in recent years as a prominent research area with several applications, but also several different engineering tasks to be solved. This presentation gives an overview of recent developments in the field. We will first review the different tasks related to scene analysis from the machine learning point of view, starting from basic scene classification and event detection and then discussing how these can be extended to more advanced tasks such as joint localization and detection, as well as audio captioning. We will then present how each of these tasks can be solved with state-of-the-art machine learning techniques based on deep neural networks, and what kind of data is needed to train the methods for each task. We will present examples of machine learning architectures, as well as data that can be used to train the systems. We will present findings from the latest DCASE evaluation campaigns related to the above tasks. We will also briefly discuss open research questions in the field.


Tuomas Virtanen is a Professor at Tampere University, Finland, where he leads the Audio Research Group. He received the M.Sc. and Doctor of Science degrees in information technology from Tampere University of Technology in 2001 and 2006, respectively. He has also worked as a research associate at Cambridge University Engineering Department, UK. He is known for his pioneering work on single-channel sound source separation and computational analysis of environmental sounds. In addition to the above topics, his research interests include content analysis of audio signals and machine learning. He has authored more than 200 scientific publications on the above topics, which have been cited more than 11,000 times. He received the IEEE Signal Processing Society 2012 best paper award for his article “Monaural Sound Source Separation by Nonnegative Matrix Factorization with Temporal Continuity and Sparseness Criteria” as well as four other best paper awards. He is an IEEE Senior Member and a member of the Audio and Acoustic Signal Processing Technical Committee of the IEEE Signal Processing Society.

Center for Language and Speech Processing