Florian Metze (CMU): “Masked Autoencoders that Listen”
Abstract: In this talk, I will present a simple extension of image-based Masked Autoencoders (MAE) to self-supervised representation learning from audio spectrograms. Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram[…]