Piotr Zelasko (CLSP at JHU) “Lhotse: a speech data representation library for the modern deep learning ecosystem”

October 29, 2021 @ 12:00 pm – 1:15 pm
Hackerman Hall B17
3400 N. Charles Street
Baltimore MD 21218


Speech data is notoriously difficult to work with due to the variety of codecs, recording lengths, and metadata formats. We present Lhotse, a speech data representation library that draws upon lessons learned from the Kaldi speech recognition toolkit and brings its concepts into the modern deep learning ecosystem. Lhotse provides a common JSON description format with corresponding Python classes and data preparation recipes for over 30 popular speech corpora. Various datasets can be easily combined and repurposed for different tasks. The library handles multi-channel recordings, long recordings, local and cloud storage, and lazy and on-the-fly operations, among other features. We introduce the Cut and CutSet concepts, which simplify common data wrangling tasks for audio and help incorporate the acoustic context of speech utterances. Finally, we show how Lhotse leverages the PyTorch data API abstractions and adapts them to handle speech data for deep learning.
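
To make the Cut idea from the abstract more concrete, here is a minimal toy sketch in plain Python. It is not Lhotse's actual API: the class names ToyCut and ToySupervision, their fields, and the truncate method are hypothetical and written only for illustration. It assumes a cut is a time span of a recording that carries its transcribed segments along, with segment times stored relative to the start of the cut, and that such an object can be serialized as one JSON line.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ToySupervision:
    """A transcribed segment (hypothetical class, not part of Lhotse)."""
    start: float      # seconds, relative to the start of the enclosing cut
    duration: float
    text: str


@dataclass
class ToyCut:
    """A span of a recording plus its supervisions (hypothetical class)."""
    recording_id: str
    start: float      # seconds, offset into the source recording
    duration: float
    supervisions: List[ToySupervision] = field(default_factory=list)

    def truncate(self, offset: float, duration: float) -> "ToyCut":
        """Return a sub-cut, keeping only the supervisions that overlap it."""
        end = offset + duration
        kept = [
            # Shift segment times so they stay relative to the new cut start.
            ToySupervision(s.start - offset, s.duration, s.text)
            for s in self.supervisions
            if s.start < end and s.start + s.duration > offset
        ]
        return ToyCut(self.recording_id, self.start + offset, duration, kept)


cut = ToyCut(
    recording_id="rec-001",
    start=0.0,
    duration=30.0,
    supervisions=[
        ToySupervision(2.0, 3.0, "hello there"),
        ToySupervision(12.0, 4.0, "how are you"),
    ],
)

# Take a 10-second window starting 10 seconds into the cut.
window = cut.truncate(offset=10.0, duration=10.0)

# Serialize the window as a single JSON line, as a manifest entry might be.
manifest_line = json.dumps(asdict(window))
```

In this sketch, cutting a window out of a longer recording does not touch any audio; only the time bounds and the attached segments change, which is one way operations on long recordings can remain lazy until the audio is actually needed.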


Piotr Zelasko is an assistant research scientist in the Center for Language and Speech Processing (CLSP) who specializes in automatic speech recognition (ASR) and spoken language understanding (SLU). His current research focuses on applying multilingual and crosslingual speech recognition systems to categorize the phonetic inventory of a previously unknown language and on improving defenses against adversarial attacks on both speaker identification and automatic speech recognition systems. He is also addressing the question of how to structure a spontaneous conversation into high-level semantic units such as dialog acts or topics. Finally, he is working on Lhotse + K2, the next-generation speech processing research software ecosystem. Before joining Johns Hopkins, Zelasko worked as a machine learning consultant for Avaya (2017-2019), and as a machine learning engineer for Techmo (2015-2017). Zelasko received his PhD (2019) in electronics engineering, as well as his master’s (2014) and undergraduate degrees (2013) in acoustic engineering from AGH University of Science and Technology in Kraków, Poland.

Johns Hopkins University, Whiting School of Engineering

Center for Language and Speech Processing
Hackerman 226
3400 North Charles Street, Baltimore, MD 21218-2680