BEGIN:VCALENDAR VERSION:2.0 PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9// CALSCALE:GREGORIAN METHOD:PUBLISH X-FROM-URL:https://www.clsp.jhu.edu X-WR-TIMEZONE:America/New_York BEGIN:VTIMEZONE TZID:America/New_York X-LIC-LOCATION:America/New_York BEGIN:STANDARD DTSTART:20231105T020000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 RDATE:20241103T020000 TZNAME:EST END:STANDARD BEGIN:DAYLIGHT DTSTART:20240310T020000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 RDATE:20250309T020000 TZNAME:EDT END:DAYLIGHT END:VTIMEZONE BEGIN:VEVENT UID:ai1ec-21023@www.clsp.jhu.edu DTSTAMP:20240329T124517Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:
Abstract
\nSpeech data is notoriously difficult to work with due to a variety of codecs\, length s of recordings\, and meta-data formats. We present Lhotse\, a speech data representation library that draws upon lessons learned from Kaldi speech recognition toolkit and brings its concepts into the modern deep learning ecosystem. Lhotse provides a common JSON description format with correspon ding Python classes and data preparation recipes for over 30 popular speec h corpora. Various datasets can be easily combined together and re-purpose d for different tasks. The library handles multi-channel recordings\, long recordings\, local and cloud storage\, lazy and on-the-fly operations amo ngst other features. We introduce Cut and CutSet concepts\, which simplify common data wrangling tasks for audio and help incorporate acoustic conte xt of speech utterances. Finally\, we show how Lhotse leverages PyTorch da ta API abstractions and adopts them to handle speech data for deep learnin g.
\nBiography
\nPiotr Zelasko is an a ssistant research scientist in the Center for Language and Speech Processi ng (CLSP) who specializes in automatic speech recognition (ASR) and spoken language understanding (SLU). His current research focuses on applying mu ltilingual and crosslingual speech recognition systems to categorize the p honetic inventory of a previously unknown language and on improving defens es against adversarial attacks on both speaker identification and automati c speech recognition systems. He is also addressing the question of how to structure a spontaneous conversation into high-level semantic units such as dialog acts or topics. Finally\, he is working on Lhotse + K2\, the nex t-generation speech processing research software ecosystem. Before joining Johns Hopkins\, Zelasko worked as a machine learning consultant for Avaya (2017-2019)\, and as a machine learning engineer for Techmo (2015-2017). Zelasko received his PhD (2019) in electronics engineering\, as well as hi s master’s (2014) and undergraduate degrees (2013) in acoustic engineering from AGH University of Science and Technology in Kraków\, Poland.
DTSTART;TZID=America/New_York:20211029T120000 DTEND;TZID=America/New_York:20211029T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore MD 21218 SEQUENCE:0 SUMMARY:Piotr Zelasko (CLSP at JHU) “Lhotse: a speech data representation l ibrary for the modern deep learning ecosystem” URL:https://www.clsp.jhu.edu/events/piotr-zelasko-clsp-at-jhu-lhotse-a-spee ch-data-representation-library-for-the-modern-deep-learning-ecosystem/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2021\,October\,Zelasko END:VEVENT END:VCALENDAR