BEGIN:VCALENDAR VERSION:2.0 PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9// CALSCALE:GREGORIAN METHOD:PUBLISH X-FROM-URL:https://www.clsp.jhu.edu X-WR-TIMEZONE:America/New_York BEGIN:VTIMEZONE TZID:America/New_York X-LIC-LOCATION:America/New_York BEGIN:STANDARD DTSTART:20231105T020000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 RDATE:20241103T020000 TZNAME:EST END:STANDARD BEGIN:DAYLIGHT DTSTART:20240310T020000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 RDATE:20250309T020000 TZNAME:EDT END:DAYLIGHT END:VTIMEZONE BEGIN:VEVENT UID:ai1ec-21023@www.clsp.jhu.edu DTSTAMP:20240329T135535Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract\nSpeech data is notoriously difficult to work with due to a variety of codecs\, lengths of recordings\, and meta-data formats. We present Lhotse\, a speech data representation library that draws upon lessons learned from Kaldi speech recognition toolkit and brings its concepts into the modern deep learning ecosystem. Lhotse provides a common JSON description format with corresponding Python classes and data preparation recipes for over 30 popular speech corpora. Various datasets can be easily combined together and re-purposed for different tasks. The library handles multi-channel recordings\, long recordings\, local and cloud storage\, lazy and on-the-fly operations amongst other features. We introduce Cut and CutSet concepts\, which simplify common data wrangling tasks for audio and help incorporate acoustic context of speech utterances. Finally\, we show how Lhotse leverages PyTorch data API abstractions and adopts them to handle speech data for deep learning.\nBiography\nPiotr Zelasko is an assistant research scientist in the Center for Language and Speech Processing (CLSP) who specializes in automatic speech recognition (ASR) and spoken language understanding (SLU). 
His current research focuses on applying multilingual and crosslingual speech recognition systems to categorize the phonetic inventory of a previously unknown language and on improving defenses against adversarial attacks on both speaker identification and automatic speech recognition systems. He is also addressing the question of how to structure a spontaneous conversation into high-level semantic units such as dialog acts or topics. Finally\, he is working on Lhotse + K2\, the next-generation speech processing research software ecosystem. Before joining Johns Hopkins\, Zelasko worked as a machine learning consultant for Avaya (2017-2019)\, and as a machine learning engineer for Techmo (2015-2017). Zelasko received his PhD (2019) in electronics engineering\, as well as his master’s (2014) and undergraduate degrees (2013) in acoustic engineering from AGH University of Science and Technology in Kraków\, Poland. DTSTART;TZID=America/New_York:20211029T120000 DTEND;TZID=America/New_York:20211029T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore MD 21218 SEQUENCE:0 SUMMARY:Piotr Zelasko (CLSP at JHU) “Lhotse: a speech data representation library for the modern deep learning ecosystem” URL:https://www.clsp.jhu.edu/events/piotr-zelasko-clsp-at-jhu-lhotse-a-speech-data-representation-library-for-the-modern-deep-learning-ecosystem/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n
\\nAbstract
\nSpeech data is notoriously difficult to work with due to a variety of codecs\, lengths of recordings\, and meta-data formats. We present Lhotse\, a speech data representation library that draws upon lessons learned from Kaldi speech recognition toolkit and brings its concepts into the modern deep learning ecosystem. Lhotse provides a common JSON description format with corresponding Python classes and data preparation recipes for over 30 popular speech corpora. Various datasets can be easily combined together and re-purposed for different tasks. The library handles multi-channel recordings\, long recordings\, local and cloud storage\, lazy and on-the-fly operations amongst other features. We introduce Cut and CutSet concepts\, which simplify common data wrangling tasks for audio and help incorporate acoustic context of speech utterances. Finally\, we show how Lhotse leverages PyTorch data API abstractions and adopts them to handle speech data for deep learning.
\nBiography
\nPiotr Zelasko is an assistant research scientist in the Center for Language and Speech Processing (CLSP) who specializes in automatic speech recognition (ASR) and spoken language understanding (SLU). His current research focuses on applying multilingual and crosslingual speech recognition systems to categorize the phonetic inventory of a previously unknown language and on improving defenses against adversarial attacks on both speaker identification and automatic speech recognition systems. He is also addressing the question of how to structure a spontaneous conversation into high-level semantic units such as dialog acts or topics. Finally\, he is working on Lhotse + K2\, the next-generation speech processing research software ecosystem. Before joining Johns Hopkins\, Zelasko worked as a machine learning consultant for Avaya (2017-2019)\, and as a machine learning engineer for Techmo (2015-2017). Zelasko received his PhD (2019) in electronics engineering\, as well as his master’s (2014) and undergraduate degrees (2013) in acoustic engineering from AGH University of Science and Technology in Kraków\, Poland.
\n X-TAGS;LANGUAGE=en-US:2021\,October\,Zelasko END:VEVENT BEGIN:VEVENT UID:ai1ec-21615@www.clsp.jhu.edu DTSTAMP:20240329T135535Z CATEGORIES;LANGUAGE=en-US:Student Seminars CONTACT: DESCRIPTION:Abstract\n\n\nWe consider a problem of data collection for semantically rich NLU tasks\, where detailed semantics of documents (or utterances) are captured using a complex meaning representation. Previously\, data collection for such tasks was either handled at the cost of extensive annotator training (e.g. in FrameNet or PropBank) or simplified meaning representation (e.g. in QA-SRL or Overnight). In this talk\, we present two systems [1\, 2] that aim to support fast\, accurate\, and expressive semantic annotations by pairing human workers with a trained model in the loop.\n\nThe first system\, called Guided K-best [1]\, is an annotation toolkit for conversational semantic parsing. Instead of typing annotations from scratch\, data specialists choose a correct parse from the K-best output of a few-shot prototyped model. As the K-best list can be large (e.g. K=100)\, we guide the annotators’ exploration of the K-best list via explainable hierarchical clustering. In addition\, we experiment with RoBERTa-based reranking of the K-best list to recalibrate the few-shot model towards Accuracy@K. The final system allows to annotate data up to 35% faster than the standard\, non-guided K-best and improves the few-shot model’s top-1 accuracy by up to 18%. The second system\, called SchemaBlocks [2]\, is an annotation toolkit for schemas\, or structured descriptions of frequent real-world scenarios (e.g.\, cooking a meal). It represents schemas in the annotation UI as nested blocks. Using a novel Causal ARM model\, we further speed up the annotation process and guide data specialists towards expressive and diverse schemas. 
As part of this work\, we collect 232 schemas\, evaluating their internal coherence and their coverage on large-scale newswire corpora.\n\n\n DTSTART;TZID=America/New_York:20220311T120000 DTEND;TZID=America/New_York:20220311T131500 LOCATION:Virtual Seminar SEQUENCE:0 SUMMARY:Student Seminar – Anton Belyy “Systems for Human-AI Cooperation on Collecting Semantic Annotations” URL:https://www.clsp.jhu.edu/events/student-seminar-anton-belyy-systems-for-human-ai-cooperation-on-collecting-semantic-annotations/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\n\n X-TAGS;LANGUAGE=en-US:2022\,Belyy\,March END:VEVENT END:VCALENDAR