BEGIN:VCALENDAR VERSION:2.0 PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9// CALSCALE:GREGORIAN METHOD:PUBLISH X-FROM-URL:https://www.clsp.jhu.edu X-WR-TIMEZONE:America/New_York BEGIN:VTIMEZONE TZID:America/New_York X-LIC-LOCATION:America/New_York BEGIN:STANDARD DTSTART:20231105T020000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 RDATE:20241103T020000 TZNAME:EST END:STANDARD BEGIN:DAYLIGHT DTSTART:20240310T020000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 RDATE:20250309T020000 TZNAME:EDT END:DAYLIGHT END:VTIMEZONE BEGIN:VEVENT UID:ai1ec-21023@www.clsp.jhu.edu DTSTAMP:20240328T155206Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:
Abstract
\nSpeech data is notoriously difficult to work with due to a variety of codecs\, lengths of recordings\, and meta-data formats. We present Lhotse\, a speech data representation library that draws upon lessons learned from Kaldi speech recognition toolkit and brings its concepts into the modern deep learning ecosystem. Lhotse provides a common JSON description format with corresponding Python classes and data preparation recipes for over 30 popular speech corpora. Various datasets can be easily combined together and re-purposed for different tasks. The library handles multi-channel recordings\, long recordings\, local and cloud storage\, lazy and on-the-fly operations amongst other features. We introduce Cut and CutSet concepts\, which simplify common data wrangling tasks for audio and help incorporate acoustic context of speech utterances. Finally\, we show how Lhotse leverages PyTorch data API abstractions and adopts them to handle speech data for deep learning.
\nBiography
\nPiotr Zelasko is an assistant research scientist in the Center for Language and Speech Processing (CLSP) who specializes in automatic speech recognition (ASR) and spoken language understanding (SLU). His current research focuses on applying multilingual and crosslingual speech recognition systems to categorize the phonetic inventory of a previously unknown language and on improving defenses against adversarial attacks on both speaker identification and automatic speech recognition systems. He is also addressing the question of how to structure a spontaneous conversation into high-level semantic units such as dialog acts or topics. Finally\, he is working on Lhotse + K2\, the next-generation speech processing research software ecosystem. Before joining Johns Hopkins\, Zelasko worked as a machine learning consultant for Avaya (2017-2019)\, and as a machine learning engineer for Techmo (2015-2017). Zelasko received his PhD (2019) in electronics engineering\, as well as his master’s (2014) and undergraduate degrees (2013) in acoustic engineering from AGH University of Science and Technology in Kraków\, Poland.
DTSTART;TZID=America/New_York:20211029T120000 DTEND;TZID=America/New_York:20211029T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Piotr Zelasko (CLSP at JHU) “Lhotse: a speech data representation library for the modern deep learning ecosystem” URL:https://www.clsp.jhu.edu/events/piotr-zelasko-clsp-at-jhu-lhotse-a-speech-data-representation-library-for-the-modern-deep-learning-ecosystem/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2021\,October\,Zelasko END:VEVENT BEGIN:VEVENT UID:ai1ec-22408@www.clsp.jhu.edu DTSTAMP:20240328T155206Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nRecent advances in large pretrained language models have unlocked new exciting applications for Natural Language Generation for creative tasks\, such as lyrics or humour generation. In this talk we will discuss recent works by our team at Alexa AI and discuss current challenges: (1) Pun understanding and generation: We release new datasets for pun understanding and the novel task of context-situated pun generation\, and demonstrate the value of our annotations for pun classification and generation tasks. (2) Song lyric generation: we design a hierarchical lyric generation framework that enables us to generate pleasantly-singable lyrics without training on melody-lyric aligned data\, and show that our approach is competitive with strong baselines supervised on parallel data. (3) Create with Alexa: a multimodal story creation experience recently launched on Alexa devices\, which leverages story text generation models in tandem with story visualization and background music generation models to produce multimodal stories for kids.
\nBiography
\nAlessandra Cervone is an Applied Scientist in the Natural Understanding team at Amazon Alexa AI. Alessandra holds an MSc in Speech and Language Processing from University of Edinburgh and a PhD in CS from University of Trento (Italy). During her PhD\, Alessandra worked on computational models of coherence in open-domain dialogue advised by Giuseppe Riccardi. In the first year of the PhD\, she was the team leader of one of the teams selected to compete in the first edition of the Alexa Prize. More recently\, her research interests have been focused on natural language generation and its evaluation\, in particular in the context of creative AI applications.
\nDTSTART;TZID=America/New_York:20230317T120000 DTEND;TZID=America/New_York:20230317T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Alessandra Cervone (Amazon) “Controllable Text Generation for Creative Applications” URL:https://www.clsp.jhu.edu/events/alexxandra-cervone-amazon/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2023\,Cervone\,March END:VEVENT BEGIN:VEVENT UID:ai1ec-24157@www.clsp.jhu.edu DTSTAMP:20240328T155206Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:
Abstract
\nIn this talk\, I will present a simple extension of image-based Masked Autoencoders (MAE) to self-supervised representation learning from audio spectrograms. Following the Transformer encoder-decoder design in MAE\, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio\, feeding only the non-masked tokens through encoder layers. The decoder then re-orders and decodes the encoded context padded with mask tokens\, in order to reconstruct the input spectrogram. We find it beneficial to incorporate local window attention in the decoder\, as audio spectrograms are highly correlated in local time and frequency bands. We then fine-tune the encoder with a lower masking ratio on target datasets. Empirically\, Audio-MAE sets new state-of-the-art performance on six audio and speech classification tasks\, outperforming other recent models that use external supervised pre-training.
\nBio
\nFlorian Metze is a Research Scientist Manager at Meta AI in New York\, supporting a team of researchers and engineers working on multi-modal (image\, video\, audio\, text) content understanding for Meta’s Family of Apps (Instagram\, Threads\, Facebook\, WhatsApp). He used to be an Associate Research Professor at Carnegie Mellon University\, in the School of Computer Science’s Language Technologies Institute\, where he still is an Adjunct Professor. He is also a co-founder of Abridge\, a company working on extracting information from doctor patient conversations. His work covers many areas of speech recognition and multi-media analysis with a focus on end-to-end deep learning. Currently\, he focuses on multi-modal processing of videos\, and using that information to recommend unconnected content. In the past\, he has worked on low resource and multi-lingual speech processing\, speech recognition with articulatory features\, large-scale multi-media retrieval and summarization\, information extraction from medical interviews\, and recognition of personality or similar meta-data from speech.
\nFor more information\, please see http://www.cs.cmu.edu/directory/fmetze
\nDTSTART;TZID=America/New_York:20231110T120000 DTEND;TZID=America/New_York:20231110T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Florian Metze (CMU) “Masked Autoencoders that Listen” URL:https://www.clsp.jhu.edu/events/florian-metze-cmu/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2023\,Metze\,November END:VEVENT END:VCALENDAR