BEGIN:VCALENDAR VERSION:2.0 PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9// CALSCALE:GREGORIAN METHOD:PUBLISH X-FROM-URL:https://www.clsp.jhu.edu X-WR-TIMEZONE:America/New_York BEGIN:VTIMEZONE TZID:America/New_York X-LIC-LOCATION:America/New_York BEGIN:STANDARD DTSTART:20231105T020000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 RDATE:20241103T020000 TZNAME:EST END:STANDARD BEGIN:DAYLIGHT DTSTART:20240310T020000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 RDATE:20250309T020000 TZNAME:EDT END:DAYLIGHT END:VTIMEZONE BEGIN:VEVENT UID:ai1ec-24157@www.clsp.jhu.edu DTSTAMP:20240328T185351Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:
Abstract
\nIn this talk\, I will present a simple extension of image-based Masked Autoencoders (MAE) to self-supervised representation learning from audio spectrograms. Following the Transformer encoder-decoder design in MAE\, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio\, feeding only the non-masked tokens through encoder layers. The decoder then re-orders and decodes the encoded context padded with mask tokens\, in order to reconstruct the input spectrogram. We find it beneficial to incorporate local window attention in the decoder\, as audio spectrograms are highly correlated in local time and frequency bands. We then fine-tune the encoder with a lower masking ratio on target datasets. Empirically\, Audio-MAE sets new state-of-the-art performance on six audio and speech classification tasks\, outperforming other recent models that use external supervised pre-training.
\nBio\nFlorian Metze is a Research Scientist Manager at Meta AI in New York\, supporting a team of researchers and engineers working on multi-modal (image\, video\, audio\, text) content understanding for Meta’s Family of Apps (Instagram\, Threads\, Facebook\, WhatsApp). He used to be an Associate Research Professor at Carnegie Mellon University\, in the School of Computer Science’s Language Technologies Institute\, where he is still an Adjunct Professor. He is also a co-founder of Abridge\, a company working on extracting information from doctor-patient conversations. His work covers many areas of speech recognition and multi-media analysis with a focus on end-to-end deep learning. Currently\, he focuses on multi-modal processing of videos\, and using that information to recommend unconnected content. In the past\, he has worked on low-resource and multi-lingual speech processing\, speech recognition with articulatory features\, large-scale multi-media retrieval and summarization\, information extraction from medical interviews\, and recognition of personality or similar meta-data from speech.
\nFor more information\, please see http://www.cs.cmu.edu/directory/fmetze
\nDTSTART;TZID=America/New_York:20231110T120000 DTEND;TZID=America/New_York:20231110T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Florian Metze (CMU) “Masked Autoencoders that Listen” URL:https://www.clsp.jhu.edu/events/florian-metze-cmu/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2023\,Metze\,November END:VEVENT BEGIN:VEVENT UID:ai1ec-24481@www.clsp.jhu.edu DTSTAMP:20240328T185351Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:
Abstract
\nNatural language provides an intuitive and powerful interface to access knowledge at scale. Modern language systems draw information from two rich knowledge sources: (1) information stored in their parameters during massive pretraining and (2) documents retrieved at inference time. Yet\, we are far from building systems that can reliably provide information from such knowledge sources. In this talk\, I will discuss paths for more robust systems. In the first part of the talk\, I will present a module for scaling retrieval-based knowledge augmentation. We learn a compressor that maps retrieved documents into textual summaries prior to in-context integration. This not only reduces the computational costs but also filters irrelevant or incorrect information. In the second half of the talk\, I will discuss the challenges of updating knowledge stored in model parameters and propose a method to prevent models from reciting outdated information by identifying facts that are prone to rapid change. I will conclude my talk by proposing an interactive system that can elicit information from users when needed.
\nBiography
\nEunsol Choi is an assistant professor in the Computer Science department at the University of Texas at Austin. Prior to UT\, she spent a year at Google AI as a visiting researcher. Her research spans natural language processing and machine learning. She is particularly interested in interpreting and reasoning about text in a dynamic real-world context. She is a recipient of a Facebook research fellowship\, Google faculty research award\, Sony faculty award\, and an outstanding paper award at EMNLP. She received a Ph.D. in computer science and engineering from the University of Washington and a B.A. in mathematics and computer science from Cornell University.
\nDTSTART;TZID=America/New_York:20240315T120000 DTEND;TZID=America/New_York:20240315T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Eunsol Choi (University of Texas at Austin) “Knowledge-Rich Language Systems in a Dynamic World” URL:https://www.clsp.jhu.edu/events/eunsol-choi-university-of-texas-at-austin-knowledge-rich-language-systems-in-a-dynamic-world/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2024\,Choi\,March END:VEVENT END:VCALENDAR