BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9//
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-FROM-URL:https://www.clsp.jhu.edu
X-WR-TIMEZONE:America/New_York
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:STANDARD
DTSTART:20231105T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RDATE:20241103T020000
TZNAME:EST
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20240310T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RDATE:20250309T020000
TZNAME:EDT
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:ai1ec-21277@www.clsp.jhu.edu
DTSTAMP:20240328T234145Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nAs humans\, our understanding of language is grounded in a rich mental model about “how the world works” – that we learn through perception and interaction. We use this understanding to reason beyond what we literally observe or read\, imagining how situations might unfold in the world. Machines today struggle at this kind of reasoning\, which limits how they can communicate with humans.\nIn my talk\, I will discuss three lines of work to bridge this gap between machines and humans. I will first discuss how we might measure grounded understanding. I will introduce a suite of approaches for constructing benchmarks\, using machines in the loop to filter out spurious biases. Next\, I will introduce PIGLeT: a model that learns physical commonsense understanding by interacting with the world through simulation\, using this knowledge to ground language. From an English-language description of an event\, PIGLeT can anticipate how the world state might change – outperforming text-only models that are orders of magnitude larger. Finally\, I will introduce MERLOT\, which learns about situations in the world by watching millions of YouTube videos with transcribed speech. Through training objectives inspired by the developmental psychology idea of multimodal reentry\, MERLOT learns to fuse language\, vision\, and sound together into powerful representations.\nTogether\, these directions suggest a path forward for building machines that learn language rooted in the world.\nBiography\nRowan Zellers is a final year PhD candidate at the University of Washington in Computer Science & Engineering\, advised by Yejin Choi and Ali Farhadi. His research focuses on enabling machines to understand language\, vision\, sound\, and the world beyond these modalities. He has been recognized through an NSF Graduate Fellowship and a NeurIPS 2021 outstanding paper award. His work has appeared in several media outlets\, including Wired\, the Washington Post\, and the New York Times. In the past\, he graduated from Harvey Mudd College with a B.S. in Computer Science & Mathematics\, and has interned at the Allen Institute for AI.
DTSTART;TZID=America/New_York:20220214T120000
DTEND;TZID=America/New_York:20220214T131500
LOCATION:Ames Hall 234 - Presented Virtually Via Zoom https://wse.zoom.us/j/96735183473 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:Rowan Zellers (University of Washington) “Grounding Language by Seeing\, Hearing\, and Interacting”
URL:https://www.clsp.jhu.edu/events/rowan-zellers-university-of-washington-grounding-language-by-seeing-hearing-and-interacting/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n
\\nAbstract
\nAs humans\, our understanding of language is grounded in a rich mental model about “how the world works” – that we learn through perception and interaction. We use this understanding to reason beyond what we literally observe or read\, imagining how situations might unfold in the world. Machines today struggle at this kind of reasoning\, which limits how they can communicate with humans.
In my talk\, I will discuss three lines of work to bridge this gap between machines and humans. I will first discuss how we might measure grounded understanding. I will introduce a suite of approaches for constructing benchmarks\, using machines in the loop to filter out spurious biases. Next\, I will introduce PIGLeT: a model that learns physical commonsense understanding by interacting with the world through simulation\, using this knowledge to ground language. From an English-language description of an event\, PIGLeT can anticipate how the world state might change – outperforming text-only models that are orders of magnitude larger. Finally\, I will introduce MERLOT\, which learns about situations in the world by watching millions of YouTube videos with transcribed speech. Through training objectives inspired by the developmental psychology idea of multimodal reentry\, MERLOT learns to fuse language\, vision\, and sound together into powerful representations.
Together\, these directions suggest a path forward for building machines that learn language rooted in the world.
Biography
\nRowan Zellers is a final year PhD candidate at the University of Washington in Computer Science & Engineering\, advised by Yejin Choi and Ali Farhadi. His research focuses on enabling machines to understand language\, vision\, sound\, and the world beyond these modalities. He has been recognized through an NSF Graduate Fellowship and a NeurIPS 2021 outstanding paper award. His work has appeared in several media outlets\, including Wired\, the Washington Post\, and the New York Times. In the past\, he graduated from Harvey Mudd College with a B.S. in Computer Science & Mathematics\, and has interned at the Allen Institute for AI.
\n</HTML>
X-TAGS;LANGUAGE=en-US:2022\,February\,Zellers
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-21494@www.clsp.jhu.edu
DTSTAMP:20240328T234145Z
CATEGORIES;LANGUAGE=en-US:Student Seminars
CONTACT:
DESCRIPTION:Abstract\nAdversarial attacks deceive neural network systems by adding carefully crafted perturbations to benign signals. Being almost imperceptible to humans\, these attacks pose a severe security threat to state-of-the-art speech and speaker recognition systems\, making it vital to propose countermeasures against them. In this talk\, we focus on 1) classifying a given adversarial attack by attack algorithm type\, threat model type\, and signal-to-adversarial-noise ratio\, and 2) developing a novel speech denoising solution to further improve the classification performance.\nOur proposed approach uses an x-vector network as a signature extractor to get embeddings\, which we call signatures. These signatures contain information about the attack and can help classify different attack algorithms\, threat models\, and signal-to-adversarial-noise ratios. We demonstrate the transferability of such signatures to other tasks. In particular\, a signature extractor trained to classify attacks against speaker identification can also be used to classify attacks against speaker verification and speech recognition. We also show that signatures can be used to detect unknown attacks\, i.e.\, attacks not included during training. Lastly\, we propose to improve the signature extractor by making its job easier: we remove the clean signal from the adversarial example (which consists of clean signal + perturbation). We train our signature extractor using adversarial perturbation. At inference time\, we use a time-domain denoiser to obtain adversarial perturbation from adversarial examples. Using our improved approach\, we show that common attacks in the literature (Fast Gradient Sign Method (FGSM)\, Projected Gradient Descent (PGD)\, Carlini-Wagner (CW)) can be classified with accuracy as high as 96%. We also detect unknown attacks with an equal error rate (EER) of about 9%\, which is very promising.
DTSTART;TZID=America/New_York:20220304T120000
DTEND;TZID=America/New_York:20220304T131500
LOCATION:Ames Hall 234 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:Student Seminar – Sonal Joshi “Classify and Detect Adversarial Attacks Against Speaker and Speech Recognition Systems”
URL:https://www.clsp.jhu.edu/events/student-seminar-sonal-joshi/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nAbstract
\nAdversarial attacks deceive neural network systems by adding carefully crafted perturbations to benign signals. Being almost imperceptible to humans\, these attacks pose a severe security threat to state-of-the-art speech and speaker recognition systems\, making it vital to propose countermeasures against them. In this talk\, we focus on 1) classifying a given adversarial attack by attack algorithm type\, threat model type\, and signal-to-adversarial-noise ratio\, and 2) developing a novel speech denoising solution to further improve the classification performance.
\nOur proposed approach uses an x-vector network as a signature extractor to get embeddings\, which we call signatures. These signatures contain information about the attack and can help classify different attack algorithms\, threat models\, and signal-to-adversarial-noise ratios. We demonstrate the transferability of such signatures to other tasks. In particular\, a signature extractor trained to classify attacks against speaker identification can also be used to classify attacks against speaker verification and speech recognition. We also show that signatures can be used to detect unknown attacks\, i.e.\, attacks not included during training. Lastly\, we propose to improve the signature extractor by making its job easier: we remove the clean signal from the adversarial example (which consists of clean signal + perturbation). We train our signature extractor using adversarial perturbation. At inference time\, we use a time-domain denoiser to obtain adversarial perturbation from adversarial examples. Using our improved approach\, we show that common attacks in the literature (Fast Gradient Sign Method (FGSM)\, Projected Gradient Descent (PGD)\, Carlini-Wagner (CW)) can be classified with accuracy as high as 96%. We also detect unknown attacks with an equal error rate (EER) of about 9%\, which is very promising.
\n
X-TAGS;LANGUAGE=en-US:2022\,Joshi\,March
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-24511@www.clsp.jhu.edu
DTSTAMP:20240328T234145Z
CATEGORIES;LANGUAGE=en-US:Student Seminars
CONTACT:
DESCRIPTION:
DTSTART;TZID=America/New_York:20240412T120000
DTEND;TZID=America/New_York:20240412T131500
LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:Sonal Joshi (JHU)
URL:https://www.clsp.jhu.edu/events/sonal-joshi-jhu/
X-COST-TYPE:free
X-TAGS;LANGUAGE=en-US:2024\,April\,Joshi
END:VEVENT
END:VCALENDAR