BEGIN:VCALENDAR VERSION:2.0 PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9// CALSCALE:GREGORIAN METHOD:PUBLISH X-FROM-URL:https://www.clsp.jhu.edu X-WR-TIMEZONE:America/New_York BEGIN:VTIMEZONE TZID:America/New_York X-LIC-LOCATION:America/New_York BEGIN:STANDARD DTSTART:20231105T020000 TZOFFSETFROM:-0400 TZOFFSETTO:-0500 RDATE:20241103T020000 TZNAME:EST END:STANDARD BEGIN:DAYLIGHT DTSTART:20240310T020000 TZOFFSETFROM:-0500 TZOFFSETTO:-0400 RDATE:20250309T020000 TZNAME:EDT END:DAYLIGHT END:VTIMEZONE BEGIN:VEVENT UID:ai1ec-21277@www.clsp.jhu.edu DTSTAMP:20240329T064254Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:
Abstract
\nAs humans\, our understanding of language is grounded in a rich mental model about “how the world works” – that we learn through perception and interaction. We use this understanding to reason beyond what we literally observe or read\, imagining how situations might unfold in the world. Machines today struggle at this kind of reasoning\, which limits how they can communicate with humans.\nIn my talk\, I will discuss three lines of work to bridge this gap between machines and humans. I will first discuss how we might measure grounded understanding. I will introduce a suite of approaches for constructing benchmarks\, using machines in the loop to filter out spurious biases. Next\, I will introduce PIGLeT: a model that learns physical commonsense understanding by interacting with the world through simulation\, using this knowledge to ground language. From an English-language description of an event\, PIGLeT can anticipate how the world state might change – outperforming text-only models that are orders of magnitude larger. Finally\, I will introduce MERLOT\, which learns about situations in the world by watching millions of YouTube videos with transcribed speech. Through training objectives inspired by the developmental psychology idea of multimodal reentry\, MERLOT learns to fuse language\, vision\, and sound together into powerful representations. Together\, these directions suggest a path forward for building machines that learn language rooted in the world.\n
Biography
\nRowan Zellers is a final-year PhD candidate at the University of Washington in Computer Science & Engineering\, advised by Yejin Choi and Ali Farhadi. His research focuses on enabling machines to understand language\, vision\, sound\, and the world beyond these modalities. He has been recognized through an NSF Graduate Fellowship and a NeurIPS 2021 outstanding paper award. His work has appeared in several media outlets\, including Wired\, the Washington Post\, and the New York Times. In the past\, he graduated from Harvey Mudd College with a B.S. in Computer Science & Mathematics\, and has interned at the Allen Institute for AI.
DTSTART;TZID=America/New_York:20220214T120000 DTEND;TZID=America/New_York:20220214T131500 LOCATION:Ames Hall 234 - Presented Virtually Via Zoom https://wse.zoom.us/j/96735183473 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:Rowan Zellers (University of Washington) “Grounding Language by Seeing\, Hearing\, and Interacting” URL:https://www.clsp.jhu.edu/events/rowan-zellers-university-of-washington-grounding-language-by-seeing-hearing-and-interacting/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2022\,February\,Zellers END:VEVENT BEGIN:VEVENT UID:ai1ec-23320@www.clsp.jhu.edu DTSTAMP:20240329T064254Z CATEGORIES;LANGUAGE=en-US:Seminars CONTACT: DESCRIPTION:Abstract
\nSpeech communication represents a core domain for education\, team problem solving\, social engagement\, and business interactions. The ability of speech technology to extract layers of knowledge and assess engagement represents the next generation of advanced speech solutions. Today\, the emergence of big data\, machine learning\, and voice-enabled speech systems has driven the need for effective voice capture and automatic speech/speaker recognition. The ability to employ speech and language technology to assess human-to-human interactions offers new research paradigms with profound impact on assessing human interaction. In this talk\, we will focus on big-data naturalistic audio processing relating to (i) child learning spaces\, and (ii) the NASA APOLLO lunar missions. ML-based technology advancements include automatic audio diarization\, speech recognition\, and speaker recognition. Child–teacher assessment of conversational interactions is explored\, including keywords and “WH-words” (e.g.\, who\, what\, etc.). Diarization processing solutions are applied both to classroom/learning-space child speech and to the massive APOLLO data. CRSS-UTDallas is expanding our original Apollo-11 corpus\, resulting in a massive multi-track audio processing challenge to make 150\,000 hrs of Apollo mission data available to science communities: (i) speech/language technology\, (ii) STEM/science and team-based researchers\, and (iii) education/historical/archiving specialists. Our goal is to provide resources that allow researchers to better understand how people work and learn collaboratively – and\, for Apollo\, how teams accomplished one of mankind’s greatest scientific and technological challenges of the last century.
\nBiography
\nJohn H.L. Hansen received Ph.D. & M.S. degrees from the Georgia Institute of Technology\, and a B.S.E.E. from Rutgers Univ. He joined the Univ. of Texas at Dallas (UTDallas) in 2005\, where he currently serves as Associate Dean for Research\, Prof. of ECE\, and Distinguished Univ. Chair in Telecom. Engineering\, and directs the Center for Robust Speech Systems (CRSS). He is an ISCA Fellow and IEEE Fellow\, and has served as Member and TC-Chair of the IEEE Signal Proc. Society Speech & Language Proc. Tech. Comm. (SLTC)\, and as Technical Advisor to the U.S. Delegate for NATO (IST/TG-01). He served as ISCA President (2017-21) and continues to serve on the ISCA Board (2015-23) as Treasurer. He has supervised 99 PhD/MS thesis candidates (EE\, CE\, BME\, TE\, CS\, Ling.\, Cog.Sci.\, Spch.Sci.\, Hear.Sci.) and was recipient of the 2020 UT-Dallas Provost’s Award for Grad. PhD Research Mentoring. He is author/co-author of 865 journal/conference papers\, including 14 textbooks in the field of speech/language/hearing processing & technology\, among them the coauthored textbook Discrete-Time Processing of Speech Signals (IEEE Press\, 2000)\, and he is lead author of the report “The Impact of Speech Under ‘Stress’ on Military Speech Technology” (NATO RTO-TR-10\, 2000). He served as Organizer\, Chair/Co-Chair/Tech.Chair for ISCA INTERSPEECH-2022\, IEEE ICASSP-2010\, IEEE SLT-2014\, and ISCA INTERSPEECH-2002\, and as Tech. Chair for IEEE ICASSP-2024. He received the 2022 IEEE Signal Processing Society Leo Beranek Meritorious Service Award.
\nDTSTART;TZID=America/New_York:20230303T120000 DTEND;TZID=America/New_York:20230303T131500 LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218 SEQUENCE:0 SUMMARY:John Hansen (University of Texas at Dallas) “Challenges and Advancements in Speaker Diarization & Recognition for Naturalistic Data Streams” URL:https://www.clsp.jhu.edu/events/john-hansen-university-of-texas-at-dallas/ X-COST-TYPE:free X-TAGS;LANGUAGE=en-US:2023\,Hansen\,March END:VEVENT END:VCALENDAR