BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9//
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-FROM-URL:https://www.clsp.jhu.edu
X-WR-TIMEZONE:America/New_York
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:STANDARD
DTSTART:20231105T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RDATE:20241103T020000
TZNAME:EST
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20240310T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RDATE:20250309T020000
TZNAME:EDT
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:ai1ec-21487@www.clsp.jhu.edu
DTSTAMP:20240328T195058Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nEnormous amounts of ever-changing knowledge are avai
lable online in diverse textual styles and diverse formats. Recent advance
s in deep learning algorithms and large-scale datasets are spurring progre
ss in many Natural Language Processing (NLP) tasks\, including question an
swering. Nevertheless\, these models cannot scale up when task-annotated t
raining data are scarce. This talk presents my lab’s work toward building
general-purpose models in NLP and how to systematically evaluate them. Fir
st\, I present a general model for two known tasks of question answering\, i
n English and in multiple languages\, that is robust to small domain shift
s. Then\, I show a meta-training approach that can solve a variety of NLP t
asks using only a few examples and introduce a benchmark to evaluate cr
oss-task generalization. Finally\, I discuss neuro-symbolic approaches to
address more complex tasks by eliciting knowledge from structured data and
language models.\n\nBiography\n\nHanna Hajishirzi is an Assistant Profess
or in the Paul G. Allen School of Computer Science & Engineering at the Un
iversity of Washington and a Senior Research Manager at the Allen Institut
e for AI. Her research spans different areas in NLP and AI\, focusing on d
eveloping general-purpose machine learning algorithms that can solve many
NLP tasks. Applications for these algorithms include question answering\,
representation learning\, green AI\, knowledge extraction\, and conversati
onal dialogue. Honors include the NSF CAREER Award\, Sloan Fellowship\, Al
len Distinguished Investigator Award\, Intel rising star award\, best pape
r and honorable mention awards\, and several industry research faculty awa
rds. Hanna received her PhD from the University of Illinois and spent a year a
s a postdoc at Disney Research and CMU.
DTSTART;TZID=America/New_York:20220225T120000
DTEND;TZID=America/New_York:20220225T131500
LOCATION:Ames Hall 234 - Presented Virtually Via Zoom https://wse.zoom.us/j
/96735183473
SEQUENCE:0
SUMMARY:Hanna Hajishirzi (University of Washington & Allen Institute for AI
) “Toward Robust\, Knowledge-Rich NLP”
URL:https://www.clsp.jhu.edu/events/hanna-hajishirzi-university-of-washingt
on-allen-institute-for-ai-toward-robust-knowledge-rich-nlp/
X-COST-TYPE:free
X-TAGS;LANGUAGE=en-US:2022\,February\,Hajishirzi
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-22423@www.clsp.jhu.edu
DTSTAMP:20240328T195058Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:
DTSTART;TZID=America/New_York:20221007T120000
DTEND;TZID=America/New_York:20221007T131500
LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:Ariya Rastrow (Amazon)
URL:https://www.clsp.jhu.edu/events/ariya-rastrow-amazon-2/
X-COST-TYPE:free
X-TAGS;LANGUAGE=en-US:2022\,October\,Rastrow
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-23304@www.clsp.jhu.edu
DTSTAMP:20240328T195058Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nTransformers are essential to pretraining. As we appr
oach 5 years of BERT\, the connection between attention as architecture an
d transfer learning remains key to this central thread in NLP. Other archi
tectures such as CNNs and RNNs have been used to replicate pretraining res
ults\, but these either fail to reach the same accuracy or require supplem
ental attention layers. This work revisits the seminal BERT result and con
siders pretraining without attention. We consider replacing self-attention l
ayers with recently developed approaches for long-range sequence modeling an
d transformer architecture variants. Specifically\, inspired by recent paper
s like the structured state space sequence model (S4)\, we use simple routin
g layers based on state-space models (SSM) and a bidirectional model archite
cture based on multiplicative gating. We discuss the results of the propose
d Bidirectional Gated SSM (BiGS) and present a range of analyses of its pro
perties. Results show that architecture does seem to have a notable impact o
n downstream performance and a different inductive bias that is worth explor
ing further.\nBiography\nAlexander “Sasha” Rush is an Asso
ciate Professor at Cornell Tech. His work is at the intersection of natura
l language processing and generative modeling with applications in text ge
neration\, efficient inference\, and controllability. He has written sever
al popular open-source software projects supporting NLP research and data
science\, and works part-time as a researcher at Hugging Face. He is the s
ecretary of ICLR and developed software used to run virtual conferences du
ring COVID. His work has received paper and demo awards at major NLP\, vis
ualization\, and hardware conferences\, an NSF CAREER Award\, and a Sloan F
ellowship. He tweets and blogs\, mostly about coding and ML\, at @srush_n
lp.
DTSTART;TZID=America/New_York:20230203T120000
DTEND;TZID=America/New_York:20230203T131500
LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:Sasha Rush (Cornell University) “Pretraining Without Attention”
URL:https://www.clsp.jhu.edu/events/sasha-rush-cornell-university/
X-COST-TYPE:free
X-TAGS;LANGUAGE=en-US:2023\,February\,Rush
END:VEVENT
END:VCALENDAR