BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//128.220.36.13//NONSGML kigkonsult.se iCalcreator 2.26.9//
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Center for Language and Speech Processing
X-WR-CALDESC:Johns Hopkins University
X-FROM-URL:https://www.clsp.jhu.edu
X-WR-TIMEZONE:America/New_York
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:STANDARD
DTSTART:20201101T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RDATE:20211107T020000
TZNAME:EST
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20210314T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RDATE:20220313T020000
TZNAME:EDT
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:ai1ec-20117@www.clsp.jhu.edu
DTSTAMP:20210419T145213Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nNeural sequence generation systems oftentimes generat
 e sequences by searching for the most likely sequence under the learnt pro
 bability distribution. This assumes that the most likely sequence\, i.e. t
 he mode\, under such a model must also be the best sequence it has to offe
 r (often in a given context\, e.g. conditioned on a source sentence in tra
 nslation). Recent findings in neural machine translation (NMT) show that t
 he true most likely sequence oftentimes is empty under many state-of-the-a
 rt NMT models. This follows a large list of other pathologies and biases o
 bserved in NMT and other sequence generation models: a length bias\, large
 r beams degrading performance\, exposure bias\, and many more. Many of the
 se works blame the probabilistic formulation of NMT or maximum likeli
 hood estimation. We provide a different view on this: it is mode-seekin
 g search\, e.g. beam search\, that introduces many of these patholog
 ies and biases\, and such a decision rule is not suitable for the ty
 pe of distributions learnt by NMT systems. We show that NMT models sprea
 d probability mass over many translations\, and that the most likely tr
 anslation oftentimes is a rare event. We further show that translati
 on distributions do capture important aspects of translation well in ex
 pectation. Therefore\, we advocate for decision rules that take into acc
 ount the entire probability distribution and not just its mode. We prov
 ide one example of such a decision rule\, and show that this is a fruit
 ful research direction.\nBiography\nI am an assistant professor (UD) i
 n natural language processing at the Institute for Logic\, Language an
 d Computation where I lead the Probabilistic Language Learning group.
 \nMy work concerns the design of models and algorithms that learn to rep
 resent\, understand\, and generate language data. Examples of specif
 ic problems I am interested in include language modelling\, machine tra
 nslation\, syntactic parsing\, textual entailment\, text classification
 \, and question answering.\nI also develop techniques to approach gener
 al machine learning problems such as probabilistic inference\, gradie
 nt and density estimation.\nMy interests sit at the intersection of disc
 iplines such as statistics\, machine learning\, approximate infere
 nce\, global optimization\, formal languages\, and computational lingu
 istics.
DTSTART;TZID=America/New_York:20210419T120000
DTEND;TZID=America/New_York:20210419T131500
LOCATION:via Zoom
SEQUENCE:0
SUMMARY:Wilker Aziz (University of Amsterdam) “The Inadequacy of the Mo
 de in Neural Machine Translation”
URL:https://www.clsp.jhu.edu/events/wilker-aziz-university-of-amsterdam/
X-COST-TYPE:free
X-TAGS;LANGUAGE=en-US:2021\,April\,Aziz
END:VEVENT
END:VCALENDAR