**Abstract**

This is a two-part seminar. The first part will be dedicated to discussing how we can apply Bayesian approaches to model comparison in NLP (and ML in general)\, and the drawbacks and limitations of non-parametric hypothesis testing and experimental reproducibility. A paper related to this part is my work with Kyle Gorman from this year’s EMNLP: https://arxiv.org/abs/2010.03088

\nThe second part of the seminar\, a shorter one\, will be dedicated to the work we started with Piotr Żelasko at Avaya\, concerning identifying and bridging the performance gap between currently available ASR systems and NLP models for downstream language understanding tasks\, a gap that limits the ability to deliver high-quality spoken language understanding\, among others in the area of spontaneous conversations. This is linked to a series of papers we’ve been working on\, the first of which just came out and was accepted at EMNLP Findings: https://arxiv.org/abs/2010.03432

\nI’d like to finish the seminar by starting a discussion about things we could perhaps do together in the future in the area of measuring ASR+NLP performance.

\n**Biography**

Piotr Szymański is an Assistant Professor at the Department of Computational Intelligence at the Wrocław University of Science and Technology and a Machine Learning Engineer at Avaya. He is professionally involved in data analysis\, statistical reasoning\, geospatial data science\, natural language processing\, machine learning and artificial intelligence techniques. He is an alumnus of the Top 500 Innovators program at Stanford University and has worked at several institutions over the years\, including the Hasso Plattner Institute in Potsdam\, the Josef Stefan Institute in Ljubljana\, the University of Notre Dame and the University of Technology Sydney. He is the author of scikit-multilearn\, a popular Python library for multi-label classification. Apart from multi-label classification\, Piotr has published papers concerning urban data\, traffic analysis and bridging the gap between ASR and NLP in spoken language understanding systems. In his free time he is an urban activist in Wrocław and a member of a city district council.\n

\n
X-TAGS;LANGUAGE=en-US:2020\,October\,Szymanski
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-20117@www.clsp.jhu.edu
DTSTAMP:20230207T143527Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:Abstract\nNeural sequence generation systems oftentimes generate sequences by searching for the most likely sequence under the learnt probability distribution. This assumes that the most likely sequence\, i.e. the mode\, under such a model must also be the best sequence it has to offer (often in a given context\, e.g. conditioned on a source sentence in translation). Recent findings in neural machine translation (NMT) show that the true most likely sequence oftentimes is empty under many state-of-the-art NMT models. This follows a large list of other pathologies and biases observed in NMT and other sequence generation models: a length bias\, larger beams degrading performance\, exposure bias\, and many more. Many of these works blame the probabilistic formulation of NMT or maximum likelihood estimation. We provide a different view on this: it is mode-seeking search\, e.g. beam search\, that introduces many of these pathologies and biases\, and such a decision rule is not suitable for the type of distributions learnt by NMT systems. We show that NMT models spread probability mass over many translations\, and that the most likely translation oftentimes is a rare event. We further show that translation distributions do capture important aspects of translation well in expectation. Therefore\, we advocate for decision rules that take into account the entire probability distribution and not just its mode. We provide one example of such a decision rule\, and show that this is a fruitful research direction.\nBiography\nI am an assistant professor (UD) in natural language processing at the Institute for Logic\, Language and Computation where I lead the Probabilistic Language Learning group.\nMy work concerns the design of models and algorithms that learn to represent\, understand\, and generate language data. Examples of specific problems I am interested in include language modelling\, machine translation\, syntactic parsing\, textual entailment\, text classification\, and question answering.\nI also develop techniques to approach general machine learning problems such as probabilistic inference\, gradient and density estimation.\nMy interests sit at the intersection of disciplines such as statistics\, machine learning\, approximate inference\, global optimization\, formal languages\, and computational linguistics.\n \n
DTSTART;TZID=America/New_York:20210419T120000
DTEND;TZID=America/New_York:20210419T131500
LOCATION:via Zoom
SEQUENCE:0
SUMMARY:Wilker Aziz (University of Amsterdam) “The Inadequacy of the Mode in Neural Machine Translation”
URL:https://www.clsp.jhu.edu/events/wilker-aziz-university-of-amsterdam/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n**Abstract**

Neural sequence generation systems oftentimes generate sequences by searching for the most likely sequence under the learnt probability distribution. This assumes that the most likely sequence\, i.e. the mode\, under such a model must also be the best sequence it has to offer (often in a given context\, e.g. conditioned on a source sentence in translation). Recent findings in neural machine translation (NMT) show that the true most likely sequence oftentimes is empty under many state-of-the-art NMT models. This follows a large list of other pathologies and biases observed in NMT and other sequence generation models: a length bias\, larger beams degrading performance\, exposure bias\, and many more. Many of these works blame the probabilistic formulation of NMT or maximum likelihood estimation. We provide a different view on this: it is mode-seeking search\, e.g. beam search\, that introduces many of these pathologies and biases\, and such a decision rule is not suitable for the type of distributions learnt by NMT systems. We show that NMT models spread probability mass over many translations\, and that the most likely translation oftentimes is a rare event. We further show that translation distributions do capture important aspects of translation well in expectation. Therefore\, we advocate for decision rules that take into account the entire probability distribution and not just its mode. We provide one example of such a decision rule\, and show that this is a fruitful research direction.

\n**Biography**

I am an *assistant professor* (UD) in natural language processing at the Institute for Logic\, Language and Computation where I lead the Probabilistic Language Learning group.

My work concerns the design of models and algorithms that learn to represent\, understand\, and generate language data. Examples of specific problems I am interested in include language modelling\, machine translation\, syntactic parsing\, textual entailment\, text classification\, and question answering.

\nI also develop techniques to approach general machine learning problems such as probabilistic inference\, gradient and density estimation.

\nMy interests sit at the intersection of disciplines such as statistics\, machine learning\, approximate inference\, global optimization\, formal languages\, and computational linguistics.

X-TAGS;LANGUAGE=en-US:2021\,April\,Aziz
END:VEVENT
END:VCALENDAR