Texts Come from People – how Demographic Factors Influence NLP – Dirk Hovy (University of Copenhagen)
The way we express ourselves is heavily influenced by our demographic background and our communicative goals. In NLP, however, we have mostly assumed a) that the goal of language is information, and b) that all demographic groups use language the same way. As NLP is applied to more and more domains and text types, these assumptions are challenged.
Sociolinguistics has long investigated the interplay of demographic factors and language use, and it seems likely that the same factors are also present in the data we use to train NLP systems. The resulting bias can harm performance, but can also systematically disadvantage whole demographic groups. As a result, some of the problems we have addressed in domain adaptation might actually require demographic adaptation.
In this talk, I will show how we can combine statistical NLP methods and sociolinguistic theories to the benefit of both fields. I present ongoing research into large-scale statistical analysis of demographic language variation, how this variation affects the performance (and fairness) of NLP systems, and how we can incorporate demographic information to address both problems.
Dirk Hovy is a postdoc at the University of Copenhagen. His interests include lexical semantics, non-standard language, and the interaction of extra-linguistic factors and language use.
Dirk holds an MA in sociolinguistics from the University of Marburg, Germany, and received his PhD in NLP from the University of Southern California, where he worked on unsupervised relation extraction. He has authored multiple papers on WSD, supersenses, NLP for social media, and annotation. He recently shared best paper awards at EACL 2014 and *SEM 2014 for the work with his colleagues in Copenhagen. Outside of research, Dirk enjoys cooking, tango, and leather-crafting, as well as picking up heavy things and putting them back down.