To enable naturalistic, context-aware language generation, the underlying models must be both flexible and controllable. They must be flexible enough to account for the rich linguistic diversity of the data that the model generates and conditions on. At the same time, generation must be controllable, so that the same meaning can be lexicalized differently depending on the social and situational context. I’ll present model-based approaches to multilingual language modeling and open-vocabulary machine translation that aim to make language generation more flexible by relaxing the assumption, unreasonable but prevalent in the literature, that a model’s vocabulary is constrained to a particular set of the most frequent words in a particular language. Then, I’ll present an approach to controllable text generation that modulates social variables in the generated text.
Yulia Tsvetkov is an assistant professor in the Language Technologies Institute at Carnegie Mellon University. Her research interests lie at or near the intersection of natural language processing, machine learning, linguistics, and social science. Her current research projects focus on multilinguality (e.g., open-vocabulary machine translation, polyglot models, entrainment in code-switching), controllable text generation, automated negotiation, and NLP for social good (e.g., identification of microaggressions and dehumanization in online interactions, identification of misinformation and agenda-setting in news, predicting scientific misconduct). Prior to joining LTI, Yulia was a postdoc in the Department of Computer Science at Stanford University; she received her Ph.D. from Carnegie Mellon University.