Wei Xu (Georgia Tech) “GPT-3 vs Humans: Rethinking Evaluation of Natural Language Generation” @ Hackerman Hall B17
Feb 24 @ 12:00 pm – 1:15 pm


While GPT models have shown impressive performance on summarization and open-ended text generation, it’s important to assess their abilities on more constrained text generation tasks that require significant and diverse rewritings. In this talk, I will discuss the challenges of evaluating systems that are highly competitive and perform close to humans on two such tasks: (i) paraphrase generation and (ii) text simplification. To address these challenges, we introduce an interactive Rank-and-Rate evaluation framework. Our results show that GPT-3.5 has made a major step up from fine-tuned T5 in paraphrase generation, but still lacks the diversity and creativity of humans, who spontaneously produce large quantities of paraphrases.

Additionally, we demonstrate that GPT-3.5 performs similarly to a single human in text simplification, which makes it difficult for existing automatic evaluation metrics to distinguish between the two. To overcome this shortcoming, we propose LENS, a learnable evaluation metric that outperforms SARI, BERTScore, and other existing methods in both automatic evaluation and minimal risk decoding for text generation.


Wei Xu is an assistant professor in the School of Interactive Computing at the Georgia Institute of Technology, where she is also affiliated with the new NSF AI CARING Institute and the Machine Learning Center. She received her Ph.D. in Computer Science from New York University and her B.S. and M.S. from Tsinghua University. Xu’s research interests are in natural language processing, machine learning, and social media, with a focus on text generation, stylistics, robustness and controllability of machine learning models, and reading and writing assistive technology. She is a recipient of the NSF CAREER Award, the CrowdFlower AI for Everyone Award, the Criteo Faculty Research Award, and a Best Paper Award at COLING’18. She has also received funding from DARPA and IARPA. She is an elected member of the NAACL executive board and regularly serves as a senior area chair for AI/NLP conferences.

Center for Language and Speech Processing