BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//128.220.36.25//NONSGML kigkonsult.se iCalcreator 2.26.9//
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-FROM-URL:https://www.clsp.jhu.edu
X-WR-TIMEZONE:America/New_York
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:STANDARD
DTSTART:20231105T020000
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
RDATE:20241103T020000
TZNAME:EST
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20240310T020000
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
RDATE:20250309T020000
TZNAME:EDT
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:ai1ec-23314@www.clsp.jhu.edu
DTSTAMP:20240329T000926Z
CATEGORIES;LANGUAGE=en-US:Seminars
CONTACT:
DESCRIPTION:
Abstract
\nWhile GPT models have shown impressive performance on summarization and open-ended text generation\, it’s important to assess their abilities on more constrained text generation tasks that require significant and diverse rewritings. In this talk\, I will discuss the challenges of evaluating systems that are highly competitive and perform close to humans on two such tasks: (i) paraphrase generation and (ii) text simplification. To address these challenges\, we introduce an interactive Rank-and-Rate evaluation framework. Our results show that GPT-3.5 has made a major step up from fine-tuned T5 in paraphrase generation\, but still lacks the diversity and creativity of humans who spontaneously produce large quantities of paraphrases.
\nAdditionally\, we demonstrate that GPT-3.5 performs similarly to a single human in text simplification\, which makes it difficult for existing automatic evaluation metrics to distinguish between the two. To overcome this shortcoming\, we propose LENS\, a learnable evaluation metric that outperforms SARI\, BERTScore\, and other existing methods in both automatic evaluation and minimal risk decoding for text generation.
\nBiography
\nWei Xu is an assistant professor in the School of Interactive Computing at the Georgia Institute of Technology\, where she is also affiliated with the new NSF AI CARING Institute and Machine Learning Center. She received her Ph.D. in Computer Science from New York University and her B.S. and M.S. from Tsinghua University. Xu’s research interests are in natural language processing\, machine learning\, and social media\, with a focus on text generation\, stylistics\, robustness and controllability of machine learning models\, and reading and writing assistive technology. She is a recipient of the NSF CAREER Award\, CrowdFlower AI for Everyone Award\, Criteo Faculty Research Award\, and Best Paper Award at COLING’18. She has also received funds from DARPA and IARPA. She is an elected member of the NAACL executive board and regularly serves as a senior area chair for AI/NLP conferences.
DTSTART;TZID=America/New_York:20230224T120000
DTEND;TZID=America/New_York:20230224T131500
LOCATION:Hackerman Hall B17 @ 3400 N. Charles Street\, Baltimore\, MD 21218
SEQUENCE:0
SUMMARY:Wei Xu (Georgia Tech) “GPT-3 vs Humans: Rethinking Evaluation of Natural Language Generation”
URL:https://www.clsp.jhu.edu/events/wei-xu-georgia-tech/
X-COST-TYPE:free
X-TAGS;LANGUAGE=en-US:2023\,February\,Xu
END:VEVENT
END:VCALENDAR