Examining NLP Benchmark Design Practices – Yu Lu Liu (JHU)

When:
April 20, 2026 @ 12:00 pm – 1:15 pm
Where:
Hodson 216
Cost:
Free

Abstract

Benchmarking is seen as critical to assessing progress in NLP. However, creating a benchmark involves many design decisions (e.g., which datasets to include, which metrics to use). As benchmarks grow ever larger, assessing their quality (e.g., whether they actually measure what they aim to measure) becomes a difficult task. In my Master’s work, titled “ECBD: Evidence-Centered Benchmark Design for NLP”, we drew on the evidence-centered design framework from the field of educational testing and proposed Evidence-Centered Benchmark Design (ECBD), a framework that formalizes the benchmark design process and aims to guide practitioners in building, documenting, and analyzing NLP benchmarks. In this talk, I’ll describe this framework and show how we can use it to analyze NLP benchmark papers. I’ll then talk more informally about our work in progress: a large-scale analysis of benchmark papers that maps out the progress of the field of NLP over the years, common practices and limitations, and what benchmarking will likely look like in the future.

Bio

Yu Lu Liu is a second-year PhD student at Johns Hopkins University, advised by Prof. Ziang Xiao. Her research journey started at McGill University in Montreal, Canada, under the supervision of Prof. Jackie Cheung, where she focused on NLP meta-evaluation: evaluating NLP evaluation methods. Now at JHU, her research interests sit at the intersection of NLP and human-computer interaction, where she is especially curious about how to better evaluate the social impacts of AI, such as the impacts of NLP technologies on work and on interpersonal relationships.

Also available by Zoom: https://wse.zoom.us/j/96735183473


Center for Language and Speech Processing