NUTMEG: Separating Signal From Noise in Annotator Disagreement – Jonathan Ivey (JHU)

When:
September 29, 2025 @ 12:00 pm – 1:15 pm
Where:
Hackerman Hall B17
Cost:
Free

Abstract:

NLP models often rely on human-labeled data for training and evaluation. Many approaches crowdsource this data from a large number of annotators with varying skills, backgrounds, and motivations, resulting in conflicting annotations. These conflicts have traditionally been resolved by aggregation methods that assume disagreements are errors. Recent work has argued that for many tasks, annotators may have genuine disagreements and that this variation should be treated as signal rather than noise. However, few models separate signal from noise in annotator disagreement. In this work, we introduce NUTMEG, a new Bayesian model that incorporates information about annotator backgrounds to remove noisy annotations from human-labeled training data while preserving systematic disagreements. Using synthetic and real-world data, we show that NUTMEG is more effective at recovering ground truth from annotations with systematic disagreement than traditional aggregation methods, and we demonstrate that downstream models trained on NUTMEG-aggregated data outperform models trained on data from traditional aggregation methods.
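To make the aggregation idea concrete, below is a minimal illustrative sketch, not the authors' NUTMEG implementation, of the general approach of group-aware Bayesian label aggregation: each annotator group is allowed to converge on its own per-item label (preserving systematic disagreement across backgrounds), while an EM loop down-weights unreliable annotators within a group (removing noise). The function name, data format, and EM formulation here are assumptions made for illustration only.

```python
# Illustrative sketch only -- NOT the authors' NUTMEG model.
# Toy group-aware EM aggregator: each annotator group keeps its own
# per-item consensus, while noisy annotators within a group are
# down-weighted by an estimated reliability score.
import numpy as np

def aggregate_by_group(annotations, n_labels, n_iters=50):
    """annotations: list of (item_id, annotator_id, group_id, label)."""
    items = sorted({a[0] for a in annotations})
    annotators = sorted({a[1] for a in annotations})
    groups = sorted({a[2] for a in annotations})
    item_ix = {i: k for k, i in enumerate(items)}
    ann_ix = {a: k for k, a in enumerate(annotators)}
    grp_ix = {g: k for k, g in enumerate(groups)}

    # Posterior over the "group truth" for every (item, group) pair.
    post = np.full((len(items), len(groups), n_labels), 1.0 / n_labels)
    # Per-annotator reliability: probability of reporting the group truth.
    reliability = np.full(len(annotators), 0.8)

    for _ in range(n_iters):
        # E-step: update label posteriors from reliability-weighted votes.
        logpost = np.zeros_like(post)
        for item, ann, grp, lab in annotations:
            i, a, g = item_ix[item], ann_ix[ann], grp_ix[grp]
            r = np.clip(reliability[a], 1e-3, 1 - 1e-3)
            ll = np.full(n_labels, np.log((1 - r) / (n_labels - 1)))
            ll[lab] = np.log(r)
            logpost[i, g] += ll
        post = np.exp(logpost - logpost.max(axis=-1, keepdims=True))
        post /= post.sum(axis=-1, keepdims=True)

        # M-step: re-estimate how often each annotator matches their
        # own group's current consensus (smoothed to avoid extremes).
        hits = np.zeros(len(annotators))
        counts = np.zeros(len(annotators))
        for item, ann, grp, lab in annotations:
            i, a, g = item_ix[item], ann_ix[ann], grp_ix[grp]
            hits[a] += post[i, g, lab]
            counts[a] += 1.0
        reliability = (hits + 1.0) / (counts + 2.0)

    return post, reliability
```

Under these assumptions, two groups that consistently choose different labels for the same item both keep high-confidence (but different) posteriors, whereas an annotator who disagrees with their own group's consensus is treated as noise and given lower weight.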

Bio:

Jonathan Ivey is a first-year PhD student in the CLSP, co-advised by Anjalie Field and Ziang Xiao.

Center for Language and Speech Processing