Exploring Self-Play Beyond Competition: Language Model Learning in Negotiation Games – Austen Liao (JHU)

When:
October 13, 2025 @ 12:00 pm – 1:30 pm
Where:
Hackerman Hall B17
Cost:
Free

Abstract

Game-playing agents like AlphaGo have achieved superhuman performance through self-play, which is theoretically guaranteed to yield optimal policies in competitive games. However, most language tasks are partially or fully cooperative, so it is an open question whether techniques like self-play can effectively be used to improve language models. We empirically investigated this question in a negotiation game setting known as Deal or No Deal (DoND). Crucially, the objective in DoND can be modified to produce a fully cooperative game, a strictly competitive one, or anything in between. For each of these objectives, we finetuned language models in self-play over multiple rounds of filtered behavior cloning in DoND and evaluated them both in self-play and in play with humans. We found that language models improve substantially in self-play and that the improvements generalize to both cooperation and competition with humans. Beyond the findings of this project, my talk will discuss related directions for my future research and why I find them meaningful.

Advisors

Daniel Khashabi, Benjamin Van Durme

Center for Language and Speech Processing