Leo Du (JHU): "Discrete Gradient-Based Sampling with Applications to Language Models"
3400 N Charles St, Baltimore, MD 21218
Abstract
Gradient-based sampling algorithms are a cornerstone of modern Bayesian computation, widely used in applications ranging from probabilistic programming to diffusion models. While these methods perform exceptionally well in continuous domains, extending them to discrete domains such as text remains challenging. For example, how can one use them to find a prompt or a chain-of-thought that leads to the desired answer?
In this talk, we present novel gradient-based sampling algorithms that bridge this gap. We begin by analyzing their theoretical properties, such as mixing times, and then demonstrate their effectiveness in discrete domains. A key focus is sampling from language models subject to differentiable, globally normalized soft constraints (i.e., energy functions). Our results show that these unbiased samplers outperform previous biased methods on downstream tasks.
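As background for the family of methods the abstract refers to, the following is a minimal, self-contained sketch of one well-known gradient-informed discrete sampler, Gibbs-with-Gradients (Grathwohl et al., 2021): gradients of the log-density with respect to a one-hot encoding rank candidate single-token edits, and a Metropolis-Hastings correction keeps the chain unbiased. The chain-structured toy energy and all names below are illustrative assumptions, not the specific algorithms presented in this talk.

import numpy as np

rng = np.random.default_rng(0)
L, V = 8, 5                       # toy sequence length and vocabulary size (illustrative)
a = rng.normal(size=(L, V))       # unary token scores
W = rng.normal(size=(V, V))       # coupling between adjacent tokens

def logsumexp(z):
    m = z.max()
    return m + np.log(np.exp(z - m).sum())

def onehot(x):
    X = np.zeros((L, V))
    X[np.arange(L), x] = 1.0
    return X

def logp(x):
    # Unnormalized log-density of a toy chain-structured energy model.
    X = onehot(x)
    return (X * a).sum() + sum(X[i] @ W @ X[i + 1] for i in range(L - 1))

def grad(x):
    # Exact gradient of logp with respect to the one-hot matrix X.
    X = onehot(x)
    g = a.copy()
    g[:-1] += X[1:] @ W.T         # derivative of X_i W X_{i+1} w.r.t. X_i
    g[1:] += X[:-1] @ W           # ... and w.r.t. X_{i+1}
    return g

def proposal_logp(x):
    # First-order (Taylor) estimate of logp(x') - logp(x) for every
    # single-token substitution, turned into a normalized log-proposal.
    g = grad(x)
    d = g - g[np.arange(L), x][:, None]
    d[np.arange(L), x] = -np.inf  # forbid the do-nothing move
    logits = d.ravel() / 2.0
    return logits - logsumexp(logits)

def gwg_step(x):
    # One Metropolis-Hastings step with a gradient-informed proposal,
    # in the style of Gibbs-with-Gradients.
    logq = proposal_logp(x)
    p = np.exp(logq)
    idx = rng.choice(L * V, p=p / p.sum())
    i, v = divmod(idx, V)
    x_new = x.copy()
    x_new[i] = v
    logq_rev = proposal_logp(x_new)
    log_alpha = (logp(x_new) - logp(x)
                 + logq_rev[i * V + x[i]] - logq[idx])
    return x_new if np.log(rng.uniform()) < log_alpha else x

x = rng.integers(V, size=L)
for _ in range(2000):
    x = gwg_step(x)
print("sample:", x, "logp:", logp(x))

The accept/reject step is the point of contact with the abstract's claim: it is what makes a gradient-guided sampler unbiased for the target distribution, as opposed to heuristic gradient-based edits that leave the stationary distribution unspecified.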