Propaganda is already influencing large language models: evidence from training data, audits, and real-world usage – Brandon Stewart (Princeton)

When:
October 3, 2025 @ 12:00 pm – 1:15 pm
Where:
Hackerman Hall B17
Cost:
Free

Abstract

Millions of people around the world query (prompt) large language models for information. While several studies have compellingly documented the persuasive potential of these models, there is limited evidence of who or what influences the models themselves, leading to a flurry of concerns about which companies and governments build and regulate the models. We show through six studies that coordinated propaganda from powerful global political institutions already influences the output of U.S.-based large language models via their training data. We first provide evidence that Chinese state propaganda appears in large language model training datasets. To evaluate the plausible effect of this inclusion, we use an open-weight model to show that additional pre-training on Chinese state propaganda generates more positive answers to prompts about Chinese political institutions and leaders. We link this phenomenon to commercial models through two audit studies demonstrating that prompting models in Chinese generates more positive responses about China’s institutions and leaders than the same queries in English. We then use a cross-national audit study to show that languages of countries with lower media freedom exhibit a stronger pro-regime valence than those with higher media freedom. The combination of influence and persuasive potential suggests the troubling conclusion that states and powerful institutions have increased strategic incentives to disseminate propaganda in the hopes of shaping model behavior.

Bio:

Brandon Stewart is an Associate Professor of Sociology at Princeton University with affiliations in the Politics Department, the Office of Population Research, and numerous other units on campus. His work is in text as data, causal inference, and the intersection of the two. Before joining Princeton Sociology, he completed his Ph.D. in Government at Harvard in 2015. Along with Justin Grimmer and Molly Roberts, he is the author of the 2022 book Text as Data: A New Framework for Machine Learning and the Social Sciences. He currently serves as Co-Editor-in-Chief of Political Analysis and Associate Editor at Sociological Methods & Research. His work has received several awards, including the emerging scholar awards from both the Society for Political Methodology and the Methodology section of the American Sociological Association.

Center for Language and Speech Processing