LLM-powered exploratory text analysis at scale – Mian Zhong (JHU)

When:
February 27, 2026 @ 12:00 pm – 1:15 pm
Where:
Hodson 216
Cost:
Free

Abstract

How can we better support fine-grained corpus analysis at scale, for example, investigating marketing strategies across millions of documents? We propose HiCode, a two-part LLM pipeline that scales the nuanced analyses researchers typically conduct manually to large text corpora. Using analysis tools like HiCode also raises a data selection challenge: not every document is relevant to a given analysis question, and computational constraints preclude analyzing all documents. Little work has examined how the strategies used to select data for downstream analysis affect the results. We establish a systematic evaluation of selection strategies on the outputs of four text analysis methods (LDA, BERTopic, TopicGPT, HiCode), which yields practical guidance.

Bio

Mian Zhong is a second-year PhD student at Johns Hopkins University's Center for Language and Speech Processing, advised by Prof. Anjalie Field. She works at the intersection of Natural Language Processing and Computational Social Science, developing both tools and evaluation methods for text analysis grounded in real-world data (e.g., public health litigation). Her current focus is on text clustering, causal inference with text, and LLM trustworthiness.

Also available by Zoom: https://wse.zoom.us/j/96735183473

Center for Language and Speech Processing