Towards Web-Scale Observational Studies of Health – Aron Culotta (Illinois Institute of Technology)
Over the past several years, researchers have found that social media analysis can provide a cheaper, faster complement to traditional public health surveillance systems; examples include tracking rates of influenza, food poisoning, Adderall use, insomnia, depression, PTSD and obesity.
In this talk, I will first briefly summarize our work in this area, including tracking alcohol consumption, influenza and county health. Then, I will outline a framework to go beyond surveillance towards supporting web-scale observational studies: for example, rather than simply tracking mood, we can identify its causes and correlates.
Doing so requires directly addressing the issues of confounding and selection bias that are widespread in social media analysis. This motivates our recent work inferring latent attributes of users (e.g., location, demographics) using lightly supervised machine learning. For example, using aggregate data from census and web traffic statistics, we are able to build a classifier to predict a user’s ethnicity, gender, age, income, education, etc. We show how failing to control for such confounders can have a significant effect on the conclusions of web-scale observational studies.
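As a rough illustration of why controlling for a confounder matters, the sketch below compares a crude risk difference with one adjusted by stratifying on an inferred attribute. It is not the method from the talk: the counts are synthetic toy numbers, and the stratum names, the "exposed"/"unexposed" groups, and both function names are hypothetical choices made for this example.

```python
# Synthetic toy counts (NOT real data).
# stratum -> group -> (number of users, number of users with the outcome)
counts = {
    "younger": {"exposed": (800, 80), "unexposed": (200, 20)},
    "older": {"exposed": (200, 60), "unexposed": (800, 240)},
}


def crude_risk_difference(counts):
    """Risk difference that ignores the stratifying attribute."""
    totals = {"exposed": [0, 0], "unexposed": [0, 0]}
    for groups in counts.values():
        for group, (n_users, n_outcome) in groups.items():
            totals[group][0] += n_users
            totals[group][1] += n_outcome
    risk_exposed = totals["exposed"][1] / totals["exposed"][0]
    risk_unexposed = totals["unexposed"][1] / totals["unexposed"][0]
    return risk_exposed - risk_unexposed


def adjusted_risk_difference(counts):
    """Average of per-stratum risk differences, weighted by stratum size."""
    total_users = sum(n for groups in counts.values() for n, _ in groups.values())
    difference = 0.0
    for groups in counts.values():
        n_exp, k_exp = groups["exposed"]
        n_unexp, k_unexp = groups["unexposed"]
        weight = (n_exp + n_unexp) / total_users
        difference += weight * (k_exp / n_exp - k_unexp / n_unexp)
    return difference


print("crude:", round(crude_risk_difference(counts), 3))
print("adjusted:", round(adjusted_risk_difference(counts), 3))
```

In these toy numbers the exposed group looks less likely to have the outcome overall, yet within each age stratum the two groups have the same rate. The apparent effect comes entirely from the exposed group being younger, which is the kind of error that inferred demographics are meant to help avoid.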
Aron Culotta is an Assistant Professor of Computer Science at the Illinois Institute of Technology in Chicago, where he leads the Text Analysis in the Public Interest Lab (http://tapilab.github.io/). He obtained his Ph.D. in Computer Science from the University of Massachusetts, Amherst in 2008, advised by Dr. Andrew McCallum, where he developed machine learning algorithms for natural language processing. He was a Microsoft Live Labs Fellow from 2006-2008 and completed research internships at IBM, Google and Microsoft Research. His work has received best paper awards at AAAI and CSCW.