Computational Language Analyses for Health and Psychological Discovery – Hansen Andrew Schwartz (University of Pennsylvania)
Baltimore, MD, 21218
What can language analyses reveal about human health and well-being? I build on computational linguistics, a field typically focused on better understanding language, to better understand people (their health and psychological characteristics) as revealed through Facebook status updates, tweets, and other personal discourse. With colleagues from psychology and medicine, we found that the language people use, captured in word collocations and latent Dirichlet allocation topics, is highly predictive of personality, gender, age, and depression. Similarly, the language in tweets from different counties predicts local life satisfaction, HIV prevalence, and heart disease rates, often more accurately than standard socio-behavioral predictors (e.g., rates of coronary heart disease were predicted above and beyond a combination of demographics, socio-economics, smoking rates, and hypertension rates). Beyond prediction, our language-based analyses yield data-driven insights. For example, language variation by personality is both face valid (e.g., extroverts mention “party”, neurotic people mention “depression”, and conscientious people talk more about the future) and revealing (e.g., introverts are disproportionately interested in Japanese culture, emotionally stable individuals mention topics associated with an active life, and conscientious individuals don’t just talk more about “work” but also about vacations and relaxation).
Andy Schwartz is a Visiting Assistant Professor in Computer & Information Science at the University of Pennsylvania, and he will begin as an Assistant Professor at Stony Brook University (SUNY) in the Fall of 2015. His interdisciplinary research focuses on large and scalable language analyses for health and social sciences. Utilizing natural language processing and machine learning techniques, he seeks to discover new behavioral and psychological factors of health and well-being as manifest through language in social media. He received his PhD in Computer Science from the University of Central Florida in 2011 with research on acquiring lexical semantic knowledge from the Web. His recent work has been featured in The Atlantic and The Washington Post.