See below for the tokenized tweet data used in “Collective Supervision of Topic Models for Predicting Surveys with Social Media” (AAAI ’16). Please respect the Twitter terms of service and download no more than one of these files each day (50K tweets). For code to train the topic models, see
https://bitbucket.org/adrianbenton/sprite/
Tweet data:
If you just want the tweet IDs and a description of the data, see
https://github.com/abenton/collsuptmdata
If you end up using these data, please cite:
Adrian Benton, Michael J. Paul, Braden Hancock, Mark Dredze.
Collective Supervision of Topic Models for Predicting Surveys with Social Media.
Thirtieth AAAI Conference on Artificial Intelligence, 2016.