I will present our work on data augmentation using style transfer as a way to improve domain adaptation in sequence labeling tasks. The target domain is social media data, and the task is named entity recognition (NER). The premise is that we can transform labeled out-of-domain data into text that is stylistically closer to the target data. We can then train a model on a combination of the generated data and the smaller amount of in-domain data to improve NER performance. I will show recent empirical results from these efforts.
If time allows, I will also give an overview of other research projects I’m currently leading at the RiTUAL (Research in Text Understanding and Analysis of Language) lab. The common thread among these research problems is the scarcity of labeled data.
Thamar Solorio is a Professor of Computer Science at the University of Houston (UH). She holds graduate degrees in Computer Science from the Instituto Nacional de Astrofísica, Óptica y Electrónica, in Puebla, Mexico. Her research interests include information extraction from social media data, enabling technology for code-switched data, stylistic modeling of text, and, more recently, multimodal approaches for online content understanding. She is the director and founder of the RiTUAL Lab at UH. She is the recipient of an NSF CAREER award for her work on authorship attribution and of the 2014 Emerging Leader ABIE Award in Honor of Denice Denton. She is currently serving a second term as an elected board member of the North American Chapter of the Association for Computational Linguistics and was PC co-chair for NAACL 2019. She recently joined the team of Editors-in-Chief for the ACL Rolling Review (ARR) system. Her research is currently funded by the NSF and by Adobe.