Active Learning with SVMs for Imbalanced Datasets and a Stopping Criterion Based on Stabilizing Predictions – Michael Bloodgood (University of Delaware)

October 30, 2008 (all day)

The use of Active Learning (AL) to reduce NLP annotation costs has recently generated considerable interest, as has dealing effectively with the class imbalance that NLP problems so often give rise to. Additionally, the use of Support Vector Machines (SVMs) for NLP has become widespread. After explaining the relevant background and motivation, I will discuss how to address class imbalance effectively during AL-SVM (AL with SVMs). In particular, I will discuss how to adapt passive learning techniques so that asymmetric misclassification costs can be used effectively during AL-SVM. To realize the performance gains enabled by a strong AL algorithm, an effective stopping criterion is critical; therefore, I will also present a new stopping criterion based on stabilizing predictions. An evaluation of the proposed techniques will be reported for several Information Extraction and Text Classification tasks.
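As a rough illustration of the stabilizing-predictions idea (a sketch, not the talk's exact formulation): successive models trained during active learning make predictions on a fixed set of unlabeled examples, and annotation can halt once agreement between consecutive prediction vectors, corrected for chance with Cohen's kappa, stays high for several rounds. The function names, threshold, and window size below are hypothetical choices for illustration.

```python
def cohens_kappa(a, b):
    # Chance-corrected agreement between two binary (0/1) label vectors.
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    pa1 = sum(a) / n  # fraction of 1s predicted by model a
    pb1 = sum(b) / n  # fraction of 1s predicted by model b
    pe = pa1 * pb1 + (1 - pa1) * (1 - pb1)  # agreement expected by chance
    if pe == 1.0:
        return 1.0
    return (po - pe) / (1 - pe)


def should_stop(prediction_history, threshold=0.99, window=3):
    # Stop when kappa between each pair of consecutive prediction
    # vectors (on the fixed stop set) exceeds `threshold` for
    # `window` consecutive comparisons.
    if len(prediction_history) < window + 1:
        return False
    recent = prediction_history[-(window + 1):]
    return all(cohens_kappa(recent[i], recent[i + 1]) >= threshold
               for i in range(window))
```

In an AL loop, `prediction_history` would be appended to after each retraining step; because the criterion looks only at predictions on unlabeled data, it needs no extra annotated validation set.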
Michael Bloodgood is a PhD candidate in the Department of Computer and Information Sciences at the University of Delaware. His thesis research deals with Active Learning with Support Vector Machines to reduce NLP annotation costs. More generally, he is interested in reducing training data annotation burdens via active, transfer, semi-supervised, and unsupervised learning techniques. In addition to his thesis work, Michael has worked on anaphora analysis (at the University of Delaware and at the Palo Alto Research Center, PARC), rapidly adapting POS taggers to new domains (at the University of Delaware), and discriminative training for statistical syntax-based machine translation (at USC/ISI). Michael earned his MS in Computer Science from the University of Delaware and a BS in Computer Science and in Information Systems Management from The College of New Jersey.

Center for Language and Speech Processing