Detecting Deceptive On-Line Reviews – Claire Cardie (Cornell University)
Abstract
Consumers increasingly rate, review, and research products online. Consequently, websites containing consumer reviews are becoming targets of opinion spam. While recent work has focused primarily on manually identifiable instances of opinion spam, this talk describes the first study of “deceptive opinion spam”: fictitious opinions that have been deliberately written to sound authentic. Integrating work from psychology and computational linguistics, we develop and compare three approaches to detecting deceptive opinion spam, and ultimately develop a classifier that is nearly 90% accurate on our gold-standard opinion spam dataset. Feature analysis of our learned models reveals a relationship between deceptive opinions and imaginative writing. Finally, the talk describes the results of a preliminary study that uses the opinion spam classifier to estimate the prevalence of fake reviews on two popular hotel review sites.
Biography
Claire Cardie is a Professor in the Computer Science and Information Science departments at Cornell University. She received her B.S. in Computer Science from Yale University and her M.S. and Ph.D., also in Computer Science, from the University of Massachusetts at Amherst. Her research in Natural Language Processing has focused on the application and development of machine learning methods for information extraction, coreference resolution, digital government applications, the analysis of opinions and subjective text, and, most recently, deception detection. Cardie is a recipient of a National Science Foundation CAREER award, and has served elected terms as an executive committee member of the Association for Computational Linguistics (ACL), an executive council member of the Association for the Advancement of Artificial Intelligence (AAAI), and twice as secretary of the North American chapter of the ACL (NAACL). Cardie is also co-founder and chief scientist of Appinions.com, a start-up focused on extracting and aggregating opinions from online text and social media.