Learning and exploiting statistical dependencies in networks
David Jensen, University of Massachusetts Amherst
January 31, 2007
Networks are ubiquitous in computer science and everyday life. We live embedded in social and professional networks, we communicate through telecommunications and computer networks, and we represent information in documents connected by hyperlinks and bibliographic citations. Only recently, however, have researchers developed techniques to analyze and model data about these networks. These techniques build on work in artificial intelligence, statistics, databases, graph theory, and social network analysis, and they are profoundly expanding the phenomena that we can understand and predict. Emerging applications for these new techniques include citation analysis, web mining, bioinformatics, peer-to-peer networking, computer security, epidemiology, and financial fraud detection.

This talk will outline the unifying ideas behind three lines of recent work in my research group: 1) methods for learning joint distributions of variables on networks; 2) methods for navigating networks; and 3) methods for indexing network structure. All these methods share a common thread: representing and exploiting autocorrelation.

Autocorrelation (or homophily) is a common feature of many social networks. Two individuals are more likely to share similar occupations, political beliefs, or cultural backgrounds if they are neighbors. In general, a statistical dependence often exists between the values of the same variable on neighboring entities.

Much of the work in my group focuses on relational dependency networks and latent group models, two methods for learning statistical dependencies in social networks. The most important discoveries made using these models are often autocorrelation dependencies. We have also developed expected-value navigation, a method that combines information about autocorrelation and degree structure to efficiently discover short paths in networks.
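To make the notion of autocorrelation concrete, here is a minimal sketch (not code from the talk) that measures it as the Pearson correlation between a node attribute's values at the two endpoints of each edge. The graph, attribute, and function name are illustrative assumptions.

```python
def edge_autocorrelation(edges, attr):
    """Pearson correlation between an attribute's values at the two
    endpoints of each edge. Each edge is counted in both orientations
    so the measure is symmetric; values near 1 indicate strong homophily."""
    pairs = [(attr[u], attr[v]) for u, v in edges]
    pairs += [(y, x) for x, y in pairs]  # add reversed orientations
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    sx = (sum((x - mx) ** 2 for x, _ in pairs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for _, y in pairs) / n) ** 0.5
    return cov / (sx * sy)

# Two tightly knit groups (attribute 0 and attribute 1) joined by one bridge:
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
attr = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(edge_autocorrelation(edges, attr), 3))  # → 0.714
```

The high value reflects the pattern described above: neighbors tend to share the same attribute value, so observing one node is informative about its neighbors.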
Finally, we have developed network structure indices, a method of annotating networks with artificially created autocorrelated variables to index graph structures so that short paths can be discovered quickly. Network structure indices, in turn, provide several ways to improve our probabilistic modeling, completing a surprising cycle of research unified by the concept of autocorrelation.
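The core intuition behind index-based navigation can be sketched as follows. This is a simplified illustration, not the actual expected-value navigation or network-structure-index algorithms: it assumes each node already carries an autocorrelated numeric annotation (here, just its position along a path), and routes greedily toward the neighbor whose annotation is closest to the target's.

```python
def greedy_route(adj, label, src, dst, max_steps=100):
    """Greedy navigation using an autocorrelated node annotation.
    At each step, move to the neighbor whose label is closest to the
    destination's label; because the labels are autocorrelated along the
    graph, label distance is a useful proxy for graph distance."""
    path = [src]
    cur = src
    for _ in range(max_steps):
        if cur == dst:
            return path
        nxt = min(adj[cur], key=lambda n: abs(label[n] - label[dst]))
        if abs(label[nxt] - label[dst]) >= abs(label[cur] - label[dst]):
            return None  # no neighbor makes progress: stuck in a local minimum
        path.append(nxt)
        cur = nxt
    return None

# A path graph 0-1-2-3-4-5 with a shortcut edge between 1 and 4;
# labels are node positions, a trivially autocorrelated annotation.
adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3, 5], 5: [4]}
label = {i: i for i in range(6)}
print(greedy_route(adj, label, 0, 5))  # → [0, 1, 4, 5]
```

Note how the greedy rule exploits the shortcut (1, 4) without any global search, illustrating why autocorrelated annotations make short paths discoverable quickly.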
David Jensen is Associate Professor of Computer Science and Director of the Knowledge Discovery Laboratory at the University of Massachusetts Amherst. From 1991 to 1995, he served as an analyst with the Office of Technology Assessment, an agency of the United States Congress. He received his doctorate from Washington University in 1992. His research focuses on machine learning and knowledge discovery in relational data, with applications to web mining, social network analysis, and fraud detection. He serves on the program committees of the International Conference on Knowledge Discovery and Data Mining and the International Conference on Machine Learning. He is a member of the 2006-2007 Defense Science Study Group.