Fall 2001: CLSP Seminar Series

Knowledge Discovery from Text

Dekang Lin - October 23rd, 2001

University of Alberta

Text is arguably the richest repository of human knowledge. In this talk I will present two unsupervised algorithms for mining knowledge from text: UNICON (UNsupervised Induction of CONcepts) and DIRT (Discovery of Inference Rules from Text). UNICON is a concept clustering algorithm. Its advantages over previous approaches include the ability to classify words with low frequency counts, the ability to cluster a large number of elements in a high-dimensional space, and the ability to classify previously unknown words into existing clusters. The DIRT algorithm automatically discovers paraphrase relationships between natural language expressions, such as "X writes Y" and "X is the author of Y", or "X solves Y" and "X finds a solution to Y". DIRT is based on an extended version of Harris' Distributional Hypothesis, which states that words that occur in the same contexts tend to be similar. Instead of applying this hypothesis to words, we apply it to paths in dependency parse trees. Essentially, if two paths tend to link the same sets of words, we hypothesize that their meanings are similar.
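
The last idea in the abstract, that two paths are similar when they tend to link the same words, can be illustrated with a small sketch. This is not the DIRT implementation: the triples below are hypothetical toy data standing in for paths extracted from dependency parses, the function names are invented for illustration, and a plain Jaccard overlap of slot fillers is swapped in as a simplified stand-in for whatever similarity measure the algorithm actually uses.

```python
from collections import defaultdict

# Hypothetical toy data: (path, X filler, Y filler) triples, as if they had
# been extracted from dependency parses of a corpus.
triples = [
    ("X writes Y", "Tolstoy", "War and Peace"),
    ("X writes Y", "Austen", "Emma"),
    ("X writes Y", "Orwell", "1984"),
    ("X is the author of Y", "Tolstoy", "War and Peace"),
    ("X is the author of Y", "Austen", "Emma"),
    ("X is the author of Y", "Melville", "Moby-Dick"),
    ("X eats Y", "Austen", "breakfast"),
    ("X eats Y", "cat", "fish"),
]


def slot_fillers(triples):
    """Collect the set of words seen in the X and Y slots of each path."""
    fillers = defaultdict(lambda: {"X": set(), "Y": set()})
    for path, x, y in triples:
        fillers[path]["X"].add(x)
        fillers[path]["Y"].add(y)
    return fillers


def jaccard(a, b):
    """Set overlap: |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def path_similarity(fillers, p1, p2):
    """Geometric mean of the X-slot and Y-slot overlaps (an assumption of
    this sketch, chosen so that both slots must agree for a high score)."""
    sim_x = jaccard(fillers[p1]["X"], fillers[p2]["X"])
    sim_y = jaccard(fillers[p1]["Y"], fillers[p2]["Y"])
    return (sim_x * sim_y) ** 0.5


fillers = slot_fillers(triples)
paraphrase_score = path_similarity(fillers, "X writes Y", "X is the author of Y")
unrelated_score = path_similarity(fillers, "X writes Y", "X eats Y")
print(paraphrase_score, unrelated_score)
```

On this toy data the two paraphrases share half of their fillers in each slot, while "X writes Y" and "X eats Y" share no Y fillers at all, so the first pair scores higher. A realistic system would weight fillers by how informative they are rather than counting them equally.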

Biographical Information

Dekang Lin received his BSc from Tsinghua University in 1985 and his PhD from the University of Alberta in 1992. He is currently an Associate Professor in the Department of Computing Science, University of Alberta. He has also been a Visiting Professor at the MIT AI Lab and at UMIACS, University of Maryland, College Park. His main research interests in computational linguistics include principle-based parsing, learning from parsed corpora, information extraction, and question answering.

The Center for Language and Speech Processing
The Johns Hopkins University
3400 North Charles Street, Barton Hall
Baltimore, MD 21218
Telephone: (410) 516-4237
Fax: (410) 516-5050
E-mail: clsp@clsp.jhu.edu