Spring 2003: CLSP Seminar Series

MEasuring TExt Reuse

Yorick Wilks - February 18th, 2003

Department of Computer Science, University of Sheffield

Presentation Slides: N/A


In this paper we present initial results from the METER (MEasuring TExt Reuse) project, whose aim is to explore issues pertaining to text reuse and derivation, especially in the context of newspapers using newswire sources. Although the reuse of text by journalists has been studied in linguistics, we are not aware of any investigation using existing computational methods for this particular task and context. In this paper we concentrate on classifying newspaper articles according to their dependency upon Press Association (PA) copy, using a 3-class document-level scheme designed by domain experts from journalism and a number of well-known approaches to text analysis. We show that the 3-class document-level scheme is better implemented as 2 binary Naive Bayes classifiers, which gives an F-measure score of 0.7309.
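As a rough illustration of how a 3-class scheme can be split into 2 binary Naive Bayes classifiers, the sketch below chains a "derived vs. not derived" classifier with a "wholly vs. partially derived" classifier. It is only a minimal sketch under stated assumptions: the abstract does not name the three classes, the features, or how the two classifiers are combined, so the class labels ("wholly", "partially", "non"), the bucketed overlap features ("uni_high", "bi_low", and so on), the cascade arrangement, and the toy training data are all hypothetical.

```python
import math
from collections import Counter


class BinaryNB:
    """Two-class multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        # docs: lists of categorical feature tokens; labels: booleans.
        # Both classes must occur at least once in the training data.
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for cls in (True, False):
            cls_docs = [d for d, y in zip(docs, labels) if y == cls]
            self.log_prior[cls] = math.log(len(cls_docs) / len(docs))
            self.counts[cls] = Counter(tok for d in cls_docs for tok in d)
            self.totals[cls] = sum(self.counts[cls].values())
        return self

    def _log_score(self, doc, cls):
        size = len(self.vocab)
        score = self.log_prior[cls]
        for tok in doc:
            if tok in self.vocab:  # tokens unseen in training are ignored
                score += math.log(
                    (self.counts[cls][tok] + 1) / (self.totals[cls] + size)
                )
        return score

    def predict(self, doc):
        return self._log_score(doc, True) > self._log_score(doc, False)


def train_cascade(examples):
    """Train two binary classifiers from (features, label) pairs."""
    docs = [feats for feats, _ in examples]
    # Classifier 1: is the article derived from the source at all?
    derived = BinaryNB().fit(docs, [y != "non" for _, y in examples])
    # Classifier 2: among derived articles only, wholly or partially?
    sub = [(feats, y) for feats, y in examples if y != "non"]
    wholly = BinaryNB().fit(
        [feats for feats, _ in sub], [y == "wholly" for _, y in sub]
    )
    return derived, wholly


def classify(cascade, features):
    derived, wholly = cascade
    if not derived.predict(features):
        return "non"
    return "wholly" if wholly.predict(features) else "partially"


# Hypothetical toy data: bucketed unigram and bigram overlap with the source.
TRAIN = [
    (["uni_high", "bi_high"], "wholly"),
    (["uni_high", "bi_high"], "wholly"),
    (["uni_high", "bi_low"], "partially"),
    (["uni_mid", "bi_low"], "partially"),
    (["uni_low", "bi_low"], "non"),
    (["uni_low", "bi_low"], "non"),
]

cascade = train_cascade(TRAIN)
print(classify(cascade, ["uni_high", "bi_high"]))  # wholly
print(classify(cascade, ["uni_mid", "bi_low"]))    # partially
print(classify(cascade, ["uni_low", "bi_low"]))    # non
```

One reason such a decomposition might help is that each binary classifier only has to separate two groups, so the second one can be trained on derived articles alone without the non-derived examples dominating its statistics.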



The Center for Language and Speech Processing
The Johns Hopkins University
3400 North Charles Street, Barton Hall
Baltimore, MD 21218
Telephone: (410) 516-4237
Fax: (410) 516-5050
E-mail: clsp@clsp.jhu.edu