CLSP Guest Lecture Series
Full Presentation: [ .ps | .pdf ]
Karen Sparck Jones
Computer Laboratory, University of Cambridge
People have tried to evaluate systems or system components since research on automatic information and language processing began. But the DARPA conferences have raised the stakes substantially by requiring and delivering systematic evaluations, and by sustaining these through long-term programmes covering whole series of related tests. It has been claimed that this rigorous process has significantly advanced information and language processing technology in its own right, for instance in system design, as well as materially improving task performance as defined by appropriate effectiveness measures. These controlled laboratory evaluations have, however, made very strong assumptions about the task context. The talk will examine these assumptions, consider their impact on evaluation and performance results, and argue that for current tasks of interest, e.g. information extraction, summarising, or answer retrieval, it is now essential to play down the present narrowly defined performance measures and to address the task context, so that new measures of larger value can be developed and applied.