|
Computers are increasingly used to manage the large volumes of news
and information now available in electronic form.
The computer's task is to organize the incoming data into related
segments or stories and to index them in a way that makes them easier
for the user to digest.
A key problem in digesting new data is deciding which
parts contain redundant information, so that attention can be focused on the
new material. This project proposes to investigate the problem of
analyzing newly arrived news stories for two purposes: (1) to decide if
the story discusses an event or topic that has not been seen earlier
(first story detection); and (2) to identify, within a sequence of stories
on the same pre-defined topic, which portions of subsequent stories contain
new information and to determine the new named entities that are central
to the topic (within-topic novelty detection). The project will focus
on extending and combining Information Retrieval and Natural Language
Processing Extraction techniques toward addressing these questions.
Specifically, the team will examine how to identify who/where/when
entities and how to use them in Information Retrieval and other language
modeling approaches to this problem. An important component of the
proposed project is investigating how the detection results are affected
by using the (degraded) text output of a speech recognition system. The
evaluation of the
project's results will be based on established measures from the Topic
Detection and Tracking (TDT) initiative in the case of first story
detection, and on the accuracy of aligning predicted new text with actual
new information (as identified by human experts prior to the workshop) in
the case of novelty detection.
|