Cross-Document Coreference: Methodologies, Evaluations, and Applications

Amit Bagga

 

Ask Jeeves, Inc.

1551 South Washington Avenue, Suite 400

Piscataway, NJ 08854, USA

abagga@askjeeves.com

 

Abstract

Cross-document coreference occurs when the same person, place, event, or concept is mentioned in more than one source. Resolving cross-document coreference is useful for a number of higher-level tasks, such as cross-document summarization and information extraction. Recently, the Entity Detection and Tracking (EDT) task in DARPA’s Automatic Content Extraction (ACE) program has helped focus additional attention on this area, and a growing body of research now addresses the problem.

The talk will provide a broad overview of the field of cross-document coreference. It will first put the problem into perspective by comparing it with other natural language tasks. Next, an overview of the different methodologies will be provided, followed by a discussion of evaluation issues and algorithms. Finally, a number of applications of cross-document coreference will be discussed, including cross-media coreference, cross-language coreference, and cross-document information extraction.