CRAC 18: Shared Task

Overview

Progress in anaphora resolution and coreference, as in other areas of NLP, is achieved not only by creating larger corpora and developing better computational models for the task, but also by extending the range of issues considered. The creation of the coreference portion of the OntoNotes corpus (Pradhan et al., 2007) greatly boosted research in this area, not only by making available a much larger resource, but also by adopting a linguistically more informed view of the task and by extending the range of anaphoric phenomena considered (Pradhan et al., 2011; Pradhan et al., 2012).

The CRAC 2018 shared task will exploit the availability of new resources, specifically the second release of the ARRAU corpus (Uryupina et al., to appear). This corpus covers spoken dialogue as well as text, and it enables the study of aspects of anaphora not considered previously, in particular bridging references (Clark, 1975) and discourse deixis, a type of anaphoric reference to abstract objects (Webber, 1991). An example of bridging reference is “the handles” in (1), which refers to a part of the egg vases introduced in u1. (This example is from the GNOME corpus.) An example of discourse deixis is “This” in the last sentence of (2), from Webber (1991).

  1. (u1) These “egg vases” are of exceptional quality.
    (u2) basketwork bases support egg-shaped bodies
    (u3) and bundles of straw form the handles.
  2. For example, binocular stereo fusion is known to take place in a specific area of the cortex near the back of the head.
    Patients with damage to this area of the cortex have visual handicaps but they show no obvious impairment in their ability to think.
    This suggests that stereo fusion is not necessary for thought.

The proposed shared tasks

The proposed shared task will be articulated around a number of sub-tasks, all using the ARRAU corpus, and including at least the following:
  1. Task 1: Resolution of anaphoric identity
    This sub-task will be concerned with anaphoric identity only, but will cover all sub-domains of ARRAU: the RST domain (1/3 of the Penn Treebank), the TRAINS-93 domain (task-oriented dialogue), and the Pear Stories domain (spoken narrative).
    Coordinators: Olga Uryupina, Massimo Poesio.
  2. Task 2: Resolution of bridging references in text
    This sub-task will focus on bridging references, using the RST domain of ARRAU.
    Coordinators: Yulia Grishina, Anna Nedoluzhko, Maciej Ogrodniczuk, Massimo Poesio.
  3. Task 3: Resolution of discourse deixis
    This sub-task will focus on discourse deixis only, using the RST domain of ARRAU.
    Coordinators: Varada Kolhatkar, Adam Roussel, Fabian Simonjetz, Heike Zinsmeister.

More detailed specifications for the three tasks are provided with the Test data and here.

Important dates

  • 12.12 Shared task description circulated and training data available
  • 25.1 Test data distributed
  • 8.2 System runs submitted
  • 22.2 Results published
  • 8.3 System description papers deadline
  • 22.3 Task description paper deadline
  • 29.3 Reviews for all papers
  • 16.4 Camera ready versions for all papers

Data and Resources

The ARRAU corpus is distributed by LDC, which will make it available to participants in this shared task. Participants should register for the shared task by email to Yulia Grishina (grishina@uni-potsdam.de) and Massimo Poesio (m.poesio@qmul.ac.uk). After registering, participants should download the Agreement form from this site, sign it, and send it to LDC, which will then release the data.

Shared Task Organizers

  • Yulia Grishina, University of Potsdam (chair)
  • Varada Kolhatkar, Simon Fraser University
  • Anna Nedoluzhko, Charles University in Prague
  • Massimo Poesio, Queen Mary University of London
  • Adam Roussel, University of the Ruhr at Bochum
  • Fabian Simonjetz, University of the Ruhr at Bochum
  • Olga Uryupina, University of Trento
  • Heike Zinsmeister, University of Hamburg