SemEval 2010: VP Ellipsis Processing

From SIGSEM

The Phenomenon

Verb Phrase Ellipsis (VPE) occurs in the English language when an auxiliary or modal verb abbreviates an entire verb phrase recoverable from the linguistic context, as in the following examples:

  • Both Dr. Mason and Dr. Sullivan [oppose federal funding for abortion], as does President Bush, except in cases where a woman's life is threatened.
  • They also said that vendors were [delivering goods] more quickly in October than they had for each of the five previous months.
  • He spends his days [sketching passers-by], or trying to.

In these examples the antecedent is marked by square brackets. The occurrences of VPE, typeset in bold face on the original page, are the auxiliaries "does", "had", and "to", respectively.

The Task

The proposed shared task consists of two subtasks: (1) automatically detecting VPE in free text; and (2) selecting the textual antecedent of each found VPE. Task 1 is reasonably difficult (Nielsen 2004 reports an F-score of 71% on Wall Street Journal data).

Task 2 is challenging. With a "head match" evaluation, Hardt 1997 reports a success rate of 62% for a baseline system based on recency only, and an accuracy of 84% for an improved system that takes recency, clausal relations, parallelism, and quotation into account. We will make the task more realistic (but more difficult) by using precision and recall over each token of the antecedent rather than head match.
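A minimal sketch of what precision and recall over the tokens of an antecedent could look like is given here. It is not the official scorer: the function name `token_overlap_scores` and the representation of an antecedent as an inclusive (begin, end) pair of token positions on the same line are assumptions made for illustration.

```python
from typing import Tuple


def token_overlap_scores(
    gold: Tuple[int, int], predicted: Tuple[int, int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for one antecedent.

    Both arguments are hypothetical (begin, end) token positions,
    inclusive, and are assumed to refer to the same line.
    """
    gold_tokens = set(range(gold[0], gold[1] + 1))
    pred_tokens = set(range(predicted[0], predicted[1] + 1))
    overlap = len(gold_tokens & pred_tokens)

    precision = overlap / len(pred_tokens) if pred_tokens else 0.0
    recall = overlap / len(gold_tokens) if gold_tokens else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Gold antecedent covers tokens 3..6, the system proposes tokens 4..8:
# three tokens overlap, out of five predicted and four gold tokens.
p, r, f = token_overlap_scores(gold=(3, 6), predicted=(4, 8))
print(p, r, round(f, 3))
```

Unlike a head match criterion, a proposal that finds the right verb but the wrong extent is only partially rewarded under this kind of measure.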

We will provide texts where sentence boundaries are detected and each sentence is tokenised and printed on a new line. An occurrence of VPE is marked by a line number plus token positions of the auxiliary or modal verb. Textual antecedents are assumed to be on one line, and are marked by the line number plus begin/end token position.
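To illustrate the stand-off idea, the following sketch resolves an annotation against tokenised text. The `Span` record, its field names, and the choice of 1-based inclusive positions are assumptions for illustration only; the actual file format of the annotation is not specified in this description.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-off record: a token range on one line."""
    line: int   # line (sentence) number, assumed 1-based
    begin: int  # position of the first token, assumed 1-based
    end: int    # position of the last token, inclusive


def span_tokens(lines: List[str], span: Span) -> List[str]:
    """Look up the tokens a span points to in whitespace-tokenised lines."""
    tokens = lines[span.line - 1].split()
    return tokens[span.begin - 1:span.end]


# One tokenised sentence per line, as described above.
lines = ["He spends his days sketching passers-by , or trying to ."]

vpe = Span(line=1, begin=10, end=10)        # the auxiliary "to"
antecedent = Span(line=1, begin=5, end=6)   # "sketching passers-by"

print(span_tokens(lines, vpe))
print(span_tokens(lines, antecedent))
```

Keeping the annotation separate from the text in this way means the raw sentences themselves never have to be modified or redistributed with the annotation.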

The Data

As development data we provide the stand-off annotation of nearly 500 occurrences of manually annotated VPE in the Wall Street Journal part (all 25 sections) of the Penn Treebank. It is available as a tar file: VPE-0.9 (July 2009). After downloading, you can unpack this file with the command

  tar xvf vpe-0.9.tar

which creates a directory vpe in your working directory.

We have made an arrangement with the Linguistic Data Consortium (LDC) that participants without access to the Penn Treebank can use the raw texts for the duration of the shared task, after they sign and submit a license agreement for the data to the LDC. A copy of the license agreement can be obtained from the organisers.

We will also produce a script that calculates precision and recall of detection and the average F-score and accuracy of antecedent selection based on overlap with a gold standard antecedent.
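Until that script is available, the detection part can be approximated along the following lines. This is only a sketch: the `(line, begin, end)` triples and the exact-match criterion for counting a detected VPE as correct are assumptions, and the official script may differ.

```python
from typing import Set, Tuple

# Hypothetical representation of one detected VPE:
# (line number, begin token position, end token position).
Position = Tuple[int, int, int]


def detection_scores(
    gold: Set[Position], predicted: Set[Position]
) -> Tuple[float, float]:
    """Return (precision, recall) of VPE detection.

    A predicted occurrence counts as correct only if it matches a gold
    occurrence exactly (an assumption of this sketch).
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


gold = {(1, 10, 10), (4, 7, 7), (9, 3, 3)}
predicted = {(1, 10, 10), (4, 7, 7), (6, 2, 2), (8, 5, 5)}

# Two of the four predictions are correct, and two of the three
# gold occurrences are found.
precision, recall = detection_scores(gold, predicted)
print(precision, round(recall, 3))
```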

The test data will be a further collection of newswire (or similar genre) articles. The "gold" standard of the test data will be determined by using the merged results of all task participants. Additionally, these will be manually judged by the organisers.

Organisation

Participation

If you'd like to participate in this shared task, please contact the organisers by email. We will set up a mailing list to keep all participants up to date on all matters of the task.

Organisers

  • Johan Bos (University of Rome "La Sapienza")
  • Jennifer Spenader (University of Groningen)

SemEval 2010

The VP Ellipsis Processing shared task is one of the evaluation exercises organised at SemEval 2010. The time period for SemEval 2010 has not yet been finalised, but it will be held over a two-month period in the first part of 2010. The trial data for the VP Ellipsis Processing task will be released no later than July 2009 (perhaps even earlier).

References

  • Daniel Hardt (1997): "An Empirical Approach to VP Ellipsis". Computational Linguistics 23(4).
  • Leif A. Nielsen (2004): "Verb phrase ellipsis detection using automatically parsed text". Proceedings of the 20th International Conference on Computational Linguistics (Geneva, Switzerland).