STEP 2008 shared task: comparing semantic representations

From SIGSEM



Background

The STEP 2008 workshop features a "shared task" to compare semantic representations as output by state-of-the-art NLP systems. Participating systems are given a set of small texts before the STEP workshop. During the workshop, a panel of experts in the field will judge the output of these systems on a number of aspects.

The aims of the shared task are (i) to get an idea of the state of the art in open-domain semantic analysis, (ii) to discuss the feasibility of a gold standard for deep semantic representations, and (iii) to identify a set of problematic and relevant issues for semantic evaluation.

Seven teams will participate in the shared task. They were asked to submit a paper containing a description of the system and semantic formalism, as well as an authentic small text not exceeding five sentences and 120 tokens. The "test data" for the shared task is composed of all the texts submitted by the participants, allowing participants to challenge each other.

Shared Task System Descriptions

System 1: BLUE

Boeing's NLP system, BLUE, comprises a pipeline of a parser, a logical form (LF) generator, an initial logic generator, and further processing modules. The initial logic generator produces logic whose structure closely mirrors the structure of the original text. The subsequent processing modules then perform additional transformations, of somewhat limited scope, to convert this logic into a representation that is more usable with respect to a specific target ontology and better able to support inference.

System 2: Boxer

Boxer is an open-domain software component for semantic analysis of text, based on Combinatory Categorial Grammar (CCG) and Discourse Representation Theory (DRT). Used together with the C&C tools, Boxer reaches more than 95% coverage on newswire texts. The semantic representations produced by Boxer, known as Discourse Representation Structures (DRSs), incorporate neo-Davidsonian representations of events, using the VerbNet inventory of thematic roles. The resulting DRSs can be translated into ordinary first-order logic formulas and processed by standard theorem provers for first-order logic.
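The translation from DRSs to first-order logic mentioned above follows a standard recipe from the DRT literature. The discourse referents of a DRS become existential quantifiers over the conjunction of its conditions, while the referents in the antecedent of an implicative condition are universally quantified over the whole implication. The Python sketch below implements that recipe for atomic, negated and implicative conditions only. The data classes, the string notation (`exists`, `all`, `&`, `->`, `~`) and the example DRSs are assumptions made for illustration. This is not Boxer's code or output format, and the role name `agent` simply stands in for a VerbNet thematic role.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Pred:
    """Atomic condition, e.g. crewman(x) or agent(e, x)."""
    name: str
    args: List[str]


@dataclass
class Neg:
    """Negated sub-DRS: ~K."""
    drs: "DRS"


@dataclass
class Imp:
    """Implicative condition K1 => K2 (as introduced by 'every')."""
    antecedent: "DRS"
    consequent: "DRS"


@dataclass
class DRS:
    """A box: a list of discourse referents plus a list of conditions."""
    refs: List[str]
    conds: List[Union[Pred, Neg, Imp]]


def quantify(quantifier: str, refs: List[str], body: str) -> str:
    # One quantifier per referent, leftmost referent outermost.
    for ref in reversed(refs):
        body = f"{quantifier} {ref}.{body}"
    return body


def conjoin(formulas: List[str]) -> str:
    if not formulas:
        return "true"
    if len(formulas) == 1:
        return formulas[0]
    return "(" + " & ".join(formulas) + ")"


def cond_to_fol(cond) -> str:
    if isinstance(cond, Pred):
        return f"{cond.name}({', '.join(cond.args)})"
    if isinstance(cond, Neg):
        return "~" + drs_to_fol(cond.drs)
    if isinstance(cond, Imp):
        # Antecedent referents are universally quantified over the implication.
        ante = conjoin([cond_to_fol(c) for c in cond.antecedent.conds])
        return quantify("all", cond.antecedent.refs,
                        f"({ante} -> {drs_to_fol(cond.consequent)})")
    raise TypeError(f"unsupported condition: {cond!r}")


def drs_to_fol(drs: DRS) -> str:
    # A DRS's referents are existentially quantified over its conditions.
    return quantify("exists", drs.refs,
                    conjoin([cond_to_fol(c) for c in drs.conds]))


# "A crewman yelled." (hand-written neo-Davidsonian DRS, not Boxer output)
a_crewman = DRS(["x", "e"], [Pred("crewman", ["x"]), Pred("yell", ["e"]),
                             Pred("agent", ["e", "x"])])
# "Every crewman yelled."
every_crewman = DRS([], [Imp(DRS(["x"], [Pred("crewman", ["x"])]),
                             DRS(["e"], [Pred("yell", ["e"]),
                                         Pred("agent", ["e", "x"])]))])

print(drs_to_fol(a_crewman))      # exists x.exists e.(crewman(x) & yell(e) & agent(e, x))
print(drs_to_fol(every_crewman))  # all x.(crewman(x) -> exists e.(yell(e) & agent(e, x)))
```

The event referent `e` sits alongside the individual `x`, which is what makes the representation neo-Davidsonian: the verb becomes a one-place predicate on events and each participant is linked to the event through a thematic-role predicate.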

System 3: GETARUNS

GETARUNS is a system for text understanding developed at the University of Venice. It is equipped with three main modules: a lower module for parsing, where sentence strategies are implemented; a middle module for semantic interpretation and discourse model construction, cast in Situation Semantics; and a higher module where reasoning and generation take place.

System 4: LXGram

LXGram is a hand-built Portuguese computational grammar based on HPSG (syntax) and MRS (semantics). Because LXGram generates many different analyses (mainly due to PP attachment ambiguities), the preferred analysis was selected manually. LXGram's lexicon and inventory of syntax rules had to be extended to obtain reasonable performance on the shared task data.
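The remark that PP attachment is the main source of LXGram's many analyses can be made concrete with a toy count. The sketch below assumes a deliberately simplified model that is not LXGram's grammar: a verb, its object NP and n trailing PPs, where each PP attaches either to the verb or to any noun to its left (including an earlier PP's object) and attachments may not cross. Under these assumptions the number of analyses follows the Catalan numbers. The function names are hypothetical.

```python
from itertools import combinations, product
from math import comb


def crosses(a, b):
    # Arcs are (head, dependent) position pairs with head < dependent.
    (i, j), (k, l) = sorted([a, b])
    return i < k < j < l


def count_attachments(n_pps: int) -> int:
    """Count non-crossing attachments for 'V NP PP1 ... PPn'.

    Position 0 is the verb, 1 the object NP, 2..n+1 the PPs. Each PP may
    attach to any earlier position; attachments may not cross.
    """
    pp_positions = range(2, n_pps + 2)
    total = 0
    # Try every choice of head for every PP, then discard crossing ones.
    for heads in product(*(range(p) for p in pp_positions)):
        arcs = [(0, 1)] + list(zip(heads, pp_positions))
        if not any(crosses(a, b) for a, b in combinations(arcs, 2)):
            total += 1
    return total


def catalan(m: int) -> int:
    return comb(2 * m, m) // (m + 1)


for n in range(1, 6):
    print(n, count_attachments(n), catalan(n + 1))
```

In this model, "V NP PP PP" already has five attachment configurations, and four trailing PPs give 42. This illustrates why choosing a single preferred analysis from a grammar's output is non-trivial.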

System 5: OntoSem

OntoSem, which is the implementation of the theory of Ontological Semantics, is a text-processing environment that takes as input unrestricted raw text and carries out preprocessing followed by morphological, syntactic, semantic, and discourse analysis, with the results of analysis represented as a formal text-meaning representation (TMR) that can then be used as the basis for various applications.

System 6: TextCap

TextCap is a semantic parser that takes unrestricted English text as input and uses publicly available computational linguistics tools and lexical resources. It produces semantic triples, which can be used in a variety of tasks such as generating knowledge bases, providing raw material for question answering systems, or creating RDF structures.
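Because TextCap's triples are described as usable for creating RDF structures, here is a minimal sketch of one way (subject, predicate, object) triples could be serialized as RDF N-Triples. The triples are hand-written from Text 6 for illustration and are not actual TextCap output. The namespace `http://example.org/textcap/` and the helper names are hypothetical. Every term is naively minted as an IRI, whereas a real mapping would distinguish literals and reuse existing vocabularies.

```python
from typing import Iterable, Tuple
from urllib.parse import quote

# Hypothetical namespace; TextCap's actual vocabulary is not described here.
BASE = "http://example.org/textcap/"


def to_iri(term: str) -> str:
    # Spaces become underscores; other unsafe characters are percent-encoded.
    return "<" + BASE + quote(term.replace(" ", "_"), safe="") + ">"


def to_ntriples(triples: Iterable[Tuple[str, str, str]]) -> str:
    # dict.fromkeys drops duplicate triples while keeping first-seen order.
    unique = dict.fromkeys(triples)
    return "\n".join(f"{to_iri(s)} {to_iri(p)} {to_iri(o)} ." for s, p, o in unique)


triples = [
    ("Greensgrow", "sell", "vegetables"),
    ("Greensgrow", "sell", "herbs"),
    ("Greensgrow", "located_in", "North Philadelphia"),
]
print(to_ntriples(triples))
```

Each output line is one N-Triples statement of the form `<subject> <predicate> <object> .`, which standard RDF tools can load directly.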

System 7: Trips

This system for semantic text processing has the TRIPS parser at its core, augmented with statistical preprocessing techniques and online lexical lookup. A graphical logical form is used as the semantic representation for text understanding. This representation was designed to bridge the gap between highly expressive "deep" logical-form representations and shallower semantic encodings such as word senses and semantic relations. It preserves rich semantic content while allowing compact encoding of ambiguity and viable partial representations.

Shared Task Texts

Text 1

An object is thrown with a horizontal speed of 20 m/s from a cliff that is 125 m high. The object falls for the height of the cliff. If air resistance is negligible, how long does it take the object to fall to the ground? What is the duration of the fall?

Text 2

Cervical cancer is caused by a virus. That has been known for some time and it has led to a vaccine that seems to prevent it. Researchers have been looking for other cancers that may be caused by viruses.

Text 3

John went into a restaurant. There was a table in the corner. The waiter took the order. The atmosphere was warm and friendly. He began to read his book.

Text 4

The first school for the training of leader dogs in the country is going to be created in Mortagua and will train 22 leader dogs per year. In Mortagua, Joao Pedro Fonseca and Marta Gomes coordinate the project that seven people develop in this school. They visited several similar places in England and in France, and two future trainers are already doing internship in one of the French Schools. The communitarian funding ensures the operation of the school until 1999. We would like our school to work similarly to the French ones, which live from donations, from the merchandising and even from the raffles that children sell in school.

Text 5

As the 3 guns of Turret 2 were being loaded, a crewman who was operating the center gun yelled into the phone, "I have a problem here. I am not ready yet." Then the propellant exploded. When the gun crew was killed they were crouching unnaturally, which suggested that they knew that an explosion would happen. The propellant that was used was made from nitrocellulose chunks that were produced during World War II and were repackaged in 1987 in bags that were made in 1945. Initially it was suspected that this storage might have reduced the powder's stability.

Text 6

Amid the tightly packed row houses of North Philadelphia, a pioneering urban farm is providing fresh local food for a community that often lacks it, and making money in the process. Greensgrow, a one-acre plot of raised beds and greenhouses on the site of a former steel-galvanizing factory, is turning a profit by selling its own vegetables and herbs as well as a range of produce from local growers, and by running a nursery selling plants and seedlings. The farm earned about $10,000 on revenue of $450,000 in 2007, and hopes to make a profit of 5 percent on $650,000 in revenue in this, its 10th year, so it can open another operation elsewhere in Philadelphia.

Text 7

Modern development of wind-energy technology and applications was well underway by the 1930s, when an estimated 600,000 windmills supplied rural areas with electricity and water-pumping services. Once broad-scale electricity distribution spread to farms and country towns, use of wind energy in the United States started to subside, but it picked up again after the U.S. oil shortage in the early 1970s. Over the past 30 years, research and development has fluctuated with federal government interest and tax incentives. In the mid-'80s, wind turbines had a typical maximum power rating of 150 kW. In 2006, commercial, utility-scale turbines are commonly rated at over 1 MW and are available in up to 4 MW capacity.

System Output

No. | System | Authors | Affiliation | Text 1 | Text 2 | Text 3 | Text 4 | Text 5 | Text 6 | Text 7
1 | BLUE | Clark, Harrison, Murray, Thompson | Boeing | sys1text1 | sys1text2 | sys1text3 | sys1text4 | sys1text5 | sys1text6 | sys1text7
2 | Boxer | Bos | University of Rome "La Sapienza" | sys2text1 | sys2text2 | sys2text3 | sys2text4 | sys2text5 | sys2text6 | sys2text7
3 | GETARUNS | Delmonte, Pianta | University of Venice "Ca' Foscari" | sys3text1 | sys3text2 | sys3text3 | sys3text4 | sys3text5 | sys3text6 | sys3text7
4 | LXGram | Branco, Costa | University of Lisbon | sys4text1 | sys4text2 | sys4text3 | sys4text4 | sys4text5 | (none) | sys4text7
5 | OntoSem | McShane, Nirenburg, Beale | UMBC | sys5text1 | sys5text2 | sys5text3 | sys5text4 | sys5text5 | sys5text6 | sys5text7
6 | TextCap | Callaway | University of Edinburgh | sys6text1 | sys6text2 | sys6text3 | sys6text4 | sys6text5 | sys6text6 | sys6text7
7 | Trips | Allen, Swift, de Beaumont | University of Rochester | sys7text1 | sys7text2 | sys7text3 | sys7text4 | sys7text5 | sys7text6 | (none)