STEP 2008 shared task: comparing semantic representations
From SIGSEM
Background
The STEP 2008 workshop features a "shared task" to compare semantic representations as output by state-of-the-art NLP systems. Participating systems are given a set of small texts before the STEP workshop. During the workshop, the output of these systems will be judged on a number of aspects by a panel of experts in the field.
The aim of the shared task is (i) to get an idea of the state of the art in open-domain semantic analysis, (ii) to discuss the feasibility of a gold standard for deep semantic representations, and (iii) to identify a set of problematic and relevant issues for semantic evaluation.
Seven teams will participate in the shared task. They were asked to submit a paper containing a description of the system and semantic formalism, as well as an authentic small text not exceeding five sentences and 120 tokens. The "test data" for the shared task is composed of all the texts submitted by the participants, allowing participants to challenge each other.
Shared Task System Descriptions
System 1: BLUE
Boeing's NLP system, BLUE, comprises a pipeline of a parser, a logical form (LF) generator, an initial logic generator, and further processing modules. The initial logic generator produces logic whose structure closely mirrors the structure of the original text. The subsequent processing modules then perform, with somewhat limited scope, additional transformations to convert this into a more usable representation with respect to a specific target ontology, better able to support inference.
System 2: Boxer
Boxer is an open-domain software component for semantic analysis of text, based on Combinatory Categorial Grammar (CCG) and Discourse Representation Theory (DRT). Used together with the C&C tools, Boxer reaches more than 95% coverage on newswire texts. The semantic representations produced by Boxer, known as Discourse Representation Structures (DRSs), incorporate neo-Davidsonian representations for events, using the VerbNet inventory of thematic roles. The resulting DRSs can be translated to ordinary first-order logic formulas and be processed by standard theorem provers for first-order logic.
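The translation from a DRS to a first-order formula can be illustrated with a minimal sketch of the standard DRT mapping: discourse referents become existential quantifiers, conditions are conjoined, and an implication between two DRSs quantifies universally over the referents of its antecedent. The class names (`DRS`, `Pred`, `Neg`, `Imp`), the output syntax, and the predicate and role names in the usage example are assumptions made for illustration. This is not Boxer's code or its actual output format.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Pred:
    """Atomic condition such as waiter(x) or Agent(e,x)."""
    name: str
    args: List[str]


@dataclass
class Neg:
    """Negated sub-DRS."""
    body: "DRS"


@dataclass
class Imp:
    """Implication between two DRSs."""
    antecedent: "DRS"
    consequent: "DRS"


Condition = Union[Pred, Neg, Imp]


@dataclass
class DRS:
    """A box: discourse referents plus conditions over them."""
    referents: List[str]
    conditions: List[Condition]


def conjoin(conditions: List[Condition]) -> str:
    # An empty box is trivially true (illustrative convention).
    return " & ".join(cond_to_fol(c) for c in conditions) or "true"


def drs_to_fol(drs: DRS) -> str:
    """Referents become existentials scoping over the conjoined conditions."""
    body = conjoin(drs.conditions)
    for ref in reversed(drs.referents):
        body = f"exists {ref}.({body})"
    return body


def cond_to_fol(cond: Condition) -> str:
    if isinstance(cond, Pred):
        return f"{cond.name}({','.join(cond.args)})"
    if isinstance(cond, Neg):
        return f"-({drs_to_fol(cond.body)})"
    if isinstance(cond, Imp):
        # Antecedent referents are universally quantified over the implication.
        out = f"({conjoin(cond.antecedent.conditions)}) -> ({drs_to_fol(cond.consequent)})"
        for ref in reversed(cond.antecedent.referents):
            out = f"all {ref}.({out})"
        return out
    raise TypeError(f"unknown condition: {cond!r}")
```

As a usage example, a hand-written neo-Davidsonian DRS for "The waiter took the order" (from Text 3) could be built and translated as follows; the predicates are again only illustrative:

```python
took = DRS(
    ["x", "y", "e"],
    [Pred("waiter", ["x"]), Pred("order", ["y"]), Pred("take", ["e"]),
     Pred("Agent", ["e", "x"]), Pred("Theme", ["e", "y"])],
)
print(drs_to_fol(took))
```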
System 3: GETARUNS
GETARUNS is a system for text understanding developed at the University of Venice, and is equipped with three main modules: a lower module for parsing where sentence strategies are implemented; a middle module for semantic interpretation and discourse model construction which is cast into Situation Semantics; and a higher module where reasoning and generation take place.
System 4: LXGram
LXGram is a hand-built Portuguese computational grammar based on HPSG (syntax) and MRS (semantics). Because LXGram generates many different analyses (mainly due to PP attachment ambiguities), the preferred analysis was selected manually. LXGram's lexicon and inventory of syntax rules had to be extended to achieve reasonable performance on the shared task data.
System 5: OntoSem
OntoSem, which is the implementation of the theory of Ontological Semantics, is a text-processing environment that takes as input unrestricted raw text and carries out preprocessing followed by morphological, syntactic, semantic, and discourse analysis, with the results of analysis represented as a formal text-meaning representation (TMR) that can then be used as the basis for various applications.
System 6: TextCap
TextCap is a semantic parser that takes unrestricted English text as input and relies on publicly available computational linguistics tools and lexical resources. It produces semantic triples, which can be used in a variety of tasks such as generating knowledge bases, providing raw material for question answering systems, or creating RDF structures.
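To make the notion of semantic triples concrete, here is a minimal sketch that stores subject-predicate-object triples and serializes them as N-Triples, one common RDF syntax. The `Triple` type, the `to_ntriples` helper, the `http://example.org/` base IRI, and the example triple (loosely based on Text 2) are all assumptions for illustration. They do not reflect TextCap's actual vocabulary or output format.

```python
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement (hypothetical layout)."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(triples: Iterable[Triple], base: str = "http://example.org/") -> str:
    """Serialize triples as N-Triples, minting IRIs under an assumed base."""
    lines = []
    for t in triples:
        s, p, o = (f"<{base}{part}>" for part in t)
        lines.append(f"{s} {p} {o} .")
    return "\n".join(lines)


# Hand-written example triple, not produced by any parser.
facts = [Triple("virus", "causes", "cervical_cancer")]
print(to_ntriples(facts))
```

Every term is rendered as an IRI here to keep the sketch short. A fuller treatment would distinguish literals and blank nodes.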
System 7: Trips
This system for semantic text processing has the TRIPS parser at the core, augmented with statistical preprocessing techniques and online lexical lookup. A graphical logical form is used as a semantic representation for text understanding. This representation was designed to bridge the gap between highly expressive "deep" representations of logical forms and more shallow semantic encodings such as word senses and semantic relations. It preserves rich semantic content while allowing for compact ambiguity encoding and viable partial representations.
Shared Task Texts
Text 1
An object is thrown with a horizontal speed of 20 m/s from a cliff that is 125 m high. The object falls for the height of the cliff. If air resistance is negligible, how long does it take the object to fall to the ground? What is the duration of the fall?
Text 2
Cervical cancer is caused by a virus. That has been known for some time and it has led to a vaccine that seems to prevent it. Researchers have been looking for other cancers that may be caused by viruses.
Text 3
John went into a restaurant. There was a table in the corner. The waiter took the order. The atmosphere was warm and friendly. He began to read his book.
Text 4
The first school for the training of leader dogs in the country is going to be created in Mortagua and will train 22 leader dogs per year. In Mortagua, Joao Pedro Fonseca and Marta Gomes coordinate the project that seven people develop in this school. They visited several similar places in England and in France, and two future trainers are already doing internship in one of the French Schools. The communitarian funding ensures the operation of the school until 1999. We would like our school to work similarly to the French ones, which live from donations, from the merchandising and even from the raffles that children sell in school.
Text 5
As the 3 guns of Turret 2 were being loaded, a crewman who was operating the center gun yelled into the phone, ``I have a problem here. I am not ready yet.'' Then the propellant exploded. When the gun crew was killed they were crouching unnaturally, which suggested that they knew that an explosion would happen. The propellant that was used was made from nitrocellulose chunks that were produced during World War II and were repackaged in 1987 in bags that were made in 1945. Initially it was suspected that this storage might have reduced the powder's stability.
Text 6
Amid the tightly packed row houses of North Philadelphia, a pioneering urban farm is providing fresh local food for a community that often lacks it, and making money in the process. Greensgrow, a one-acre plot of raised beds and greenhouses on the site of a former steel-galvanizing factory, is turning a profit by selling its own vegetables and herbs as well as a range of produce from local growers, and by running a nursery selling plants and seedlings. The farm earned about $10,000 on revenue of $450,000 in 2007, and hopes to make a profit of 5 percent on $650,000 in revenue in this, its 10th year, so it can open another operation elsewhere in Philadelphia.
Text 7
Modern development of wind-energy technology and applications was well underway by the 1930s, when an estimated 600,000 windmills supplied rural areas with electricity and water-pumping services. Once broad-scale electricity distribution spread to farms and country towns, use of wind energy in the United States started to subside, but it picked up again after the U.S. oil shortage in the early 1970s. Over the past 30 years, research and development has fluctuated with federal government interest and tax incentives. In the mid-'80s, wind turbines had a typical maximum power rating of 150 kW. In 2006, commercial, utility-scale turbines are commonly rated at over 1 MW and are available in up to 4 MW capacity.
System Output
| System | Name | Authors | Affiliation | Text 1 | Text 2 | Text 3 | Text 4 | Text 5 | Text 6 | Text 7 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | BLUE | Clark, Harrison, Murray, Thompson | Boeing | sys1text1 | sys1text2 | sys1text3 | sys1text4 | sys1text5 | sys1text6 | sys1text7 |
| 2 | Boxer | Bos | University of Rome "La Sapienza" | sys2text1 | sys2text2 | sys2text3 | sys2text4 | sys2text5 | sys2text6 | sys2text7 |
| 3 | GETARUNS | Delmonte, Pianta | University of Venice "Ca' Foscari" | sys3text1 | sys3text2 | sys3text3 | sys3text4 | sys3text5 | sys3text6 | sys3text7 |
| 4 | LXGram | Branco, Costa | University of Lisbon | sys4text1 | sys4text2 | sys4text3 | sys4text4 | sys4text5 | | sys4text7 |
| 5 | OntoSem | McShane, Nirenburg, Beale | UMBC | sys5text1 | sys5text2 | sys5text3 | sys5text4 | sys5text5 | sys5text6 | sys5text7 |
| 6 | TextCap | Callaway | University of Edinburgh | sys6text1 | sys6text2 | sys6text3 | sys6text4 | sys6text5 | sys6text6 | sys6text7 |
| 7 | Trips | Allen, Swift, de Beaumont | University of Rochester | sys7text1 | sys7text2 | sys7text3 | sys7text4 | sys7text5 | sys7text6 | |

