[sdp] Call for Participation: Cross-Framework Meaning Representation Parsing at CoNLL 2019

Stephan Oepen oe at ifi.uio.no
Fri Apr 26 12:03:27 CEST 2019

[with apologies for cross-posting]

We are excited to invite participants to the Shared Task at the 2019
Conference on Computational Natural Language Learning (CoNLL):

  Cross-Framework Meaning Representation Parsing (MRP 2019)

For background on the nature of the task and its schedule, please see:


Any potentially interested parties, please sign up for future updates:


A sample of sentences annotated with semantic graphs in all frameworks:



The goal of the task is to advance data-driven parsing into
graph-structured representations of sentence meaning.  All things
semantic have been receiving heightened attention in recent years.  And
despite remarkable advances in vector-based (continuous and
distributed) encodings of meaning, ‘classic’ (discrete and
hierarchically structured) semantic representations will continue to
play an important role in ‘making sense’ of natural language.  While
parsing has long been dominated by tree-structured target
representations, there is now growing interest in general graphs as
more expressive and arguably more adequate target structures.

For the first time, this task combines formally and linguistically
different approaches to meaning representation in graph form in a
uniform training and evaluation setup.  Participants are invited to
develop parsing systems that support five distinct semantic graph
frameworks—which all encode core predicate–argument structure, among
other things—in the same implementation.  Training and evaluation data
will be provided for all five frameworks.  Participants are asked to
design and train a system that predicts sentence-level meaning
representations in all frameworks in parallel.  Architectures that
utilize complementary knowledge sources (e.g. via parameter sharing)
are encouraged (though not required).  Learning from multiple flavors
of meaning representation in tandem has hardly been explored.

The task seeks to reduce framework-specific ‘balkanization’ in the
field of meaning representation parsing.  Expected outcomes include
(a) a unifying formal model over different semantic graph banks, (b)
uniform representations and scoring, (c) systematic contrastive
evaluation across frameworks, and (d) increased cross-fertilization
via transfer and multi-task learning.  We hope to engage the combined
community of parser developers for graph-structured output
representations, including participants in six prior
framework-specific tasks at the Semantic Evaluation (SemEval)
exercises between 2014 and 2019.  Owing to
scarcity of semantic annotations across frameworks, the shared task is
regrettably limited to parsing English for the time being.


The task combines five frameworks for graph-based meaning
representation, each with its specific formal and linguistic
assumptions:

+ DELPH-IN MRS Bi-Lexical Dependencies (Ivanova et al., 2012)
+ Prague Semantic Dependencies (Hajič et al., 2012)
+ Elementary Dependency Structures (Oepen & Lønning, 2006)
+ Universal Conceptual Cognitive Annotation (Abend & Rappoport, 2013)
+ Abstract Meaning Representation (Banarescu et al., 2013)

For the shared task, we have for the first time repackaged five graph
banks into a uniform and normalized abstract representation with a
common serialization format (in JSON).  Training data comprising
semantic graphs over a total of some 3.5 million tokens in running
English text is already available to participants.  For all
frameworks, both in- and out-of-domain evaluation data will be
provided in the same unified format.
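
As a rough illustration of what a common JSON serialization across
frameworks could look like, here is a minimal Python sketch.  The field
names (id, framework, input, nodes, edges) and the one-object-per-line
layout are assumptions made for this example only; the actual schema is
whatever the task's data release defines.

```python
import json

# Hypothetical field names, for illustration only; the real schema is
# defined by the task's data release, not by this sketch.
def make_graph(graph_id, framework, sentence, nodes, edges):
    """Bundle one semantic graph into a plain, JSON-serializable dict."""
    return {
        "id": graph_id,
        "framework": framework,
        "input": sentence,
        "nodes": [{"id": i, "label": label} for i, label in nodes],
        "edges": [{"source": s, "target": t, "label": l} for s, t, l in edges],
    }

def dump_graphs(graphs):
    """Serialize graphs as one JSON object per line, whatever the framework."""
    return "\n".join(
        json.dumps(g, ensure_ascii=False, sort_keys=True) for g in graphs
    )

def load_graphs(text):
    """Read back graphs written by dump_graphs, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

graph = make_graph(
    "example-1", "dm", "The boy wants to go.",
    nodes=[(0, "boy"), (1, "want"), (2, "go")],
    edges=[(1, 0, "ARG1"), (1, 2, "ARG2"), (2, 0, "ARG1")],
)
restored = load_graphs(dump_graphs([graph]))
```

Because every framework would share the same node and edge layout in
such a scheme, one reader and one writer suffice for all five graph
banks.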


+ March 25, 2019: Availability of Sample Training Graphs
+ April 15, 2019: Initial Release of Training Data
+ May 20, 2019: Data Updates and Syntactic Companions
+ July 8–22, 2019: Evaluation Period (Held-Out Data)
+ September 2, 2019: Submission of System Descriptions
+ September 30, 2019: Camera-Ready Manuscripts
+ November 3–4, 2019: Presentation of Results at CoNLL


For each of the individual frameworks, there are common ways of
evaluating the quality of parser outputs in terms of graph similarity
to gold-standard target representations.  There is broad similarity
between the framework-specific evaluation metrics used to date,
although there are some subtle differences too.  In a nutshell,
meaning representation parsing is commonly evaluated in terms of a
graph similarity F1 score at the level of individual node–edge–node
triples, i.e. ‘atomic’ dependencies.
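
A minimal Python sketch of such a triple-level F1 score follows.  It
assumes that gold and system nodes have already been put into
correspondence, so that triples can be compared directly; this is a
simplification for illustration and not the scorer used by any of the
frameworks.  The function name and the toy triples are hypothetical.

```python
from collections import Counter

def triple_f1(gold, system):
    """Precision, recall, and F1 over (source, label, target) triples.

    Assumes nodes are already aligned between the two graphs; duplicate
    triples are counted with multiset semantics.
    """
    gold_counts, system_counts = Counter(gold), Counter(system)
    correct = sum((gold_counts & system_counts).values())
    n_system, n_gold = sum(system_counts.values()), sum(gold_counts.values())
    precision = correct / n_system if n_system else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

gold = [("want", "ARG1", "boy"), ("want", "ARG2", "go"), ("go", "ARG1", "boy")]
system = [("want", "ARG1", "boy"), ("want", "ARG2", "go"), ("go", "ARG2", "boy")]
precision, recall, f1 = triple_f1(gold, system)
```

In this toy case, two of the three system triples match the gold
standard, so precision, recall, and F1 all come out at two thirds.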

For the shared task, we will implement a (straightforward)
generalization of existing, framework-specific metrics that (a) is
applicable across different flavors of semantic graphs, (b) provides a
labeled and an unlabeled variant, (c) does not require matching node
anchoring in the underlying string, but (d) takes advantage of node
ordering when available.  Labeled per-dependency scores, macro-averaged
across all frameworks, will be the official metric for the task; but
we will also provide additional cross-framework evaluation
perspectives, as well as scoring in established framework-specific
metrics.


We invite all possibly interested parties to self-subscribe to the
mailing list for this task; the subscription link and access
information for the training data are available from the task web
site.


Please do not hesitate to contact the task organizers for questions or
clarifications, using the joint email address provided on the task web
site.

Omri Abend, Jan Hajič, Daniel Hershcovich, Marco Kuhlmann,
Stephan Oepen (chair), Tim O'Gorman, and Nianwen Xue
