[sdp] large and multi-lingual collections of semantic dependency graphs

Stephan Oepen oe at ifi.uio.no
Sat Jun 25 00:07:12 CEST 2016


dear colleagues,

we are happy to announce the general availability of a large and
carefully curated collection of target representations for Semantic
Dependency Parsing (SDP), which have previously been used in
connection with the 2014 and 2015 Semantic Evaluation Exercises
(SemEval).  these representations take the form of bi-lexical semantic
dependency graphs, where nodes correspond to surface tokens, and
binary, asymmetric dependency edges encode predicate–argument
structure.

unlike common target representations in syntactic dependency parsing,
the SDP graphs relax standard structural constraints on syntax trees,
i.e. they need not be singly-rooted, allow re-entrancies (argument
sharing across predicates) and crossing edges, and can leave
semantically vacuous tokens unconnected.  at the same time, these
graphs are less partial than typical target representations in
semantic role labeling, providing predicate–argument relations for all
content words.
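
to make these structural properties concrete, here is a minimal python
sketch.  it is not the official SDP file format or toolkit: the
sentence, the edge labels, and the helper name describe() are all
invented for illustration, and 'root' here simply means a connected
node without an incoming edge.

```python
from collections import Counter

# hypothetical toy graph; tokens, labels, and edges are invented and do
# not reproduce any actual SDP annotation
TOKENS = ["the", "dog", "wants", "to", "bark"]

# each edge is (head, dependent, label), with 0-based token indices
EDGES = [
    (0, 1, "BV"),    # the   -> dog
    (2, 1, "ARG1"),  # wants -> dog
    (2, 4, "ARG2"),  # wants -> bark
    (4, 1, "ARG1"),  # bark  -> dog (argument shared with 'wants')
]


def describe(tokens, edges):
    """Summarise structural properties of a bi-lexical dependency graph."""
    # number of incoming edges per node; Counter yields 0 for absent keys
    indegree = Counter(dep for _, dep, _ in edges)
    # nodes that take part in at least one edge
    connected = {i for head, dep, _ in edges for i in (head, dep)}
    return {
        # connected nodes without an incoming edge
        "roots": sorted(i for i in connected if indegree[i] == 0),
        # nodes with more than one incoming edge (re-entrancies)
        "reentrant": sorted(i for i in connected if indegree[i] > 1),
        # tokens that take part in no edge at all
        "unconnected": [i for i in range(len(tokens)) if i not in connected],
    }


summary = describe(TOKENS, EDGES)
print(summary)
```

in this toy graph, 'the' and 'wants' both lack an incoming edge, so
the graph is not singly-rooted; 'dog' is an argument of more than one
predicate; and 'to' remains unconnected.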

the SDP semantic dependency graphs are grounded in formal linguistic
theory, viz. Combinatory Categorial Grammar (CCG), Functional
Generative Description (FGD), and Head-Driven Phrase Structure Grammar
(HPSG).  for English, SDP provides four parallel (sentence- and
token-aligned) annotations for some 900,000 tokens of running text
from the venerable WSJ and Brown corpora.  comparable data volumes are
available for Chinese and Czech, albeit in only one target
representation each.

for general background on the SDP dependency graphs, results from the
earlier SemEval tasks, and access details, please see the following
pages (and summary papers linked there):

  http://sdp.delph-in.net/

the LDC has just published a public re-release of the original SDP
2014 and 2015 data (including all ‘companion’ data, the official
scorers, and all system submissions received).  also in this new
package, we have added a fourth target representation—dubbed CCD—which
seeks to make available a canonical version of the conversion from
CCGbank files to bi-lexical dependency graphs.  for the complete LDC
release of the SDP 2016 package, please see:

  https://catalog.ldc.upenn.edu/LDC2016T10

a sub-set of the SDP target representations is not derivative of LDC
annotations and is thus available under a Creative Commons licensing
scheme for direct download.  further information on the contents of
the Open SDP sub-set is available on the following page:

  http://sdp.delph-in.net/index.php?page=5

we hope to stimulate continued research interest in the SDP parsing
problem (and, ideally, more cross-framework comparison of the various
graph representations).  in case you would like to use the SDP target
representations in your own work (syntactico-semantic parsing into
graph-structured target representations or linguistic comparison
across different schools of thought), or if you have suggestions for
improving or correcting the above web pages and data packages, we will
be delighted to hear from you.

best wishes, oe (for the SDP task organizers)

dan flickinger
jan hajič
angelina ivanova
marco kuhlmann
yusuke miyao
stephan oepen
daniel zeman


