[developers] Inter-annotator agreement for Redwoods treebanks

Dan Flickinger danf at stanford.edu
Tue Oct 15 19:38:06 CEST 2013

I think this Tanaka et al. (2005) paper describes the closest thing we have to an accepted set of measures for inter-annotator agreement with our treebanks. It computes a harmonic mean of precision across labeled constituents, rather than measuring agreement on discriminants.  Here is a link to that paper:
MacKinlay et al. (2010) adopt this approach as well, in this paper:

The problem with measuring agreement on discriminants is that there is a lot of redundancy in the list of discriminants presented to the annotator. Two annotators can arrive at exactly the same tree by making almost completely disjoint manual discriminant choices, for example with one annotator working "top-down" (initially making spanning-level choices of phrasal constructions) and the other working "bottom-up" (initially making lexical disambiguation choices).
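
To make the contrast concrete, here is a minimal Python sketch. It rests on several assumptions: each tree is reduced to a set of (label, start, end) constituents, the "harmonic mean of precision" is read as the harmonic mean of the precision computed in each direction between the two annotators, and the discriminant names are invented for illustration. It is not the code or the exact definition used in either paper, and it applies no chance correction.

```python
from typing import Set, Tuple

# Assumed representation: a labeled constituent is (label, start, end).
Constituent = Tuple[str, int, int]


def precision(test: Set[Constituent], reference: Set[Constituent]) -> float:
    """Fraction of constituents in `test` that also occur in `reference`."""
    if not test:
        return 1.0 if not reference else 0.0
    return len(test & reference) / len(test)


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative numbers (0.0 if both are zero)."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def constituent_agreement(tree_a: Set[Constituent],
                          tree_b: Set[Constituent]) -> float:
    """Harmonic mean of precision measured in both directions."""
    return harmonic_mean(precision(tree_a, tree_b), precision(tree_b, tree_a))


def discriminant_overlap(choices_a: Set[str], choices_b: Set[str]) -> float:
    """Jaccard overlap of the discriminants each annotator decided manually."""
    union = choices_a | choices_b
    return len(choices_a & choices_b) / len(union) if union else 1.0


# Both annotators end up with the same tree for a three-word sentence ...
tree_a = {("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)}
tree_b = {("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)}

# ... but via disjoint manual choices (hypothetical discriminant names).
top_down = {"subj-head@0-3"}                  # one spanning-level choice
bottom_up = {"the:det@0-1", "barks:v@2-3"}    # lexical choices only

# A third tree that differs from tree_a in one constituent.
tree_c = {("S", 0, 3), ("NP", 0, 2), ("VP", 1, 3)}

print(constituent_agreement(tree_a, tree_b))   # identical trees
print(discriminant_overlap(top_down, bottom_up))  # disjoint choices
print(constituent_agreement(tree_a, tree_c))   # partial agreement
```

In this toy case the two annotators agree fully on constituents while sharing no manual discriminant decisions at all, which is the redundancy problem described above.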


----- Original Message -----
From: "Emily M. Bender" <ebender at uw.edu>
To: "developers" <developers at delph-in.net>
Sent: Tuesday, October 15, 2013 9:56:02 AM
Subject: [developers] Inter-annotator agreement for Redwoods treebanks

Hi all,

Is there an accepted, chance-corrected measure for inter-annotator
agreement with Redwoods treebanks?  It seems to me that measuring
chance agreement over discriminants would make more sense than
measuring over trees, and I'm not quite sure how to conceptualize the
chance agreement for the "reject all trees" option...


Emily M. Bender
Associate Professor
Department of Linguistics
Check out CLMS on facebook! http://www.facebook.com/uwclma
