[pet] passing in some but not all tags

Francis Bond bond at ieee.org
Sun Sep 22 06:08:42 CEST 2013


G'day,

On Thu, Sep 19, 2013 at 7:20 PM, Paul Haley <paul at haleyai.com> wrote:
> Your paper is on the money.  The difference, it appears, is that blazing is
> post-chart construction.  We constrain the construction of the chart in
> several ways (with modifications to PET).

Yes.  We weren't convinced we could trust our POS tags, and so didn't
want to commit too early.
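
(PET's YY input mode is handy here: each token can carry several POS
tags with weights, so you can hand over the tagger's whole
distribution instead of committing to a single tag.  If I remember
the format correctly, a token looks like this, with made-up tag
probabilities:

    (42, 0, 1, <0:11>, 1, "Tokenization", 0, "null", "NNP" 0.7677 "NN" 0.2323)

The grammar's token mapping rules can then decide how much to trust
the tags.)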

> It is definitely the case that you want to rule out types more than choose
> them given a POS tag.
>
> When disambiguating sentences over two dozen or so words long, there are
> practical issues with waiting for the parser and processing the results.
>
> In your paper, you discuss discriminatory disambiguation of thousands of
> parses.  We find that impractical!  (Presumably you process them off-line in
> advance of human interaction?)

I think [incr tsdb()] does this in real time for up to 5,000 parses
(although these days we normally only look at 500 or so), and ACE is
no doubt even faster.
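
The trick is that annotators never inspect whole parses one by one:
they accept or reject discriminants, and each decision wipes out a
large chunk of the forest at once.  A toy sketch of the filtering
step in Python (hypothetical names, not the actual [incr tsdb()]
code):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Parse:
        id: int
        properties: frozenset  # e.g. labelled spans like ("NP", 0, 2)

    def discriminants(parses):
        """Properties that hold of some but not all parses."""
        index = {}
        for parse in parses:
            for prop in parse.properties:
                index.setdefault(prop, set()).add(parse.id)
        return {p: ids for p, ids in index.items()
                if len(ids) < len(parses)}

    def choose(parses, prop, keep=True):
        """Apply one decision: accept or reject a discriminant."""
        return [p for p in parses if (prop in p.properties) == keep]

    # two analyses of "interested parties": nominal vs. verbal
    a = Parse(1, frozenset({("NP", 0, 2), ("ADJ", "interested")}))
    b = Parse(2, frozenset({("S", 0, 2), ("V", "parties")}))
    assert ("NP", 0, 2) in discriminants([a, b])
    assert [p.id for p in choose([a, b], ("NP", 0, 2))] == [1]

In the real tools, decisions are also propagated: once a discriminant
is decided, anything that now holds of all (or none) of the surviving
parses is inferred automatically, so a handful of choices can resolve
thousands of parses.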

> Instead, we iterate with the parser while accumulating a variety of
> constraints (e.g., parse tree structure and, now, more part of speech
> constraints).  This has a number of benefits for annotators, including
> avoiding many linguistically obscure discriminants and the
> "perplexity" they face as the number of discriminants presented
> increases (their confusion appears to grow super-linearly with the
> number of choices offered).  This
> has the added benefit of reducing NLP processing overheads significantly
> without sacrificing much in terms of accuracy of results (which pales in
> comparison with what the grammar doesn't handle for longer sentences).

I think this is also a nice way to do things.

> One issue (or challenge) with this approach is having annotators understand
> that they want to under-constrain parsing rather than over-constrain it.
> For example, it is safer not to choose the part of speech for "interested"
> in "interested parties" but to choose the phrase itself as a noun phrase.
> In this way, anything the grammar has for the part of speech on "interested"
> will be fed back, but the verbal sense of "parties" will not be.  Thus, we
> started with a constituency structure bias.
>
> Actually, it is the semantic constraints that we strongly prefer, but
> feeding those back into the parser as input constraints seems impractical.
> (I would really like to get my hands on an end-to-end example of using PET
> for realization.)

It is hard to put constraints on the MRS, as you may not be able to
see some relations (e.g. long distance dependencies) until late in
the parse.  So you need a way of applying each constraint only once
it becomes checkable.
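
One way to handle that is to treat each semantic constraint as
three-valued while the analysis is partial, and prune only on
definite violations.  A toy sketch (made-up predicate names, not
PET's actual machinery):

    from dataclasses import dataclass

    @dataclass
    class Analysis:
        preds: frozenset  # predicate names visible so far
        complete: bool    # has the full MRS been assembled?

    def satisfied(required, analysis):
        """True / False / None (= not yet decidable)."""
        if required in analysis.preds:
            return True
        return False if analysis.complete else None

    def prune(analyses, required):
        """Drop only analyses that decidably violate the constraint."""
        return [a for a in analyses
                if satisfied(required, a) is not False]

A partial analysis missing the relation survives (it may still show
up later); a complete one missing it is discarded.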

PET doesn't do realization, but the LKB, ACE and, I think, AGREE all do.
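
For what it's worth, the usual end-to-end recipe with ACE is to parse
to an MRS and pipe it straight back in for generation, something like
this (assuming a precompiled ERG image erg.dat; if I remember the
flags right, -n 1 -T prints just the best MRS and -e switches to
generation mode):

    $ echo "Interested parties attended." \
        | ace -g erg.dat -n 1 -T | ace -g erg.dat -e

Do check the ACE documentation for the exact invocation.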

> Our task, FYI, is translating English into formal logic (and ontology).
> Here's a link:
> http://haleyai.com/wordpress/2013/06/06/acquring-rich-logical-knowledge-from-text-semantic-technology-2013/
>
> There's a video of an early version of this discrimination at:
> http://haleyai.com/wordpress/2013/05/29/translating-english-into-logic-using-the-linguist/

I think we share a similar long-term goal.

-- 
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University

