[developers] [erg] on generation failure with the Barcelona release and later

Wed May 5 12:38:09 CEST 2010

Hi again, Xuchen -

After looking at this a little more carefully, I agree with Stephan that it will be better to use normalized predicate names for unknown words, like the ones in that lkb/sample.mrs file.  The 1004 version of the ERG comes equipped with code to do that normalizing step automatically for the kind of PET output predicate names I described earlier, so I believe you should get good behavior now with generation for unknown words following Stephan's advice.  I've confirmed that these nicer-looking predicates for unknown words generate fine for me, too.

But it will be important for you to use the latest LOGON release for your experiments along with the 1004 version of the ERG, since there are several code changes in various components that all have to work together correctly to get the desired functionality.

  Dan

----- Original Message -----
From: "Xuchen Yao" <xuchen at coli.uni-saarland.de>
To: "Dan Flickinger" <danf at stanford.edu>
Cc: developers at delph-in.net, erg at delph-in.net
Sent: Wednesday, May 5, 2010 10:42:15 AM
Subject: Re: [erg] on generation failure with the Barcelona release and later

Hi Dan,

Thanks for the reply. I was following Stephan's reply when your email
arrived. He referred me to the file lkb/sample.mrs in the 1004 release,
which is different from your explanation. I assumed the LKB can generate
from sample.mrs, so I was trying to produce an equivalent XML to verify
that.

The difference between sample.mrs and your solution is that unknown
words take a different format. For instance, the verb "bazed" takes
"_baze_v_rel" in sample.mrs, but according to your solution it should
take _bazed/VBD_u_unknown_rel, which is in fact what PET outputs.
According to Stephan, I should apply a "normalization" to
_bazed/VBD_u_unknown_rel; I guess this means changing it to
"_baze_v_rel". So it looks like two experts have different solutions
here, and I'm confused.
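
To make the two naming styles above concrete, here is a minimal Python
sketch of the mapping between them. It is an illustration only, not the
normalization code that ships with the ERG: the function names, the
regular expression, and the coarse POS table (VB to v, NN to n, JJ to a)
are assumptions made for this example. Recovering the lemma ("baze" from
"bazed") needs a lemmatizer, so the sketch takes one as a caller-supplied
placeholder and otherwise leaves the surface form unchanged.

```python
import re

# Assumed mapping from Penn Treebank tag prefixes to the one-letter POS
# field of a normalized predicate name (illustrative, not taken from the ERG).
COARSE_POS = {"VB": "v", "NN": "n", "JJ": "a"}

# PET-style template from this thread: _surface/POS_u_unknown_rel
UNKNOWN_PRED = re.compile(r"^_(?P<surface>.+)/(?P<tag>[A-Za-z$]+)_u_unknown_rel$")


def split_unknown_pred(pred):
    """Return (surface, TAG) for a PET-style unknown-word predicate, else None."""
    match = UNKNOWN_PRED.match(pred)
    if match is None:
        return None
    return match.group("surface"), match.group("tag").upper()


def normalize_unknown_pred(pred, lemmatize=lambda surface, tag: surface):
    """Map _surface/POS_u_unknown_rel to _lemma_pos_rel.

    `lemmatize` is a hypothetical hook; by default the surface form is kept.
    Predicates that do not fit the template, or whose tag is not in
    COARSE_POS, are returned unchanged.
    """
    parts = split_unknown_pred(pred)
    if parts is None:
        return pred
    surface, tag = parts
    coarse = next((c for prefix, c in COARSE_POS.items() if tag.startswith(prefix)), None)
    if coarse is None:
        return pred
    return "_{}_{}_rel".format(lemmatize(surface, tag), coarse)


if __name__ == "__main__":
    # A toy lemmatizer that only knows the example verb from this thread.
    toy = lambda surface, tag: {"bazed": "baze"}.get(surface, surface)
    print(normalize_unknown_pred("_bazed/VBD_u_unknown_rel", toy))
    print(normalize_unknown_pred("_glimpy/jj_u_unknown_rel"))
```

Ordinary predicates such as named_rel pass through untouched, so a
function like this could be applied to every predicate in an MRS before
it is handed to the generator.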

Also, I followed your steps part of the way. I'm using ERG 1004 from the
LOGON trunk but the default LKB release from delph-in.net. Since my
program is written entirely in Java, I use the OpenNLP tagger rather
than TNT. Your sentence "We hired Grundy." gets the following tagging:

{"We"; POS: PRP} {"hired"; POS: VBD} {"Grundy"; POS: NNP}

Feeding this to PET as FSC XML gives the following parse:
[
SentType: PROP
Decomposer: []
LTOP: h1
INDEX: e2
RELS: <
[ PRON_REL<0:2>
LBL: h3
ARG0: x4 [ x PERS: 1 NUM: PL PRONTYPE: STD_PRON ]
] [ PRONOUN_Q_REL<0:2>
LBL: h5
ARG0: x4 [ x PERS: 1 NUM: PL PRONTYPE: STD_PRON ]
RSTR: h7
BODY: h6
] [ _hire_v_1_rel<3:8>
LBL: h8
ARG0: e2 [ e SF: PROP TENSE: PAST MOOD: INDICATIVE PROG: - PERF: - ]
ARG1: x4 [ x PERS: 1 NUM: PL PRONTYPE: STD_PRON ]
ARG2: x9 [ x PERS: 3 NUM: SG IND: + ]
] [ PROPER_Q_REL<9:17>
LBL: h10
ARG0: x9 [ x PERS: 3 NUM: SG IND: + ]
RSTR: h12
BODY: h11
] [ NAMED_REL<9:17>
LBL: h13
ARG0: x9 [ x PERS: 3 NUM: SG IND: + ]
CARG: "Grundy"
]
>
HCONS: < h7 qeq h3 h12 qeq h13 >
]

Unfortunately, the LKB failed to generate, reporting: invalid predicates:
|named_rel("Grundy")|. I get a similar failure for "The glimpy glump
arrived."

You said the LKB version might matter (I will try the one from the LOGON
repository later). But I always thought this kind of "invalid
predicates" error depends only on the grammar, not on the LKB. Please
correct me if I'm wrong.

Xuchen

Dan Flickinger wrote:
> Hi Xuchen Yao -
>
> We have made some progress on generation with unknown words since last
> summer, and even though we have not yet arrived at an ideal solution,
> I believe that the most recent (1004) version should work pretty well.
> Here is what I just confirmed this morning:
>
> 1. Load current LKB (I'm running from the LOGON repository, which
> might matter)
> 2. Load ERG (1004)
> 3. Index for generation (LKB Top -- Generate -- Index) and start the
> generator server (LKB Top -- Generate -- Start server)
> 4. Using the `erg+tnt' CPU definition in $LOGONROOT/dot.tsdbrc, call
> PET to parse a sentence containing unknown words (I parsed `The glimpy
> glump arrived.')
> 5. Identify the MRS for the intended analysis, and generate:
>    - Since I'm using [incr tsdb()], I just clicked `Annotate' for this
>    one-sentence profile, then (left-) clicked on the analysis I wanted
>    (where `glimpy' is an unknown adjective), and clicked `Rephrase'
>    which generated the same sentence successfully.
>
> You should know that the generator is currently expecting a very
> specific format for the predicate names, following this template:
> _surface-orthography/POS_u_unknown_rel where POS is one of the tags
> you'll find in the generic lexical entries in erg/gle.tdl. For
> example, the predicate for `glimpy' is _glimpy/jj_u_unknown_rel.
> Likewise, for unknown nouns, the predicate name must be of the
> following form: _glump/nn_u_unknown_rel, where the first field in the
> predicate name again consists of the surface orthography followed by a
> slash followed by the POS tag.
>
> Unknown proper names are simpler: PET simply creates an ordinary
> `named_rel' EP with the new proper name as the CARG value in that EP.
> I confirmed that this works by parsing the following sentence with
> PET: `We hired Grundy.' and then generating from the MRS for the
> single analysis that PET returns.
>
> Similarly for years like "1884", we just use the ordinary predicate
> 'yofc_rel', and provide the year as the CARG value. I confirmed this
> with the sentence `We arrived in 1884.', which generates fine.
>
> The main flaw in what we are currently doing is that we don't have a
> good way of determining on the fly the lemma form of the unknown word
> we see, so the unknown noun `glumps' gives rise to the predicate name
> _glumps/nns_u_unknown_rel which is of course not ideal. We'll work on
> this further, but I would in the meantime be glad to hear whether you
> can get the behavior I describe above with the 1004 version of the
> ERG.
>
> Best,
>
>  Dan
>
> ----- Original Message -----
> From: "Xuchen Yao" <xuchen at coli.uni-saarland.de>
> To: developers at delph-in.net, erg at delph-in.net
> Sent: Tuesday, May 4, 2010 12:14:50 PM
> Subject: [erg] on generation failure with the Barcelona release and later
>
> Hi,
>
> I noticed there was some intensive discussion of generation failure
> from unknown words on the mailing list last year. People then agreed
> to continue the discussion at last year's meeting, but I didn't find
> any memo on the delph-in website. It looks like the Barcelona (0907)
> ERG release was intended to address this issue. So I switched from the
> current stable version (0902) to 0907, and even to the newest in the
> trunk (1004), hoping to get better handling of unknown words (or of
> the "invalid predicates" error). Unfortunately it didn't work out.
> Here's a shortened observation from my experiment:
>
> The basic idea is to follow what Stephan said:
>
> "hence i think one would have to add an MRS post-processing step
> before trying to feed these MRSs back into the generator." from
> http://lists.delph-in.net/archive/developers/2009/001217.html
>
> 1. For unknown NNP, I changed `named_unk_rel' to `named_rel'; this
> works for 0902. (If I remember correctly, this change doesn't work for
> 0907 and 1004.)
>
> 2. For errors like invalid predicates: |basic_yofc_rel("1998"), from
> the sentence "He left in 1998.", I changed basic_yofc_rel to
> number_q_rel =q card_rel as a shortcut to avoid a generation failure.
> This works under 0902.
>
> 3. For errors like invalid predicates: |"_iconic_jj_rel"|, from "This
> is an iconic place.", I tried changing the *_jj_rel to
> generic_unk_adj_rel with "iconic" as the CARG value. But this didn't
> work under either 0902 or 0907.
>
> I didn't observe generation failures on unknown verbs, but did have
> some cases of failure on nouns, such as: invalid predicates:
> |"_wreckage_nn_rel"|, |"_oscillation_nn_rel"|, |"_axiom_nn_rel"|.
>
> For the generation task, my naive thought is that if cheap can parse a
> sentence, then the LKB should be able to generate from cheap's MRS
> output. To get a successful parse, I used the chart-mapping branch of
> cheap, which supports pre-processed (POS-tagged) sentences, but the
> problem of generation failure due to invalid predicates still exists.
> Since there was a discussion on this at last year's meeting and the
> ERG release is rolling forward, it looks to me as if this issue has
> already been solved (since the 0907 release) and I am simply using the
> wrong method. I'd appreciate it very much if somebody can help me out.
> Thanks.
>
> With kind regards,
>
> Xuchen Yao
>