[erg] on generation failure with the Barcelona release and later

Wed May 5 10:42:15 CEST 2010

Hi Dan,

Thanks for the reply. I was following Stephan's reply when your email 
arrived. He referred me to the file lkb/sample.mrs in the 1004 release, 
which differs from your explanation. I assumed LKB can generate from 
sample.mrs, so I was trying to produce an equivalent XML to test this.

The difference between sample.mrs and your solution is that unknown 
words take a different format. For instance, the verb "bazed" takes 
"_baze_v_rel" in sample.mrs, but according to your solution it should 
take _bazed/VBD_u_unknown_rel. In fact, _bazed/VBD_u_unknown_rel is what 
PET outputs. According to Stephan, I should apply a "normalization" to 
_bazed/VBD_u_unknown_rel; I guess this normalization means changing it 
to "_baze_v_rel". So it looks like two experts have different solutions 
on this, and I'm confused.
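
To make the two formats concrete, here is a minimal Python sketch of what such a "normalization" step might look like. It is only one reading of the two conventions, not the procedure Stephan or Dan actually use: the pattern follows the _surface/POS_u_unknown_rel template, while the tag-to-category table (COARSE_POS) and the lemmatize callback are hypothetical, since no lemmatizer for unknown words is specified anywhere in this thread.

```python
import re

# Template for unknown-word predicates: _surface/POS_u_unknown_rel
UNKNOWN_PRED = re.compile(r"^_(?P<surface>.+)/(?P<pos>[^/_]+)_u_unknown_rel$")

# ASSUMPTION: coarse mapping from the first two letters of a Penn Treebank
# tag to a category letter. Only VB -> v is suggested by "_baze_v_rel";
# the other entries are illustrative guesses.
COARSE_POS = {"VB": "v", "NN": "n", "JJ": "a"}


def parse_unknown_pred(pred):
    """Return (surface, pos) for an unknown-word predicate, else None."""
    m = UNKNOWN_PRED.match(pred)
    if m is None:
        return None
    return m.group("surface"), m.group("pos")


def normalize_unknown_pred(pred, lemmatize=lambda surface, pos: surface):
    """Rewrite _surface/POS_u_unknown_rel as _lemma_CAT_rel.

    `lemmatize` is a hypothetical callback; by default it returns the
    surface form unchanged. Predicates that do not match the template,
    or whose tag is not in COARSE_POS, are returned as-is.
    """
    parsed = parse_unknown_pred(pred)
    if parsed is None:
        return pred
    surface, pos = parsed
    coarse = COARSE_POS.get(pos.upper()[:2])
    if coarse is None:
        return pred
    return "_{}_{}_rel".format(lemmatize(surface, pos), coarse)
```

With the default identity callback, _bazed/VBD_u_unknown_rel would come out as _bazed_v_rel rather than _baze_v_rel, which is exactly where the lemma question comes in.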

Also, I followed some of your steps. I'm using ERG 1004 from the 
LOGON trunk but the default LKB release from delph-in.net. Since my 
program is written entirely in Java, I use the OpenNLP tagger rather 
than TNT. Your sentence "We hired Grundy." gets the following tagging:

{"We"; POS: PRP}  {"hired"; POS: VBD}  {"Grundy"; POS: NNP}

Feeding an FSC XML to PET then gives the following parse:
[
SentType: PROP
Decomposer: []
LTOP: h1
INDEX: e2
RELS: <
[ PRON_REL<0:2>
  LBL: h3
  ARG0: x4 [ x PERS: 1 NUM: PL PRONTYPE: STD_PRON ]
]
[ PRONOUN_Q_REL<0:2>
  LBL: h5
  ARG0: x4 [ x PERS: 1 NUM: PL PRONTYPE: STD_PRON ]
  RSTR: h7
  BODY: h6
]
[ _hire_v_1_rel<3:8>
  LBL: h8
  ARG0: e2 [ e SF: PROP TENSE: PAST MOOD: INDICATIVE PROG: - PERF: - ]
  ARG1: x4 [ x PERS: 1 NUM: PL PRONTYPE: STD_PRON ]
  ARG2: x9 [ x PERS: 3 NUM: SG IND: + ]
]
[ PROPER_Q_REL<9:17>
  LBL: h10
  ARG0: x9 [ x PERS: 3 NUM: SG IND: + ]
  RSTR: h12
  BODY: h11
]
[ NAMED_REL<9:17>
  LBL: h13
  ARG0: x9 [ x PERS: 3 NUM: SG IND: + ]
  CARG: "Grundy"
]
 >
HCONS: < h7 qeq h3 h12 qeq h13 >
]

Unfortunately, LKB failed to generate, reporting: invalid predicates: 
|named_rel("Grundy")|. I get a similar failure with "The glimpy glump arrived."
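
To compare the predicates in a dump like the one above with what the generator complains about, a short Python sketch such as the following can list each EP's predicate, character span, and CARG. The regular expressions are assumptions based only on the textual layout shown above, not on any official MRS reader, and SAMPLE is an abbreviated excerpt of that output.

```python
import re

# ASSUMPTION: an EP starts on a line like "[ _hire_v_1_rel<3:8>".
EP_HEADER = re.compile(r"^\[\s*(?P<pred>[^\s<\[\]]+)<(?P<start>\d+):(?P<end>\d+)>$")
# ASSUMPTION: a constant argument appears on its own line as CARG: "value".
CARG = re.compile(r'^\s*CARG:\s*"(?P<carg>[^"]*)"\s*$')

# Abbreviated excerpt of the PET output for "We hired Grundy."
SAMPLE = """
[ _hire_v_1_rel<3:8>
  LBL: h8
]
[ NAMED_REL<9:17>
  LBL: h13
  ARG0: x9 [ x PERS: 3 NUM: SG IND: + ]
  CARG: "Grundy"
]
"""


def list_predicates(mrs_text):
    """Collect predicate name, character span and CARG for each EP."""
    eps = []
    for line in mrs_text.splitlines():
        header = EP_HEADER.match(line.strip())
        if header:
            span = (int(header.group("start")), int(header.group("end")))
            eps.append({"pred": header.group("pred"), "span": span, "carg": None})
            continue
        carg = CARG.match(line)
        if carg and eps:
            # Attach the CARG to the most recently opened EP.
            eps[-1]["carg"] = carg.group("carg")
    return eps


for ep in list_predicates(SAMPLE):
    print(ep["pred"], ep["span"], ep["carg"])
```

The predicate names are kept exactly as they appear in the dump, so any difference in case or suffix from the name in the error message stays visible.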

You said the LKB version might matter (I will try the one from the LOGON 
repository later). But I always thought this kind of "invalid 
predicates" error depends only on the grammar rather than on LKB. Please 
correct me if I'm wrong.

Xuchen

Dan Flickinger wrote:
> Hi Xuchen Yao -
>
> We have made some progress on generation with unknown words since last summer, and even though we have not yet arrived at an ideal solution, I believe that the most recent (1004) version should work pretty well.  Here is what I just confirmed this morning:
>
> 1. Load current LKB (I'm running from the LOGON repository, which might matter)
> 2. Load ERG (1004)
> 3. Index for generation (LKB Top -- Generate -- Index) and start the generator server (LKB Top -- Generate -- Start server)
> 4. Using the `erg+tnt' CPU definition in $LOGONROOT/dot.tsdbrc, call PET to parse a sentence containing unknown words  (I parsed `The glimpy glump arrived.')
> 5. Identify the MRS for the intended analysis, and generate:
>    - Since I'm using [incr tsdb()], I just clicked `Annotate' for this one-sentence profile, then (left-) clicked on the analysis I wanted (where `glimpy' is an unknown adjective), and clicked `Rephrase' which generated the same sentence successfully.
>
> You should know that the generator is currently expecting a very specific format for the predicate names, following this template:
> _surface-orthography/POS_u_unknown_rel
> where POS is one of the tags you'll find in the generic lexical entries in erg/gle.tdl.  For example, the predicate for `glimpy' is _glimpy/jj_u_unknown_rel.  Likewise, for unknown nouns, the predicate name must be of the following form: _glump/nn_u_unknown_rel, where the first field in the predicate name again consists of the surface orthography followed by a slash followed by the POS tag.
>
> Unknown proper names are simpler: PET simply creates an ordinary `named_rel' EP with the new proper name as the CARG value in that EP.  I confirmed that this works by parsing the following sentence with PET: `We hired Grundy.' and then generating from the MRS for the single analysis that PET returns. 
>
> Similarly for years like "1884", we just use the ordinary predicate 'yofc_rel', and provide the year as the CARG value.  I confirmed this with the sentence `We arrived in 1884.', which generates fine.
>
> The main flaw in what we are currently doing is that we don't have a good way of determining on the fly the lemma form of the unknown word we see, so the unknown noun `glumps' gives rise to the predicate name _glumps/nns_u_unknown_rel which is of course not ideal.  We'll work on this further, but I would in the meantime be glad to hear whether you can get the behavior I describe above with the 1004 version of the ERG.
>
> Best,
>
>  Dan
>
> ----- Original Message -----
> From: "Xuchen Yao" <xuchen at coli.uni-saarland.de>
> To: developers at delph-in.net, erg at delph-in.net
> Sent: Tuesday, May 4, 2010 12:14:50 PM
> Subject: [erg] on generation failure with the Barcelona release and later
>
> Hi,
>
> I noticed there was some intensive discussion of generation failure from
> unknown words in the mailing list last year. Then people agreed to
> continue the discussion at last year's meeting but I didn't find any
> memo on the delph-in website. It looks like the Barcelona (0907) ERG
> release was intended for this issue. So I switched from the current
> stable version (0902) to 0907 or even the newest in the trunk (1004)
> hoping to get a better handle on unknown words (or the "invalid
> predicates" error). Unfortunately, it didn't work out. Here's a
> short summary of what I observed in my experiments:
>
> The basic idea is to follow what Stephan said:
>
> "hence i think one would have to add an MRS post-processing step before
> trying to feed these MRSs back into the generator." from
> http://lists.delph-in.net/archive/developers/2009/001217.html
>
> 1. For unknown NNP, I changed `named_unk_rel' to `named_rel', which works
> for 0902. (If I remember correctly, this change doesn't work for 0907
> and 1004.)
>
> 2. For errors like invalid predicates: |basic_yofc_rel("1998"), from the
> sentence "He left in 1998." I changed basic_yofc_rel to number_q_rel =q
> card_rel as a shortcut to avoid a generation failure. This works under
> 0902.
>
> 3. For errors like invalid predicates: |"_iconic_jj_rel"|, from "This is
> an iconic place." I tried changing the *_jj_rel to generic_unk_adj_rel
> with "iconic" as the CARG value. But this didn't work under either 0902
> or 0907.
>
> I didn't observe generation failure on unknown verbs, but did have some
> cases of failure on nouns, such as: invalid predicates:
> |"_wreckage_nn_rel"|, |"_oscillation_nn_rel"|, |"_axiom_nn_rel"|.
>
> For the generation task, my naive thought is that if cheap can parse a
> sentence, then LKB should be able to generate from cheap's MRS output. To
> parse successfully, I used the chart-mapping branch of cheap to support
> pre-processed (POS-tagged) sentences, but the problem of generation
> failure due to invalid predicates still exists. Since there was a
> discussion on this at last year's meeting and the ERG release is rolling
> forward, it looks to me as if this issue has already been solved (since
> the 0907 release) and I am simply using the wrong method. I'd appreciate
> it very much if somebody could help me out. Thanks.
>
> With kind regards,
>
> Xuchen Yao
>