[developers] [erg] on generation failure with the Barcelona release and later

Wed May 5 11:18:19 CEST 2010

as of today, there is some generation-related new functionality that  
the forthcoming 1004 release of the ERG uses, which is only available  
when running from source in the LOGON trunk.  as we finalize the  
release, i expect to merge code with the LKB trunk (and re-generate  
run-time binaries) in the next week or two.  as unknown word handling  
is in large part in the LKB code (and something that has evolved in  
the past two years, for generation), it very much matters to align ERG  
and LKB revisions.

cheers, oe

On 5. mai 2010, at 10.42, Xuchen Yao <xuchen at coli.uni-saarland.de>  
wrote:

> Hi Dan,
>
> Thanks for the reply. I was following Stephan's reply when your
> email arrived. He referred me to the file lkb/sample.mrs in the 1004
> release, which differs from your explanation. I assumed the LKB
> can generate from sample.mrs, so I was trying to produce an equivalent
> XML to verify this.
>
> The difference between sample.mrs and your solution is that unknown
> words take a different format. For instance, the verb "bazed" takes
> "_baze_v_rel" in sample.mrs, but according to your solution it
> should take _bazed/VBD_u_unknown_rel. Actually,
> _bazed/VBD_u_unknown_rel is what PET outputs. According to Stephan, I
> should apply a "normalization" to this _bazed/VBD_u_unknown_rel; I
> guess this normalization means changing _bazed/VBD_u_unknown_rel to
> "_baze_v_rel". So it looks like the two experts have different
> solutions here. I'm confused.
>
> Also, I followed your steps part of the way. I'm using ERG 1004 from the
> LOGON trunk but the default LKB release from delph-in.net. Since my
> program is written entirely in Java, I use the OpenNLP tagger rather
> than TNT. Your sentence "We hired Grundy." is tagged as follows:
>
> {"We"; POS: PRP}  {"hired"; POS: VBD}  {"Grundy"; POS: NNP}
>
> Feeding an FSC XML input to PET then gives the following parse:
> [
> SentType: PROP
> Decomposer: []
> LTOP: h1
> INDEX: e2
> RELS: <
> [ PRON_REL<0:2>
> LBL: h3
> ARG0: x4 [ x PERS: 1 NUM: PL PRONTYPE: STD_PRON ]
> ]
> [ PRONOUN_Q_REL<0:2>
> LBL: h5
> ARG0: x4 [ x PERS: 1 NUM: PL PRONTYPE: STD_PRON ]
> RSTR: h7
> BODY: h6
> ]
> [ _hire_v_1_rel<3:8>
> LBL: h8
> ARG0: e2 [ e SF: PROP TENSE: PAST MOOD: INDICATIVE PROG: - PERF: - ]
> ARG1: x4 [ x PERS: 1 NUM: PL PRONTYPE: STD_PRON ]
> ARG2: x9 [ x PERS: 3 NUM: SG IND: + ]
> ]
> [ PROPER_Q_REL<9:17>
> LBL: h10
> ARG0: x9 [ x PERS: 3 NUM: SG IND: + ]
> RSTR: h12
> BODY: h11
> ]
> [ NAMED_REL<9:17>
> LBL: h13
> ARG0: x9 [ x PERS: 3 NUM: SG IND: + ]
> CARG: "Grundy"
> ]
> >
> HCONS: < h7 qeq h3 h12 qeq h13 >
> ]
>
> Unfortunately, the LKB failed to generate: invalid predicates:
> |named_rel("Grundy")|. I get a similar result for "The glimpy glump
> arrived."
>
> You said the LKB version might matter (I will try the one from the LOGON
> repository later). But I always thought this kind of "invalid
> predicates" error depends only on the grammar rather than on the LKB.
> Please correct me if I'm wrong.
>
>
> Xuchen
>
>
> Dan Flickinger wrote:
>> Hi Xuchen Yao -
>>
>> We have made some progress on generation with unknown words since  
>> last summer, and even though we have not yet arrived at an ideal  
>> solution, I believe that the most recent (1004) version should work  
>> pretty well.  Here is what I just confirmed this morning:
>>
>> 1. Load current LKB (I'm running from the LOGON repository, which  
>> might matter)
>> 2. Load ERG (1004)
>> 3. Index for generation (LKB Top -- Generate -- Index) and start  
>> the generator server (LKB Top -- Generate -- Start server)
>> 4. Using the `erg+tnt' CPU definition in $LOGONROOT/dot.tsdbrc,  
>> call PET to parse a sentence containing unknown words  (I parsed  
>> `The glimpy glump arrived.')
>> 5. Identify the MRS for the intended analysis, and generate:
>>   - Since I'm using [incr tsdb()], I just clicked `Annotate' for  
>> this one-sentence profile, then (left-) clicked on the analysis I  
>> wanted (where `glimpy' is an unknown adjective), and clicked  
>> `Rephrase' which generated the same sentence successfully.
>>
>> You should know that the generator is currently expecting a very
>> specific format for the predicate names, following this template:
>> _surface-orthography/POS_u_unknown_rel
>> where POS is one of the tags you'll find in the generic lexical
>> entries in erg/gle.tdl.  For example, the predicate for `glimpy' is
>> _glimpy/jj_u_unknown_rel.  Likewise, for unknown nouns, the
>> predicate name must be of the following form:
>> _glump/nn_u_unknown_rel, where the first field in the predicate name
>> again consists of the surface orthography followed by a slash
>> followed by the POS tag.
>>
>> Unknown proper names are simpler: PET simply creates an ordinary  
>> `named_rel' EP with the new proper name as the CARG value in that  
>> EP.  I confirmed that this works by parsing the following sentence  
>> with PET: `We hired Grundy.' and then generating from the MRS for  
>> the single analysis that PET returns.
>> Similarly for years like "1884", we just use the ordinary predicate  
>> 'yofc_rel', and provide the year as the CARG value.  I confirmed  
>> this with the sentence `We arrived in 1884.', which generates fine.
>>
>> The main flaw in what we are currently doing is that we don't have  
>> a good way of determining on the fly the lemma form of the unknown  
>> word we see, so the unknown noun `glumps' gives rise to the  
>> predicate name _glumps/nns_u_unknown_rel which is of course not  
>> ideal.  We'll work on this further, but I would in the meantime be  
>> glad to hear whether you can get the behavior I describe above with  
>> the 1004 version of the ERG.
>>
>> Best,
>>
>> Dan
>>
>> ----- Original Message -----
>> From: "Xuchen Yao" <xuchen at coli.uni-saarland.de>
>> To: developers at delph-in.net, erg at delph-in.net
>> Sent: Tuesday, May 4, 2010 12:14:50 PM
>> Subject: [erg] on generation failure with the Barcelona release and  
>> later
>>
>> Hi,
>>
>> I noticed there was some intensive discussion of generation failure
>> from unknown words on the mailing list last year. People then agreed
>> to continue the discussion at last year's meeting, but I didn't find
>> any memo on the delph-in website. It looks like the Barcelona (0907)
>> ERG release was intended to address this issue. So I switched from the
>> current stable version (0902) to 0907, and even to the newest in the
>> trunk (1004), hoping to get a better handle on unknown words (or the
>> "invalid predicates" error). Unfortunately, it didn't work out. Here
>> is a shortened account of my experiments:
>>
>> The basic idea is to follow what Stephan said:
>>
>> "hence i think one would have to add an MRS post-processing step
>> before trying to feed these MRSs back into the generator." from
>> http://lists.delph-in.net/archive/developers/2009/001217.html
>>
>> 1. For unknown NNP, I changed `named_unk_rel' to `named_rel'; this
>> works for 0902. (If I remember correctly, this change doesn't work
>> for 0907 and 1004.)
>>
>> 2. For errors like invalid predicates: |basic_yofc_rel("1998"), from
>> the sentence "He left in 1998.", I changed basic_yofc_rel to
>> number_q_rel =q card_rel as a shortcut to avoid a generation failure.
>> This works under 0902.
>>
>> 3. For errors like invalid predicates: |"_iconic_jj_rel"|, from
>> "This is an iconic place.", I tried changing the *_jj_rel to
>> generic_unk_adj_rel with "iconic" as the CARG value. But this didn't
>> work under either 0902 or 0907.
>>
>> I didn't observe generation failure on unknown verbs, but did have
>> some cases of failure on nouns, such as: invalid predicates:
>> |"_wreckage_nn_rel"|, |"_oscillation_nn_rel"|, |"_axiom_nn_rel"|.
>>
>> For the generation task, my naive thought is that if cheap can parse
>> a sentence, then the LKB should be able to generate from cheap's MRS
>> output. To parse successfully, I used the chart-mapping branch of
>> cheap to support pre-processed (POS-tagged) sentences, but the
>> problem of generation failure due to invalid predicates still exists.
>> Since there was a discussion on this at last year's meeting and the
>> ERG release is rolling forward, it looks to me as if this issue has
>> already been solved (since the 0907 release) and I am simply using
>> the wrong method. I'd appreciate it very much if somebody could help
>> me out. Thanks.
>>
>> With kind regards,
>>
>> Xuchen Yao
>>
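
For readers scripting around this thread: the predicate template Dan
describes (_surface-orthography/POS_u_unknown_rel) can be illustrated
with a small Python sketch. This is not part of the LKB, PET, or the
ERG; the function names build_unknown_pred and parse_unknown_pred are
hypothetical, and the sketch assumes the POS tag consists only of
letters (plus `$') and is passed through unchanged, without checking it
against the tags in erg/gle.tdl and without any lemmatization.

```python
import re

# Assumed shape of an unknown-word predicate, taken from the template
# quoted in the thread: _surface-orthography/POS_u_unknown_rel
_UNKNOWN_PRED = re.compile(
    r"^_(?P<surface>.+)/(?P<pos>[A-Za-z$]+)_u_unknown_rel$"
)


def build_unknown_pred(surface, pos):
    """Assemble a predicate name from surface form and POS tag.

    Hypothetical helper: the tag is used exactly as given, so the
    caller decides on upper or lower case.
    """
    return "_{}/{}_u_unknown_rel".format(surface, pos)


def parse_unknown_pred(pred):
    """Return (surface, pos) if pred follows the template, else None."""
    match = _UNKNOWN_PRED.match(pred)
    if match is None:
        return None
    return match.group("surface"), match.group("pos")


if __name__ == "__main__":
    print(build_unknown_pred("glimpy", "jj"))
    print(parse_unknown_pred("_glumps/nns_u_unknown_rel"))
    print(parse_unknown_pred("_hire_v_1_rel"))
```

A check like parse_unknown_pred could be used in an MRS
post-processing step to tell unknown-word predicates apart from
ordinary lexical ones such as _hire_v_1_rel before deciding whether any
rewriting is needed.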