[developers] on generation failure with the Barcelona release and later

Tue May 4 15:18:19 CEST 2010

hi,

more details later this week, i hope.  but i'd recommend sticking with
ERG 1004 and the LOGON trunk; for the latter, in fact, you currently
need to run from source.

please see ‘lkb/sample.mrs’ for some supported forms of predicates
associated with words unknown to the grammar; by and large, generator
inputs should be normalized, i.e. their predicates should not look any
different from predicates corresponding to known words.  i consider
this aspect of supporting unknown words in generation a solved problem.
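For illustration only, here is a rough Python sketch of what a check for "normalized" predicates might look like. It assumes the usual ERG naming scheme _lemma_pos_rel or _lemma_pos_sense_rel, with a single-letter part of speech; the regular expression, the letter set, and the function name looks_normalized are hypothetical, and ‘lkb/sample.mrs’ remains the reference for which forms the generator actually supports.

```python
import re

# Hypothetical pattern for a normalized surface predicate, assuming the ERG
# convention _lemma_pos_rel or _lemma_pos_sense_rel, where pos is a single
# lowercase letter (the set n, v, a, p, q, x, c is an assumption).
SURFACE_PRED = re.compile(r"^_[^_\s]+_[nvapqxc](_[^_\s]+)?_rel$")


def looks_normalized(pred: str) -> bool:
    """Return True if pred has the shape of a predicate for a known word.

    Surrounding double quotes (as in PET output like "_iconic_jj_rel") are
    stripped first. Abstract predicates (no leading underscore, e.g.
    named_rel) are not checked by this sketch and are accepted as-is.
    """
    pred = pred.strip('"')
    if not pred.startswith("_"):
        return True
    return SURFACE_PRED.match(pred) is not None
```

Under these assumptions, tagger-style names such as "_iconic_jj_rel" or "_wreckage_nn_rel" are flagged, while "_iconic_a_rel" passes.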

it is however still the case that the MRSs output by PET are not
normalized, i.e. one should not expect to feed an MRS straight from
PET into the generator.  some component other than the parser has to
perform predicate normalization, as discussed last year.  this remains
at best a partially solved problem, i would say.  the ERG now includes
a ‘fixup’ transfer grammar, applied to generator inputs, to try to
make the generator more robust to various types of imperfect (e.g.
unnormalized) inputs.  however, wherever possible one should aim not
to rely on such accommodation measures (which may or may not work as
expected in a given snapshot), but rather seek to operate with
well-formed predicates, as exemplified in the sample MRS that comes
with the ERG.
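A minimal Python sketch of such an external normalization step might look as follows. It only rewrites predicates carrying a Penn Treebank-style tag, such as "_wreckage_nn_rel", into the form _lemma_pos_rel; the tag table, the regular expression, and the name normalize_pred are hypothetical, it is not meant to reproduce the ERG's ‘fixup’ transfer grammar, and whether a given ERG snapshot accepts the resulting predicates should be checked against ‘lkb/sample.mrs’.

```python
import re

# Hypothetical mapping from Penn Treebank tag prefixes (as they appear in
# unnormalized predicates such as "_wreckage_nn_rel") to single-letter
# ERG-style part-of-speech codes. This table is an assumption.
TAG_TO_POS = {"nn": "n", "vb": "v", "jj": "a"}

# Matches a quoted or unquoted predicate of the form _lemma_tag_rel, where
# tag is a lowercase tagger-style tag of 2-4 letters (e.g. nn, nns, jj, vbd).
UNNORMALIZED = re.compile(r'^"?_(?P<lemma>[^_\s"]+)_(?P<tag>[a-z]{2,4})_rel"?$')


def normalize_pred(pred: str) -> str:
    """Rewrite a tagger-style predicate into _lemma_pos_rel form if possible.

    Predicates that do not match, or whose tag prefix has no entry in
    TAG_TO_POS, are returned unchanged.
    """
    m = UNNORMALIZED.match(pred)
    if m is None:
        return pred
    pos = TAG_TO_POS.get(m.group("tag")[:2])
    if pos is None:
        return pred
    return f"_{m.group('lemma')}_{pos}_rel"
```

Predicates that do not match, such as named_rel or "_dog_n_1_rel", pass through unchanged, so under these assumptions the function could be applied to every predicate of an MRS before it is handed to the generator.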

On 4 May 2010, at 12:14, Xuchen Yao <xuchen at coli.uni-saarland.de>
wrote:

> Hi,
>
> I noticed there was some intensive discussion on the mailing list
> last year of generation failures caused by unknown words. People then
> agreed to continue the discussion at last year's meeting, but I
> couldn't find any notes on the delph-in website. It looks like the
> Barcelona (0907) ERG release was intended to address this issue. So I
> switched from the current stable version (0902) to 0907, and even to
> the newest version in the trunk (1004), hoping for better handling of
> unknown words (i.e. the "invalid predicates" error). Unfortunately, it
> didn't work out. Here is a shortened summary of my experiments:
>
> The basic idea is to follow what Stephan said:
>
> "hence i think one would have to add an MRS post-processing step  
> before trying to feed these MRSs back into the generator." from http://lists.delph-in.net/archive/developers/2009/001217.html
>
> 1. For unknown NNPs, I changed `named_unk_rel' to `named_rel'; this
> works for 0902. (If I remember correctly, this change does not work
> for 0907 or 1004.)
>
> 2. For errors such as invalid predicates: |basic_yofc_rel("1998"),
> from the sentence "He left in 1998.", I changed basic_yofc_rel to
> number_q_rel =q card_rel as a shortcut to avoid the generation
> failure. This works under 0902.
>
> 3. For errors such as invalid predicates: |"_iconic_jj_rel"|, from
> "This is an iconic place.", I tried changing *_jj_rel to
> generic_unk_adj_rel with "iconic" as the CARG value, but this worked
> under neither 0902 nor 0907.
>
> I didn't observe generation failures for unknown verbs, but did see
> some failures for nouns, such as: invalid predicates:
> |"_wreckage_nn_rel"|, |"_oscillation_nn_rel"|, |"_axiom_nn_rel"|.
>
> For the generation task, my naive thought is that if cheap can parse
> a sentence, then the LKB should be able to generate from cheap's MRS
> output. To get successful parses, I used the chart-mapping branch of
> cheap, which supports pre-processed (POS-tagged) sentences, but
> generation failures due to invalid predicates still occur. Since
> this was discussed at last year's meeting and the ERG release is
> rolling forward, it looks to me as if this issue has already been
> solved (since the 0907 release) and I am simply using the wrong
> method. I'd very much appreciate it if somebody could help me out.
> Thanks.
>
> With kind regards,
>
> Xuchen Yao
>