[developers] [erg] on generation failure with the Barcelona release and later
danf at stanford.edu
Wed May 5 09:58:57 CEST 2010
Hi Xuchen Yao -
We have made some progress on generation with unknown words since last summer, and even though we have not yet arrived at an ideal solution, I believe that the most recent (1004) version should work pretty well. Here is what I just confirmed this morning:
1. Load current LKB (I'm running from the LOGON repository, which might matter)
2. Load ERG (1004)
3. Index for generation (LKB Top -- Generate -- Index) and start the generator server (LKB Top -- Generate -- Start server)
4. Using the `erg+tnt' CPU definition in $LOGONROOT/dot.tsdbrc, call PET to parse a sentence containing unknown words (I parsed `The glimpy glump arrived.')
5. Identify the MRS for the intended analysis, and generate:
- Since I'm using [incr tsdb()], I just clicked `Annotate' for this one-sentence profile, then (left-) clicked on the analysis I wanted (where `glimpy' is an unknown adjective), and clicked `Rephrase' which generated the same sentence successfully.
You should know that the generator is currently expecting a very specific format for the predicate names, following this template:
_ORTH/POS_u_unknown_rel
where ORTH is the surface orthography and POS is one of the tags you'll find in the generic lexical entries in erg/gle.tdl. For example, the predicate for `glimpy' is _glimpy/jj_u_unknown_rel. Likewise, for unknown nouns, the predicate name must be of the following form: _glump/nn_u_unknown_rel, where the first field in the predicate name again consists of the surface orthography followed by a slash followed by the POS tag.
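To make the expected shape concrete, here is a minimal Python sketch that builds and checks such predicate names. The helper names and the regular expression are my own illustration, not part of the LKB, PET, or the ERG; the pattern is read off the two examples above and may be stricter or looser than what the generator actually accepts.

```python
import re

# Assumed shape, inferred from the examples in this message:
#   _<surface orthography>/<POS tag>_u_unknown_rel
_UNKNOWN_PRED = re.compile(r"^_(?P<orth>[^/]+)/(?P<pos>[^_/]+)_u_unknown_rel$")


def unknown_pred(orth, pos):
    """Build a predicate name for an unknown word (hypothetical helper)."""
    return "_{}/{}_u_unknown_rel".format(orth, pos)


def parse_unknown_pred(pred):
    """Return (orth, pos) if pred has the assumed shape, otherwise None."""
    m = _UNKNOWN_PRED.match(pred)
    return (m.group("orth"), m.group("pos")) if m else None


print(unknown_pred("glimpy", "jj"))
print(parse_unknown_pred("_glump/nn_u_unknown_rel"))
print(parse_unknown_pred("_iconic_jj_rel"))
```

Note that the first field is whatever surface form was seen, so calling unknown_pred("glumps", "nns") yields _glumps/nns_u_unknown_rel rather than a lemma-based name.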
Unknown proper names are simpler: PET simply creates an ordinary `named_rel' EP with the new proper name as the CARG value in that EP. I confirmed that this works by parsing the following sentence with PET: `We hired Grundy.' and then generating from the MRS for the single analysis that PET returns.
Similarly, for years like `1884', we just use the ordinary predicate `yofc_rel' and provide the year as the CARG value. I confirmed this with the sentence `We arrived in 1884.', which generates fine.
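The three cases described so far (unknown words, unknown proper names, years) can be summarized in a small Python sketch. The function name and the `kind' labels are hypothetical and exist only for illustration; this is not how PET or the LKB represent EPs internally, just a compact restatement of which predicate and CARG value each case ends up with.

```python
def ep_for_unknown(token, kind, pos=None):
    """Return a (predicate, CARG) pair for an unknown token.

    `kind` is a hypothetical label: "name", "year", or "word".
    `pos` is only needed for kind == "word".
    """
    if kind == "name":
        # Ordinary named_rel EP, with the proper name as the CARG value.
        return ("named_rel", token)
    if kind == "year":
        # Ordinary yofc_rel EP, with the year as the CARG value.
        return ("yofc_rel", token)
    if kind == "word":
        if pos is None:
            raise ValueError("a POS tag is required for unknown words")
        # Surface orthography, slash, POS tag; no CARG in this case.
        return ("_{}/{}_u_unknown_rel".format(token, pos), None)
    raise ValueError("unsupported kind: {!r}".format(kind))


print(ep_for_unknown("Grundy", "name"))
print(ep_for_unknown("1884", "year"))
print(ep_for_unknown("glimpy", "word", pos="jj"))
```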
The main flaw in what we are currently doing is that we don't have a good way of determining on the fly the lemma form of the unknown word we see, so the unknown noun `glumps' gives rise to the predicate name _glumps/nns_u_unknown_rel which is of course not ideal. We'll work on this further, but I would in the meantime be glad to hear whether you can get the behavior I describe above with the 1004 version of the ERG.
----- Original Message -----
From: "Xuchen Yao" <xuchen at coli.uni-saarland.de>
To: developers at delph-in.net, erg at delph-in.net
Sent: Tuesday, May 4, 2010 12:14:50 PM
Subject: [erg] on generation failure with the Barcelona release and later
I noticed there was some intensive discussion of generation failure from
unknown words in the mailing list last year. Then people agreed to
continue the discussion at last year's meeting but I didn't find any
memo on the delph-in website. It looks like the Barcelona (0907) ERG
release was intended for this issue. So I switched from the current
stable version (0902) to 0907 or even the newest in the trunk (1004)
hoping to have a better handle on unknown words (or the "invalid
predicates" error). But unfortunately it didn't work out. Here's a
shortened observation from my experiment:
The basic idea is to follow what Stephan said:
"hence i think one would have to add an MRS post-processing step before
trying to feed these MRSs back into the generator." from
1. For unknown NNP, I changed `named_unk_rel' to `named_rel', which works
for 0902. (If I remember correctly, this change doesn't work for 0907
2. For errors like invalid predicates: |basic_yofc_rel("1998"), from the
sentence "He left in 1998." I changed basic_yofc_rel to number_q_rel =q
card_rel as a shortcut to avoid a generation failure. This works under
3. For errors like invalid predicates: |"_iconic_jj_rel"|, from "This is
an iconic place." I tried changing the *_jj_rel to generic_unk_adj_rel
with "iconic" as the CARG value. But this didn't work under both 0902
I didn't observe generation failure on unknown verbs, but did have some
cases of failure on nouns, such as: invalid predicates:
|"_wreckage_nn_rel"|, |"_oscillation_nn_rel"|, |"_axiom_nn_rel"|.
For the generation task, my naive thought is that if cheap can parse a
sentence, then LKB should generate from cheap's MRS output. For a
successful parsing, I used the chart-mapping branch of cheap to support
pre-processed (POS-tagged) sentences, but the problem of generation
failure due to invalid predicates still exists. Since there was a
discussion on this at last year's meeting and the ERG release is rolling
forward, it looks to me as if this issue has already been solved (since the
0907 release) and I was simply using the wrong method. I'd appreciate it
very much if somebody can help me out. Thanks.
With kind regards,