[developers] [erg] on generation failure with the Barcelona release and later
Dan Flickinger
danf at stanford.edu
Thu May 6 09:49:02 CEST 2010
Hi Xuchen -
Your XML example generates fine for me when I call the following:
(lkb::generate-from-mrs (mrs::read-single-mrs-xml-file "mrx.xml"))
So I'm pretty sure there is some simple parameter setting that is different in your setup and mine. Let's get together later today so we can compare settings. I'll be around all afternoon, so stop by sometime before class if you have time.
Dan
----- Original Message -----
From: "Xuchen Yao" <xuchen at coli.uni-saarland.de>
To: "Dan Flickinger" <danf at stanford.edu>, oe at ifi.uio.no
Cc: developers at delph-in.net, erg at delph-in.net
Sent: Thursday, May 6, 2010 9:25:40 AM
Subject: Re: [erg] on generation failure with the Barcelona release and later
Hi Dan and Stephan,
I confirm that, using the latest LKB from the LOGON repository, generation
from "We hired Grundy." and "We arrived in 1884." is successful.
However, unknown adjectives still don't work for me. For instance, I
still get the error invalid predicates: |"_glimpy/JJ_u_unknown_rel"|
from "The glimpy glump arrived." I have appended a short example at the
end. Dan, if you are curious about what goes wrong, you can just test it
yourself (the cause is probably the different parameter settings you
mentioned yesterday). But since you and Stephan agree that normalization
is a better way to do this, I'll now switch to that direction and add some
normalization/stemming code to my program.
Stephan, I'd love to test the upcoming LKB/ERG release using my program
and test sentences. Please just send a message to the list when they are
released and I'll follow.
Xuchen
For unknown adjectives, I tested an even shorter one: "This is iconic."
With the MRS and MRX pasted below:
LTOP: h1
INDEX: e2
RELS: <
[ GENERIC_ENTITY_REL<0:4>
LBL: h3
ARG0: x4 [ x PERS: 3 NUM: SG GEND: N ]
] [ _THIS_Q_DEM_REL<0:4>
LBL: h5
ARG0: x4 [ x PERS: 3 NUM: SG GEND: N ]
RSTR: h6
BODY: h7
] [ _iconic/JJ_u_unknown_rel<8:16>
LBL: h8
ARG0: e2 [ e SF: PROP TENSE: PRES MOOD: INDICATIVE PROG: - PERF: - ]
ARG1: x4 [ x PERS: 3 NUM: SG GEND: N ]
]
>
HCONS: < h6 qeq h3 >
MRX (you can run the following cmd to generate from this MRX:
(lkb::generate-from-mrs (mrs::read-single-mrs-xml-file "mrx.xml"))
):
<mrs><label vid="1"/><var vid="2"/><ep cfrom="0"
cto="4"><pred>GENERIC_ENTITY_REL</pred><label
vid="3"/><fvpair><rargname>ARG0</rargname><var vid="4"
sort="x"><extrapair><path>PERS</path><value>3</value></extrapair><extrapair><path>NUM</path><value>SG</value></extrapair><extrapair><path>GEND</path><value>N</value></extrapair></var></fvpair></ep><ep
cfrom="0" cto="4"><pred>_THIS_Q_DEM_REL</pred><label
vid="5"/><fvpair><rargname>ARG0</rargname><var vid="4"
sort="x"><extrapair><path>PERS</path><value>3</value></extrapair><extrapair><path>NUM</path><value>SG</value></extrapair><extrapair><path>GEND</path><value>N</value></extrapair></var></fvpair><fvpair><rargname>RSTR</rargname><var
vid="6" sort="h"/></fvpair><fvpair><rargname>BODY</rargname><var vid="7"
sort="h"/></fvpair></ep><ep cfrom="8"
cto="16"><spred>_iconic/JJ_u_unknown_rel</spred><label
vid="8"/><fvpair><rargname>ARG0</rargname><var vid="2"
sort="e"><extrapair><path>SF</path><value>PROP</value></extrapair><extrapair><path>TENSE</path><value>PRES</value></extrapair><extrapair><path>MOOD</path><value>INDICATIVE</value></extrapair><extrapair><path>PROG</path><value>-</value></extrapair><extrapair><path>PERF</path><value>-</value></extrapair></var></fvpair><fvpair><rargname>ARG1</rargname><var
vid="4"
sort="x"><extrapair><path>PERS</path><value>3</value></extrapair><extrapair><path>NUM</path><value>SG</value></extrapair><extrapair><path>GEND</path><value>N</value></extrapair></var></fvpair></ep><hcons
hreln="qeq"><hi><var vid="6" sort="h"/></hi><lo><var vid="3"
sort="h"/></lo></hcons></mrs>
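
As a quick check before calling the generator, the predicates in an MRX file like the one above can be listed so that the unknown-word ones stand out. This is only a minimal sketch using Python's standard library: the function names are hypothetical, and it assumes only what is visible in the example, namely that each <ep> carries its predicate in a <pred> or <spred> child and that unknown-word predicates end in _u_unknown_rel.

```python
import xml.etree.ElementTree as ET


def list_predicates(mrx_text):
    """Return (predicate, is_string_pred) pairs, one per <ep>, in document order."""
    root = ET.fromstring(mrx_text)
    preds = []
    for ep in root.iter("ep"):
        for child in ep:
            # In the example above, predicates appear as <pred> or <spred>.
            if child.tag in ("pred", "spred") and child.text:
                preds.append((child.text.strip(), child.tag == "spred"))
    return preds


def unknown_predicates(mrx_text):
    """Return only the predicates that follow the ..._u_unknown_rel pattern."""
    return [name for name, _ in list_predicates(mrx_text)
            if name.endswith("_u_unknown_rel")]
```

For the MRX above, unknown_predicates should report only _iconic/JJ_u_unknown_rel, which is the predicate the generator rejects.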
The SVN version I'm using:
$ svn info
Path: .
URL: http://svn.emmtee.net/trunk
Repository Root: http://svn.emmtee.net
Repository UUID: 3df82f5b-d43a-0410-af33-fce91db48ec5
Revision: 7866
Node Kind: directory
Schedule: normal
Last Changed Author: oe
Last Changed Rev: 7857
Last Changed Date: 2010-05-03 15:52:14 +0200 (Mon, 03 May 2010)
Dan Flickinger wrote:
Hi again, Xuchen -
After looking at this a little more carefully, I agree with Stephan that
it will be better to use normalized predicate names for unknown words,
like the ones in that lkb/sample.mrs file. The 1004 version of the ERG
comes equipped with code to do that normalizing step automatically for
the kind of PET output predicate names I described earlier, so I believe
you should get good behavior now with generation for unknown words
following Stephan's advice. I've confirmed that these nicer-looking
predicates for unknown words generate fine for me, too.
But it will be important for you to use the latest LOGON release for
your experiments along with the 1004 version of the ERG, since there are
several code changes in various components that all have to work
together correctly to get the desired functionality.
Dan
----- Original Message -----
From: "Xuchen Yao" <xuchen at coli.uni-saarland.de>
To: "Dan Flickinger" <danf at stanford.edu>
Cc: developers at delph-in.net, erg at delph-in.net
Sent: Wednesday, May 5, 2010 10:42:15 AM
Subject: Re: [erg] on generation failure with the Barcelona release and later
Hi Dan,
Thanks for the reply. I was following Stephan's reply when your email
arrived. He referred me to the file lkb/sample.mrs in the 1004 release,
which is different from your explanation. I assumed the LKB can generate
from sample.mrs, so I was trying to produce an equivalent XML file to
test it.
The difference between sample.mrs and your solution is that unknown
words take a different format. For instance, the verb "bazed" takes
"_baze_v_rel" in sample.mrs but according to your solution, it should
take _bazed/VBD_u_unknown_rel. In fact, _bazed/VBD_u_unknown_rel is what
PET outputs. According to Stephan, I should apply a "normalization" step
to _bazed/VBD_u_unknown_rel; I guess this means changing it to
"_baze_v_rel". So it looks like the two of you are proposing different
solutions, and I'm confused.
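
The normalization discussed here could look roughly like the following. It is a sketch under stated assumptions, not the code shipped with the ERG: the function name, the regular expression, and the tag-to-letter table are all hypothetical, and only the VBD-to-v case (_bazed/VBD_u_unknown_rel becoming _baze_v_rel) is taken from this thread. Lemmatization is left as a caller-supplied hook, since no reliable way to derive "baze" from "bazed" is given here.

```python
import re

# Assumed mapping from the first two letters of a Penn Treebank tag to the
# part-of-speech letter in the normalized predicate. Only "VB" -> "v" is
# attested in this thread; the other entries are illustrative guesses.
POS_MAP = {"VB": "v", "NN": "n", "JJ": "a", "RB": "a"}

# Matches PET-style names such as _bazed/VBD_u_unknown_rel.
UNKNOWN_PRED = re.compile(r"^_(?P<form>[^/]+)/(?P<tag>[A-Za-z$]+)_u_unknown_rel$")


def normalize(pred, lemmatize=lambda form, tag: form):
    """Rewrite _form/TAG_u_unknown_rel as _lemma_pos_rel; leave other names alone."""
    match = UNKNOWN_PRED.match(pred)
    if match is None:
        return pred  # not an unknown-word predicate
    tag = match.group("tag").upper()
    pos = POS_MAP.get(tag[:2])
    if pos is None:
        return pred  # tag not covered by the assumed table
    lemma = lemmatize(match.group("form").lower(), tag)
    return "_{}_{}_rel".format(lemma, pos)
```

With the default hook the surface form is kept unchanged, so a real stemmer or lemmatizer has to be passed in to get "_baze_v_rel" rather than "_bazed_v_rel".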
Also, I followed your steps a little. I'm using ERG 1004 from the
LOGON trunk but the default LKB release from delph-in.net. Since my
program is written entirely in Java, I use the OpenNLP tagger rather than
TNT. Your sentence "We hired Grundy." gets the following tagging:
{"We"; POS: PRP} {"hired"; POS: VBD} {"Grundy"; POS: NNP}
Feeding an FSC XML file to PET then gives the following parse:
[
SentType: PROP
Decomposer: []
LTOP: h1
INDEX: e2
RELS: <
[ PRON_REL<0:2>
LBL: h3
ARG0: x4 [ x PERS: 1 NUM: PL PRONTYPE: STD_PRON ]
] [ PRONOUN_Q_REL<0:2>
LBL: h5
ARG0: x4 [ x PERS: 1 NUM: PL PRONTYPE: STD_PRON ]
RSTR: h7
BODY: h6
] [ _hire_v_1_rel<3:8>
LBL: h8
ARG0: e2 [ e SF: PROP TENSE: PAST MOOD: INDICATIVE PROG: - PERF: - ]
ARG1: x4 [ x PERS: 1 NUM: PL PRONTYPE: STD_PRON ]
ARG2: x9 [ x PERS: 3 NUM: SG IND: + ]
] [ PROPER_Q_REL<9:17>
LBL: h10
ARG0: x9 [ x PERS: 3 NUM: SG IND: + ]
RSTR: h12
BODY: h11
] [ NAMED_REL<9:17>
LBL: h13
ARG0: x9 [ x PERS: 3 NUM: SG IND: + ]
CARG: "Grundy"
] HCONS: < h7 qeq h3 h12 qeq h13 >
]
Unfortunately, the LKB failed to generate: invalid predicates:
|named_rel("Grundy")|. I see a similar failure for "The glimpy glump
arrived."
You said the LKB version might matter (I will try the one from the LOGON
repository later). But I always thought this kind of "invalid
predicates" error depended only on the grammar, not on the LKB. Please
correct me if I'm wrong.
Xuchen
Dan Flickinger wrote:
Hi Xuchen Yao -
We have made some progress on generation with unknown words since last
summer, and even though we have not yet arrived at an ideal solution,
I believe that the most recent (1004) version should work pretty well.
Here is what I just confirmed this morning:
1. Load current LKB (I'm running from the LOGON repository, which
might matter)
2. Load ERG (1004)
3. Index for generation (LKB Top -- Generate -- Index) and start the
generator server (LKB Top -- Generate -- Start server)
4. Using the `erg+tnt' CPU definition in $LOGONROOT/dot.tsdbrc, call
PET to parse a sentence containing unknown words (I parsed `The glimpy
glump arrived.')
5. Identify the MRS for the intended analysis, and generate:
- Since I'm using [incr tsdb()], I just clicked `Annotate' for this
one-sentence profile, then (left-) clicked on the analysis I wanted
(where `glimpy' is an unknown adjective), and clicked `Rephrase'
which generated the same sentence successfully.
You should know that the generator is currently expecting a very
specific format for the predicate names, following this template:
_surface-orthography/POS_u_unknown_rel where POS is one of the tags
you'll find in the generic lexical entries in erg/gle.tdl. For
example, the predicate for `glimpy' is _glimpy/jj_u_unknown_rel.
Likewise, for unknown nouns, the predicate name must be of the
following form: _glump/nn_u_unknown_rel, where the first field in the
predicate name again consists of the surface orthography followed by a
slash followed by the POS tag.
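
To make the template concrete, here is a small sketch that builds such a predicate name from a surface token and its POS tag. The function name is hypothetical, and lowercasing the tag simply follows the two examples above; whether the generator is sensitive to the case of the tag is not stated here.

```python
def unknown_pred(surface, pos_tag):
    """Build _surface/pos_u_unknown_rel from a token and its POS tag."""
    if not surface or not pos_tag:
        raise ValueError("surface form and POS tag must both be non-empty")
    # Tag is lowercased to match the examples in this message (an assumption).
    return "_{}/{}_u_unknown_rel".format(surface, pos_tag.lower())
```

Because the surface form is used as-is, an inflected token keeps its inflection in the predicate name: unknown_pred("glumps", "NNS") gives _glumps/nns_u_unknown_rel.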
Unknown proper names are simpler: PET simply creates an ordinary
`named_rel' EP with the new proper name as the CARG value in that EP.
I confirmed that this works by parsing the following sentence with
PET: `We hired Grundy.' and then generating from the MRS for the
single analysis that PET returns.
Similarly for years like "1884", we just use the ordinary predicate
'yofc_rel', and provide the year as the CARG value. I confirmed this
with the sentence `We arrived in 1884.', which generates fine.
The main flaw in what we are currently doing is that we don't have a
good way of determining on the fly the lemma form of the unknown word
we see, so the unknown noun `glumps' gives rise to the predicate name
_glumps/nns_u_unknown_rel which is of course not ideal. We'll work on
this further, but I would in the meantime be glad to hear whether you
can get the behavior I describe above with the 1004 version of the
ERG.
Best,
Dan
----- Original Message -----
From: "Xuchen Yao" <xuchen at coli.uni-saarland.de>
To: developers at delph-in.net, erg at delph-in.net
Sent: Tuesday, May 4, 2010 12:14:50 PM
Subject: [erg] on generation failure with the Barcelona release and later
Hi,
I noticed there was some intensive discussion of generation failure
from unknown words in the mailing list last year. Then people agreed
to continue the discussion at last year's meeting but I didn't find
any memo on the delph-in website. It looks like the Barcelona (0907)
ERG release was intended for this issue. So I switched from the
current stable version (0902) to 0907 and even the newest in the trunk
(1004), hoping for better handling of unknown words (or of the
"invalid predicates" error). Unfortunately it didn't work out.
Here's a shortened summary of my experiment:
The basic idea is to follow what Stephan said:
"hence i think one would have to add an MRS post-processing step
before trying to feed these MRSs back into the generator." from
http://lists.delph-in.net/archive/developers/2009/001217.html
1. For unknown NNP, I changed `named_unk_rel' to `named_rel'. This works
for 0902. (If I remember correctly, this change doesn't work for
0907 and 1004.)
2. For errors like invalid predicates: |basic_yofc_rel("1998")|, from
the sentence "He left in 1998.", I changed basic_yofc_rel to
number_q_rel =q card_rel as a shortcut to avoid a generation failure.
This works under 0902.
3. For errors like invalid predicates: |"_iconic_jj_rel"|, from "This
is an iconic place.", I tried changing the *_jj_rel to
generic_unk_adj_rel with "iconic" as the CARG value. But this didn't
work under either 0902 or 0907.
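
A post-processing step of the kind described in item 1 can be sketched as a simple rename pass over the MRX before it is handed to the generator. This is an illustration only: the function name and the rename table are hypothetical, the table holds just the one substitution reported above to work under 0902, and the sketch assumes that predicates sit in <pred> or <spred> elements inside each <ep>.

```python
import xml.etree.ElementTree as ET

# Rename table: only the substitution from item 1 is included.
RENAMES = {"named_unk_rel": "named_rel"}


def rename_predicates(mrx_text, renames=RENAMES):
    """Return the MRX with predicate names replaced according to `renames`."""
    root = ET.fromstring(mrx_text)
    for ep in root.iter("ep"):
        for child in ep:
            if child.tag in ("pred", "spred") and child.text:
                # Compare case-insensitively; the replacement is written
                # exactly as given in the table (an assumption about casing).
                key = child.text.strip().lower()
                if key in renames:
                    child.text = renames[key]
    return ET.tostring(root, encoding="unicode")
```

Items 2 and 3 change the structure of the MRS (extra EPs, a CARG value) rather than just a name, so they would need more than this rename table.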
I didn't observe generation failures on unknown verbs, but did have
some cases of failure on nouns, such as: invalid predicates:
|"_wreckage_nn_rel"|, |"_oscillation_nn_rel"|, |"_axiom_nn_rel"|.
For the generation task, my naive thought is that if cheap can parse a
sentence, then the LKB should be able to generate from cheap's MRS output.
To get successful parses, I used the chart-mapping branch of cheap to
support pre-processed (POS-tagged) sentences, but the problem of
generation failure due to invalid predicates still exists. Since there
was a discussion on this at last year's meeting and the ERG releases keep
rolling forward, it looks to me as if this issue has already been solved
(since the 0907 release) and I am just using the wrong method. I'd
appreciate it very much if somebody could help me out. Thanks.
With kind regards,
Xuchen Yao