<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta content="text/html;charset=UTF-8" http-equiv="Content-Type"> <title></title> </head> <body bgcolor="#ffffff" text="#000000"> Hi Dan and Stephan,

I can confirm that with the latest LKB from the LOGON repository, generation from "We hired Grundy." and "We arrived in 1884." succeeds. However, unknown adjectives still don't work for me. For instance, "The glimpy glump arrived." still fails with the error: invalid predicates: |"_glimpy/JJ_u_unknown_rel"|. I have appended a short example at the end. Dan, if you are curious about what goes wrong, you can test it yourself (the cause is probably different parameter settings, as you said yesterday). But since you and Stephan agree that normalization is the better way to do this, I'll switch to that approach and add some normalization/stemming code to my program.

Stephan, I'd love to test the upcoming LKB/ERG release with my program and test sentences. Please send a message to the list when it is released and I'll follow up.

Xuchen

For unknown adjectives, I tested an even shorter one: "This is iconic."
With the MRS and MRX pasted below: LTOP: h1 INDEX: e2 RELS: < [ GENERIC_ENTITY_REL<0:4> LBL: h3 ARG0: x4 [ x PERS: 3 NUM: SG GEND: N ] ] [ _THIS_Q_DEM_REL<0:4> LBL: h5 ARG0: x4 [ x PERS: 3 NUM: SG GEND: N ] RSTR: h6 BODY: h7 ] [ _iconic/JJ_u_unknown_rel<8:16> LBL: h8 ARG0: e2 [ e SF: PROP TENSE: PRES MOOD: INDICATIVE PROG: - PERF: - ] ARG1: x4 [ x PERS: 3 NUM: SG GEND: N ] ] > HCONS: < h6 qeq h3 > MRX (you can run the following cmd to generate from this MRX: (lkb::generate-from-mrs (mrs::read-single-mrs-xml-file "mrx.xml")) ): <mrs><label vid="1"/><var vid="2"/><ep cfrom="0" cto="4"><pred>GENERIC_ENTITY_REL</pred><label vid="3"/><fvpair><rargname>ARG0</rargname><var vid="4" sort="x"><extrapair><path>PERS</path><value>3</value></extrapair><extrapair><path>NUM</path><value>SG</value></extrapair><extrapair><path>GEND</path><value>N</value></extrapair></var></fvpair></ep><ep cfrom="0" cto="4"><pred>_THIS_Q_DEM_REL</pred><label vid="5"/><fvpair><rargname>ARG0</rargname><var vid="4" sort="x"><extrapair><path>PERS</path><value>3</value></extrapair><extrapair><path>NUM</path><value>SG</value></extrapair><extrapair><path>GEND</path><value>N</value></extrapair></var></fvpair><fvpair><rargname>RSTR</rargname><var vid="6" sort="h"/></fvpair><fvpair><rargname>BODY</rargname><var vid="7" sort="h"/></fvpair></ep><ep cfrom="8" cto="16"><spred>_iconic/JJ_u_unknown_rel</spred><label vid="8"/><fvpair><rargname>ARG0</rargname><var vid="2" sort="e"><extrapair><path>SF</path><value>PROP</value></extrapair><extrapair><path>TENSE</path><value>PRES</value></extrapair><extrapair><path>MOOD</path><value>INDICATIVE</value></extrapair><extrapair><path>PROG</path><value>-</value></extrapair><extrapair><path>PERF</path><value>-</value></extrapair></var></fvpair><fvpair><rargname>ARG1</rargname><var vid="4" 
sort="x"><extrapair><path>PERS</path><value>3</value></extrapair><extrapair><path>NUM</path><value>SG</value></extrapair><extrapair><path>GEND</path><value>N</value></extrapair></var></fvpair></ep><hcons hreln="qeq"><hi><var vid="6" sort="h"/></hi><lo><var vid="3" sort="h"/></lo></hcons></mrs> The SVN version I'm using: $ svn info Path: . URL: <a class="moz-txt-link-freetext" href="http://svn.emmtee.net/trunk">http://svn.emmtee.net/trunk</a> Repository Root: <a class="moz-txt-link-freetext" href="http://svn.emmtee.net">http://svn.emmtee.net</a> Repository UUID: 3df82f5b-d43a-0410-af33-fce91db48ec5 Revision: 7866 Node Kind: directory Schedule: normal Last Changed Author: oe Last Changed Rev: 7857 Last Changed Date: 2010-05-03 15:52:14 +0200 (Mon, 03 May 2010) Dan Flickinger wrote: <blockquote cite="mid:475932020.48694.1273055889205.JavaMail.root@zm07.stanford.edu" type="cite"> <pre wrap="">Hi again, Xuchen - After looking at this a little more carefully, I agree with Stephan that it will be better to use normalized predicate names for unknown words, like the ones in that lkb/sample.mrs file. The 1004 version of the ERG comes equipped with code to do that normalizing step automatically for the kind of PET output predicate names I described earlier, so I believe you should get good behavior now with generation for unknown words following Stephan's advice. I've confirmed that these nicer-looking predicates for unknown words generate fine for me, too. But it will be important for you to use the latest LOGON release for your experiments along with the 1004 version of the ERG, since there are several code changes in various components that all have to work together correctly to get the desired functionality. 
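As a rough illustration of the normalizing step mentioned above, the following Java sketch rewrites a PET-style unknown-word predicate such as _bazed/VBD_u_unknown_rel into a shorter form such as _baze_v_rel. It is not the ERG's own normalization code. The class name PredicateNormalizer, the tag-to-category mapping (JJ to a, NN/NNS to n, VB to v), the target shape _lemma_pos_rel, and the toy lemmatizer are all assumptions made for this example; a real lemmatizer or stemmer would have to be plugged in.

```java
import java.util.Locale;
import java.util.function.BinaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the LKB, PET, or the ERG. */
final class PredicateNormalizer {
    // Matches _surface/POS_u_unknown_rel and captures the surface form and the POS tag.
    private static final Pattern UNKNOWN =
        Pattern.compile("_(.+)/([A-Za-z$]+)_u_unknown_rel");

    /** Maps a Penn Treebank tag to a coarse category letter (assumed mapping); null if unmapped. */
    static String coarsePos(String tag) {
        String t = tag.toUpperCase(Locale.ROOT);
        if (t.startsWith("JJ")) return "a";
        if (t.equals("NN") || t.equals("NNS")) return "n";
        if (t.startsWith("VB")) return "v";
        return null;
    }

    /**
     * Rewrites an unknown-word predicate to _lemma_pos_rel. Predicates that do not
     * match the template, or whose tag is unmapped, are returned unchanged.
     * The lemmatizer receives (surface, tag) and returns the lemma.
     */
    static String normalize(String predicate, BinaryOperator<String> lemmatizer) {
        Matcher m = UNKNOWN.matcher(predicate);
        if (!m.matches()) return predicate;
        String pos = coarsePos(m.group(2));
        if (pos == null) return predicate;
        String lemma = lemmatizer.apply(m.group(1).toLowerCase(Locale.ROOT), m.group(2));
        return "_" + lemma + "_" + pos + "_rel";
    }

    /** Toy lemmatizer for the demo only: drops the final "d" of verb forms ending in "ed". */
    static String toyLemma(String surface, String tag) {
        boolean verb = tag.toUpperCase(Locale.ROOT).startsWith("VB");
        return verb && surface.endsWith("ed")
            ? surface.substring(0, surface.length() - 1)
            : surface;
    }

    public static void main(String[] args) {
        System.out.println(normalize("_bazed/VBD_u_unknown_rel", PredicateNormalizer::toyLemma));
        System.out.println(normalize("_glimpy/JJ_u_unknown_rel", PredicateNormalizer::toyLemma));
        System.out.println(normalize("_hire_v_1_rel", PredicateNormalizer::toyLemma));
    }
}
```

Passing the lemmatizer in as a parameter keeps the string rewriting separate from the harder problem of finding the lemma of an unseen word.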
Dan

----- Original Message -----
From: "Xuchen Yao" <a class="moz-txt-link-rfc2396E" href="mailto:xuchen@coli.uni-saarland.de"><xuchen@coli.uni-saarland.de></a>
To: "Dan Flickinger" <a class="moz-txt-link-rfc2396E" href="mailto:danf@stanford.edu"><danf@stanford.edu></a>
Cc: <a class="moz-txt-link-abbreviated" href="mailto:developers@delph-in.net">developers@delph-in.net</a>, <a class="moz-txt-link-abbreviated" href="mailto:erg@delph-in.net">erg@delph-in.net</a>
Sent: Wednesday, May 5, 2010 10:42:15 AM
Subject: Re: [erg] on generation failure with the Barcelona release and later

Hi Dan,

Thanks for the reply. I was following Stephan's reply when your email arrived. He referred me to the file lkb/sample.mrs in the 1004 release, which differs from your explanation. I assumed the LKB could generate from sample.mrs, so I was trying to produce an equivalent XML to verify that.

The difference between sample.mrs and your solution is the format used for unknown words. For instance, the verb "bazed" takes "_baze_v_rel" in sample.mrs, but according to your solution it should take _bazed/VBD_u_unknown_rel. In fact, _bazed/VBD_u_unknown_rel is what PET outputs. According to Stephan, I should "normalize" this _bazed/VBD_u_unknown_rel; I guess this normalization means changing _bazed/VBD_u_unknown_rel to "_baze_v_rel". So it looks like two experts have different solutions to this, and I'm confused.

I also followed some of your steps. I'm using ERG 1004 from the LOGON trunk but the default LKB release from delph-in.net. Since my program is written entirely in Java, I use the OpenNLP tagger rather than TNT. Your sentence: "We hired Grundy."
has the following tagging:

{"We"; POS: PRP} {"hired"; POS: VBD} {"Grundy"; POS: NNP}

So feeding an FSC XML to PET gives the following parse:

[ SentType: PROP
  Decomposer: []
  LTOP: h1
  INDEX: e2
  RELS: <
    [ PRON_REL<0:2> LBL: h3 ARG0: x4 [ x PERS: 1 NUM: PL PRONTYPE: STD_PRON ] ]
    [ PRONOUN_Q_REL<0:2> LBL: h5 ARG0: x4 [ x PERS: 1 NUM: PL PRONTYPE: STD_PRON ] RSTR: h7 BODY: h6 ]
    [ _hire_v_1_rel<3:8> LBL: h8 ARG0: e2 [ e SF: PROP TENSE: PAST MOOD: INDICATIVE PROG: - PERF: - ] ARG1: x4 [ x PERS: 1 NUM: PL PRONTYPE: STD_PRON ] ARG2: x9 [ x PERS: 3 NUM: SG IND: + ] ]
    [ PROPER_Q_REL<9:17> LBL: h10 ARG0: x9 [ x PERS: 3 NUM: SG IND: + ] RSTR: h12 BODY: h11 ]
    [ NAMED_REL<9:17> LBL: h13 ARG0: x9 [ x PERS: 3 NUM: SG IND: + ] CARG: "Grundy" ] >
</pre> <pre wrap="">HCONS: < h7 qeq h3 h12 qeq h13 > ]

Unfortunately, LKB failed to generate: invalid predicates: |named_rel("Grundy")|. The same happens with "The glimpy glump arrived."

You said the LKB version might matter (I will try the one from the LOGON repository later). But I always thought this kind of "invalid predicates" error depended only on the grammar rather than on the LKB. Please correct me if I'm wrong.

Xuchen

Dan Flickinger wrote:
</pre> <blockquote type="cite"> <pre wrap="">Hi Xuchen Yao -

We have made some progress on generation with unknown words since last summer, and even though we have not yet arrived at an ideal solution, I believe that the most recent (1004) version should work pretty well. Here is what I just confirmed this morning:

1. Load current LKB (I'm running from the LOGON repository, which might matter)
2. Load ERG (1004)
3. Index for generation (LKB Top -- Generate -- Index) and start the generator server (LKB Top -- Generate -- Start server)
4. Using the `erg+tnt' CPU definition in $LOGONROOT/dot.tsdbrc, call PET to parse a sentence containing unknown words (I parsed `The glimpy glump arrived.')
5.
Identify the MRS for the intended analysis, and generate:
   - Since I'm using [incr tsdb()], I just clicked `Annotate' for this one-sentence profile, then (left-) clicked on the analysis I wanted (where `glimpy' is an unknown adjective), and clicked `Rephrase' which generated the same sentence successfully.

You should know that the generator is currently expecting a very specific format for the predicate names, following this template:

  _surface-orthography/POS_u_unknown_rel

where POS is one of the tags you'll find in the generic lexical entries in erg/gle.tdl. For example, the predicate for `glimpy' is _glimpy/jj_u_unknown_rel. Likewise, for unknown nouns, the predicate name must be of the following form: _glump/nn_u_unknown_rel, where the first field in the predicate name again consists of the surface orthography followed by a slash followed by the POS tag.

Unknown proper names are simpler: PET simply creates an ordinary `named_rel' EP with the new proper name as the CARG value in that EP. I confirmed that this works by parsing the following sentence with PET: `We hired Grundy.' and then generating from the MRS for the single analysis that PET returns. Similarly for years like "1884", we just use the ordinary predicate 'yofc_rel', and provide the year as the CARG value. I confirmed this with the sentence `We arrived in 1884.', which generates fine.

The main flaw in what we are currently doing is that we don't have a good way of determining on the fly the lemma form of the unknown word we see, so the unknown noun `glumps' gives rise to the predicate name _glumps/nns_u_unknown_rel which is of course not ideal. We'll work on this further, but I would in the meantime be glad to hear whether you can get the behavior I describe above with the 1004 version of the ERG.
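
For readers who build or check these predicate names in their own code, the template above can be sketched in Java roughly as follows. This is only an illustration of the string format described in this message. The class UnknownPredicate and its methods are hypothetical and are not part of the LKB, PET, or the ERG, and the sketch passes the POS tag through unchanged because this thread shows both upper-case and lower-case tags.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the LKB, PET, or the ERG. */
final class UnknownPredicate {
    // Template from the message above: _surface-orthography/POS_u_unknown_rel
    private static final Pattern TEMPLATE =
        Pattern.compile("_(.+)/([A-Za-z$]+)_u_unknown_rel");

    final String surface; // surface orthography, e.g. "glimpy"
    final String posTag;  // POS tag, e.g. "jj"; kept exactly as given

    UnknownPredicate(String surface, String posTag) {
        this.surface = surface;
        this.posTag = posTag;
    }

    /** Builds the predicate name from the surface form and the POS tag. */
    String toPredicate() {
        return "_" + surface + "/" + posTag + "_u_unknown_rel";
    }

    /** Splits a predicate name that follows the template; empty if it does not. */
    static Optional<UnknownPredicate> parse(String predicate) {
        Matcher m = TEMPLATE.matcher(predicate);
        if (!m.matches()) return Optional.empty();
        return Optional.of(new UnknownPredicate(m.group(1), m.group(2)));
    }

    public static void main(String[] args) {
        System.out.println(new UnknownPredicate("glump", "nn").toPredicate());
        UnknownPredicate p = parse("_glumps/nns_u_unknown_rel").get();
        System.out.println(p.surface + " " + p.posTag);
        System.out.println(parse("_hire_v_1_rel").isPresent());
    }
}
```

Predicates for known words, such as _hire_v_1_rel, do not match the template and are reported as absent, so the same check can be used to pick out the unknown-word predicates in an MRS.
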
Best,
Dan

----- Original Message -----
From: "Xuchen Yao" <a class="moz-txt-link-rfc2396E" href="mailto:xuchen@coli.uni-saarland.de"><xuchen@coli.uni-saarland.de></a>
To: <a class="moz-txt-link-abbreviated" href="mailto:developers@delph-in.net">developers@delph-in.net</a>, <a class="moz-txt-link-abbreviated" href="mailto:erg@delph-in.net">erg@delph-in.net</a>
Sent: Tuesday, May 4, 2010 12:14:50 PM
Subject: [erg] on generation failure with the Barcelona release and later

Hi,

I noticed there was some intensive discussion on the mailing list last year about generation failure caused by unknown words. People then agreed to continue the discussion at last year's meeting, but I couldn't find any notes on the delph-in website. It looks like the Barcelona (0907) ERG release was intended to address this issue, so I switched from the current stable version (0902) to 0907, and even to the newest in the trunk (1004), hoping for better handling of unknown words (or of the "invalid predicates" error). Unfortunately, it didn't work out. Here's a shortened summary of my experiments.

The basic idea is to follow what Stephan said: "hence i think one would have to add an MRS post-processing step before trying to feed these MRSs back into the generator." from <a class="moz-txt-link-freetext" href="http://lists.delph-in.net/archive/developers/2009/001217.html">http://lists.delph-in.net/archive/developers/2009/001217.html</a>

1. For unknown NNP, I changed `named_unk_rel' to `named_rel'; this works for 0902. (If I remember correctly, this change doesn't work for 0907 and 1004.)
2. For errors like invalid predicates: |basic_yofc_rel("1998")|, from the sentence "He left in 1998.", I changed basic_yofc_rel to number_q_rel =q card_rel as a shortcut to avoid a generation failure. This works under 0902.
3. For errors like invalid predicates: |"_iconic_jj_rel"|, from "This is an iconic place.", I tried changing *_jj_rel to generic_unk_adj_rel with "iconic" as the CARG value. But this didn't work under either 0902 or 0907.

I didn't observe generation failures on unknown verbs, but did have some failures on nouns, such as: invalid predicates: |"_wreckage_nn_rel"|, |"_oscillation_nn_rel"|, |"_axiom_nn_rel"|.

For the generation task, my naive thought is that if cheap can parse a sentence, then LKB should be able to generate from cheap's MRS output. To get successful parses, I used the chart-mapping branch of cheap, which supports pre-processed (POS-tagged) sentences, but the generation failures due to invalid predicates remain.

Since this was discussed at last year's meeting and ERG releases have kept rolling forward, it looks to me as if this issue has already been solved (since the 0907 release) and I am just using the wrong method. I'd appreciate it very much if somebody could help me out. Thanks.

With kind regards,
Xuchen Yao
</pre> </blockquote> </blockquote> </body> </html>