[developers] Generation with unknown words
Ann Copestake
aac10 at cam.ac.uk
Sun Feb 7 11:38:46 CET 2016
for realization at least, isn't it adequate to use a lemma list
extracted from (say) WordNet to support predicate normalisation?
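
A minimal sketch of that idea, assuming a plain set of lemmas stands in for a list extracted from WordNet. The function name `pick_lemma` and the tiny lemma set are hypothetical, and the candidate stems are hard-coded for illustration.

```python
from typing import Iterable, List, Optional


def pick_lemma(candidates: List[str], known_lemmas: Iterable[str]) -> Optional[str]:
    """Prefer the first candidate stem that appears in the lemma list.

    Falls back to the first candidate (the naive choice) when none is
    attested, and to None when there are no candidates at all.
    """
    known = set(known_lemmas)
    for candidate in candidates:
        if candidate in known:
            return candidate
    return candidates[0] if candidates else None


# Hypothetical lemma list; a real one would be much larger.
LEMMAS = {"combust", "porcelain"}

print(pick_lemma(["combuste", "combust"], LEMMAS))   # combust
print(pick_lemma(["zanne", "zann", "zan"], LEMMAS))  # zanne (nothing attested)
```

A lemma list only helps for words it happens to contain; a truly novel form still falls back to the naive choice.
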
But, the application that Alex is interested in is a form of
regeneration. So I think that as long as the generator accepts what the
parser outputs for unknown words, it really doesn't matter whether or
not it's normalised. I don't know whether or not anyone is using the
realiser for applications which are broad-coverage (hence need unknown
words) and where the *MRS is constructed from scratch (hence need to use
lemmas for the predicates). Excluding MT, of course.
All best,
Ann
On 07/02/2016 10:15, Stephan Oepen wrote:
> there actually are two separate mechanisms to discuss: (a) lexical
> instantiation for unknown predicates (in realization) and (b)
> predicate normalization for unknown words (in parsing).
>
> as for (a), i find the current LKB mechanism about as generic as i can
> imagine (and consider appropriate). the grammar provides an inventory
> of generic lexical entries for realization (these are in part distinct
> from the parsing ones, in the ERG, because the strategies for dealing
> with inflection are different). for each such entry, the
> grammar declares which MRS predicate activates it and how to determine
> its orthography. the former is accomplished via a regular expression,
> e.g. something like /^named$/ or /^_([^_]+)/. the latter either comes
> from the (unique) parameter of the relation with the unknown predicate
> (CARG in the ERG) or from the part of the predicate matched as the
> above capture group (the lemma field). there is no provision for
> generic lexical entries with decomposed semantics (in realization).
>
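
The activation scheme described in (a) can be sketched roughly as follows. This is an illustration only, not the LKB or ERG code: the entry names, the `GenericEntry` class and the `instantiate` function are hypothetical, and only the two regular expressions come from the message above.

```python
import re
from dataclasses import dataclass
from typing import Optional, Pattern, Tuple


@dataclass
class GenericEntry:
    name: str          # hypothetical identifier of the generic lexical entry
    trigger: Pattern   # regular expression matched against the MRS predicate
    use_carg: bool     # True: orthography from CARG; False: from capture group 1


ENTRIES = [
    GenericEntry("generic_proper_name", re.compile(r"^named$"), use_carg=True),
    GenericEntry("generic_lemma", re.compile(r"^_([^_]+)"), use_carg=False),
]


def instantiate(pred: str, carg: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """Return (entry name, orthography) for the first entry the predicate activates."""
    for entry in ENTRIES:
        match = entry.trigger.search(pred)
        if match is None:
            continue
        if entry.use_carg:
            if carg is None:
                continue  # no parameter to take the orthography from
            return entry.name, carg
        return entry.name, match.group(1)
    return None


print(instantiate("named", carg="Kim"))             # ('generic_proper_name', 'Kim')
print(instantiate("_porcelain_n_unknown_rel"))      # ('generic_lemma', 'porcelain')
print(instantiate("_porcelain/NN_u_unknown_rel"))   # ('generic_lemma', 'porcelain/NN')
```

The last call shows why un-normalized parser output is awkward as generator input in this sketch: the captured "lemma" still carries the surface form and PoS tag.
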
> regarding (b), the ERG in parsing outputs predicates like the
> ones alex had noticed. these are not fully normalized because there
> is no reliable lemmatization facility for unknown words inside the
> parser (and, thus, generic entries for parsing predominantly are full
> forms). what is recorded in the ‘lemma’ field is the actual surface
> form, concatenated with the PoS that activated the generic entry. the
> ERG provides a mechanism for post-parsing normalization, again in
> mostly declarative and general form: triggered by regular expressions
> looking for PTB PoS tags in predicate names, an orthographemic rule of
> the grammar can (optionally) be invoked on the remainder of the
> ‘lemma’ field. if i recall correctly, we ‘disambiguate’
> lemmatization naïvely and take the first output from the set of
> matches of that rule. the resulting string is injected into a
> predicate template, e.g. something like "_~a_n_unknown_rel".
>
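
A rough sketch of the post-parsing normalization described in (b), under several assumptions: the tag-to-template table is hypothetical (only the noun template mirrors the one quoted above, written with Python's `{}` in place of Lisp's `~a`), and the lemmatizer is passed in as a function rather than being an orthographemic rule of the grammar.

```python
import re
from typing import Callable, List

# Shape of the unknown-word predicates seen in this thread,
# e.g. "_porcelain/NN_u_unknown_rel" or "_zanned/VBD_u_unknown".
UNKNOWN = re.compile(r"^_(?P<form>.+)/(?P<tag>[A-Z$]+)_u_unknown(?:_rel)?$")

# Hypothetical mapping from PTB tag patterns to predicate templates.
TEMPLATES = [
    (re.compile(r"^NNS?$"), "_{}_n_unknown_rel"),
    (re.compile(r"^VB[DGNPZ]?$"), "_{}_v_unknown_rel"),
    (re.compile(r"^JJ[RS]?$"), "_{}_a_unknown_rel"),
]


def normalize(pred: str, lemmatize: Callable[[str, str], List[str]]) -> str:
    """Rewrite an unknown-word predicate, taking the first lemma candidate."""
    match = UNKNOWN.match(pred)
    if match is None:
        return pred  # not an unknown-word predicate; leave untouched
    form, tag = match.group("form"), match.group("tag")
    for tag_re, template in TEMPLATES:
        if tag_re.match(tag):
            candidates = lemmatize(form, tag)
            # Naive disambiguation: first candidate, else the surface form.
            lemma = candidates[0] if candidates else form
            return template.format(lemma)
    return pred  # no template for this tag


identity = lambda form, tag: [form]
print(normalize("_porcelain/NN_u_unknown_rel", identity))  # _porcelain_n_unknown_rel
```
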
> i believe, at the time, i did not want to enable predicate
> normalization as part of the standard parsing set-up because of its
> heuristic (naïve disambiguation) nature. for an input of, say, ‘they
> zanned’, our current parsers have no knowledge beyond the surface form
> and its tag VBD; hence, we provide what we know as
> ‘_zanned/VBD_u_unknown’. the past tense orthographemic rule of the
> ERG will hypothesize three candidate stems (‘zanne’, ‘zann’,
> or ‘zan’). it would require more information than is in the grammar
> to do a better job of lemmatization than my current heuristic.
>
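
To make the ambiguity concrete, here is a hand-written approximation of how a past tense rule might hypothesize stems. It is not the ERG's orthographemic rule; the function `past_tense_stems` and its three heuristics (drop "d", drop "ed", undo consonant doubling) are assumptions for illustration.

```python
from typing import List

VOWELS = set("aeiou")


def past_tense_stems(form: str) -> List[str]:
    """Hypothesize candidate stems for a regular '-ed' past tense form."""
    if not form.endswith("ed") or len(form) < 4:
        return [form]  # nothing to strip under these heuristics
    stems = [form[:-1]]      # stem ending in 'e': only 'd' was added
    base = form[:-2]
    stems.append(base)       # plain stem: 'ed' was added
    if len(base) >= 2 and base[-1] == base[-2] and base[-1] not in VOWELS:
        stems.append(base[:-1])  # final consonant was doubled before 'ed'
    return stems


print(past_tense_stems("zanned"))  # ['zanne', 'zann', 'zan']
```

Nothing in the surface form favours one of these over the others, so taking the first is a guess rather than a lemmatization.
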
> —having refreshed my memory of the issues, i retract my suggestion to
> enable predicate normalization (in its current form) in MRS
> construction after parsing. i wish someone would work on providing a
> broader-coverage solution to this problem. but we have added an input
> fix-up transfer step to realization in the meantime, and that would
> seem like a good place for heuristic predicate normalization, for the
> time being. it would enable round-trip parsing and generation, yet
> preserve exact information in parser outputs for someone to put a
> better normalization module there.
>
> best wishes, oe
>
>
> On Sunday, February 7, 2016, Woodley Packard <sweaglesw at sweaglesw.org>
> wrote:
>
> Hello Alex,
>
> This is a corner of the generation game that is not yet
> implemented in ACE. It’s been on the ToDo list for years but
> nobody has bugged me about it so it has been sitting at low
> priority. As Stephan mentioned, the mechanism to make it work in
> the LKB is both somewhat fiddly and covered in a few cobwebs, so I
> had somewhat aloofly hoped that over the years someone would have
> straightened things out to where generation from unknown
> predicates had a canonical approach (e.g. implemented for multiple
> grammars or multiple platforms). I would be interested to hear
> whether Glenn Slayden (who is on this list) has implemented this
> in the Agree generator?
>
> I’m willing to put the hour or two it would take to make this
> work, but wonder if other DELPH-IN developers/grammarians have
> ideas about ways in which the current setup (as implemented in the
> ERG’s custom lisp code that patches into the LKB, if memory
> serves) could be improved upon in the process?
>
> Regards,
> -Woodley
>
>> On Feb 6, 2016, at 2:48 AM, Alexander Kuhnle <aok25 at cam.ac.uk> wrote:
>>
>> Dear all,
>> We came across the problem of generating from MRS involving
>> unknown words, for instance, in the sentence “I like porcelain.”
>> (parsing gives "_porcelain/NN_u_unknown_rel"). Is there an option
>> for ACE so that these cases can be handled?
>> Moreover, we came across the example “The phosphorus
>> self-combusts.” vs ?“The phosphorus is self-combusted.” Where the
>> first doesn’t parse, the second does, but doesn’t generate (again
>> presumably because of "_combusted/VBN_u_unknown_rel"). It seems
>> to not recognise verbs with a “self-” prefix, but does for past
>> participles.
>> Many thanks,
>> Alex
>