Would the morpha and morphg tools at http://users.sussex.ac.uk/~johnca/morph.html be appropriate for predicate normalisation for parsing and generation? They are inverses of each other, for example:

$ echo "zanned_VBD" | ./morpha.ix86_darwin -actf verbstem.list
zann+ed_VBD
$ echo zann+ed_VBD | ./morphg.ix86_darwin -ctf verbstem.list
zanned_VBD

John

On 7 Feb 2016, at 10:38, Ann Copestake wrote:

For realization at least, isn't it adequate to use a lemma list extracted from (say) WordNet to support predicate normalisation?
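
(Purely as an illustration of the lemma-list idea, and assuming NLTK's WordNet interface rather than anything in the DELPH-IN tool chain, a lookup might be sketched as follows; all names and word choices here are illustrative only:)

# minimal sketch of lemma lookup against WordNet via NLTK (an assumption,
# not part of any DELPH-IN component); requires nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

print(lemmatizer.lemmatize('combusted', pos='v'))  # -> 'combust' (known verb)
print(lemmatizer.lemmatize('zanned', pos='v'))     # -> 'zanned' (unknown form comes back unchanged)

Such a list covers known vocabulary, but it gives no help for genuinely unknown words, which is the gap discussed below.
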
But the application that Alex is interested in is a form of regeneration. So I think that, as long as the generator accepts what the parser outputs for unknown words, it really doesn't matter whether or not it's normalised. I don't know whether anyone is using the realiser for applications which are broad-coverage (hence need unknown words) and where the *MRS is constructed from scratch (hence need to use lemmas for the predicates). Excluding MT, of course.

All best,

Ann
<div class="moz-cite-prefix">On 07/02/2016 10:15, Stephan Oepen
wrote:<br>
</div>
<blockquote cite="mid:CA+_Fm6JCwB9Rb9W7yaA97ccU=gJzJFGEjqvfEP1Nv=H05vaUKw@mail.gmail.com" type="cite">there actually are two separate mechanism to discuss:
(a) lexical instantiation for unknown predicates (in
realization) and (b) predicate normalization for unknown words (in
parsing).
<div><br>
</div>
as for (a), i find the current LKB mechanism about as generic as i can imagine (and consider appropriate). the grammar provides an inventory of generic lexical entries for realization (these are in part distinct from the parsing ones, in the ERG, because the strategies for dealing with inflection are different). for each such entry, the grammar declares which MRS predicate activates it and how to determine its orthography. the former is accomplished via a regular expression, e.g. something like /^named$/ or /^_([^_]+)/. the latter either comes from the (unique) parameter of the relation with the unknown predicate (CARG in the ERG) or from the part of the predicate matched by the above capture group (the lemma field). there is no provision for generic lexical entries with decomposed semantics (in realization).
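
(as a toy sketch only, the activation-and-orthography logic might look as below; the two regular expressions are the ones just mentioned, while the entry inventory, the function, and the example predicates are illustrative assumptions, not code taken from the LKB or the ERG:)

import re

# toy rendering of the mechanism described above: each generic entry for
# realization pairs a predicate pattern with a recipe for its orthography
GENERIC_ENTRIES = [
    (re.compile(r'^named$'), 'carg'),     # proper names: spelling comes from CARG
    (re.compile(r'^_([^_]+)'), 'match'),  # otherwise: capture group 1, the lemma field
]

def orthography(predicate, carg=None):
    """return the surface form a generic entry would contribute, or None."""
    for pattern, source in GENERIC_ENTRIES:
        m = pattern.search(predicate)
        if m:
            return carg if source == 'carg' else m.group(1)
    return None

print(orthography('named', carg='Kim'))  # -> 'Kim'
print(orthography('_dog_n_1'))           # -> 'dog'
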
regarding (b), the ERG in parsing outputs predicates like the ones alex had noticed. these are not fully normalized because there is no reliable lemmatization facility for unknown words inside the parser (and, thus, generic entries for parsing predominantly are full forms). what is recorded in the ‘lemma’ field is the actual surface form, concatenated with the PoS that activated the generic entry. the ERG provides a mechanism for post-parsing normalization, again in mostly declarative and general form: triggered by regular expressions looking for PTB PoS tags in predicate names, an orthographemic rule of the grammar can (optionally) be invoked on the remainder of the ‘lemma’ field. if i recall correctly, we ‘disambiguate’ lemmatization naïvely and take the first output from the set of matches of that rule. the resulting string is injected into a predicate template, e.g. something like "_~a_n_unknown_rel".
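
(again a purely illustrative sketch rather than code from any grammar or tool, showing the shape of that step for the simple case where the surface form can serve as the lemma unchanged; the tag-to-class mapping and all names are assumptions:)

import re

# toy sketch: a PTB tag inside the predicate name triggers normalization, and
# the (possibly lemmatized) form is spliced into a template in the spirit of
# "_~a_n_unknown_rel"
UNKNOWN = re.compile(r'^_(?P<form>.+)/(?P<tag>[A-Z]+)_u_unknown')

def to_template(form, tag):
    pos = 'v' if tag.startswith('VB') else 'n' if tag.startswith('NN') else 'u'
    return '_{}_{}_unknown_rel'.format(form, pos)

m = UNKNOWN.match('_porcelain/NN_u_unknown')
print(to_template(m.group('form'), m.group('tag')))  # -> '_porcelain_n_unknown_rel'
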
i believe, at the time, i did not want to enable predicate normalization as part of the standard parsing set-up because of its heuristic (naïve disambiguation) nature. for an input of, say, ‘they zanned’, our current parsers have no knowledge beyond the surface form and its tag VBD; hence, we provide what we know as ‘_zanned/VBD_u_unknown’. the past tense orthographemic rule of the ERG will hypothesize three candidate stems (‘zanne’, ‘zann’, or ‘zan’). it would require more information than is in the grammar to do a better job of lemmatization than my current heuristic.
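
(the three candidates can be reproduced by crudely reversing an ‘-ed’ spelling rule; the following approximation is illustrative only and not the ERG’s actual orthographemic machinery:)

def past_tense_stems(form):
    """crude reversal of an '-ed' spelling rule: plain suffixation,
    e-restoration, and consonant undoubling; illustrative only."""
    stems = []
    if form.endswith('ed'):
        base = form[:-2]
        stems.append(base + 'e')                    # like + ed -> liked
        stems.append(base)                          # walk + ed -> walked
        if len(base) >= 2 and base[-1] == base[-2]:
            stems.append(base[:-1])                 # zap + ed -> zapped
    return stems

print(past_tense_stems('zanned'))  # -> ['zanne', 'zann', 'zan']

the naïve step then simply keeps whichever candidate the rule happens to yield first.
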
—having refreshed my memory of the issues, i retract my suggestion to enable predicate normalization (in its current form) in MRS construction after parsing. i wish someone would work on providing a broader-coverage solution to this problem. but we have added an input fix-up transfer step to realization in the meantime, and that would seem like a good place for heuristic predicate normalization, for the time being. it would enable round-trip parsing and generation, yet preserve exact information in parser outputs for someone to put a better normalization module there.

best wishes, oe

On Sunday, February 7, 2016, Woodley Packard <sweaglesw@sweaglesw.org> wrote:
<div style="word-wrap:break-word">Hello Alex,
<div><br>
</div>
<div>This is a corner of the generation game that is not yet
implemented in ACE. It’s been on the ToDo list for years
but nobody has bugged me about it so it has been sitting
at low priority. As Stephan mentioned, the mechanism to
make it work in the LKB is both somewhat fiddly and
covered in a few cobwebs, so I had somewhat aloofly hoped
that over the years someone would have straightened things
out to where generation from unknown predicates had a
canonical approach (e.g. implemented for multiple grammars
or multiple platforms). I would be interested to hear
whether Glenn Slayden (who is on this list) has
implemented this in the Agree generator?</div>
<div><br>
</div>
<div>I’m willing to put the hour or two it would take to
make this work, but wonder if other DELPH-IN
developers/grammarians have ideas about ways in which the
current setup (as implemented in the ERG’s custom lisp
code that patches into the LKB, if memory serves) could be
improved upon in the process?</div>
<div><br>
</div>
<div>Regards,</div>
<div>-Woodley</div>

On Feb 6, 2016, at 2:48 AM, Alexander Kuhnle <aok25@cam.ac.uk> wrote:
<div style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<div style="margin:0cm 0cm
0.0001pt;font-size:11pt;font-family:Calibri,sans-serif">Dear
all,</div>
<div style="margin:0cm 0cm
0.0001pt;font-size:11pt;font-family:Calibri,sans-serif"> </div>
<div style="margin:0cm 0cm
0.0001pt;font-size:11pt;font-family:Calibri,sans-serif">We
came across the problem of generating from MRS
involving unknown words, for instance, in the
sentence “I like porcelain.” (parsing gives
"_porcelain/NN_u_unknown_rel"). Is there an
option for ACE so that these cases can be
handled?</div>
<div style="margin:0cm 0cm
0.0001pt;font-size:11pt;font-family:Calibri,sans-serif">Moreover,
we came across the example “The phosphorus
self-combusts.” vs ?“The phosphorus is
self-combusted.” Where the first doesn’t parse,
the second does, but doesn’t generate (again
presumably because of
"_combusted/VBN_u_unknown_rel"). It seems to not
recognise verbs with a “self-“ prefix, but does
for past participles.</div>
<div style="margin:0cm 0cm
0.0001pt;font-size:11pt;font-family:Calibri,sans-serif"> </div>
<div style="margin:0cm 0cm
0.0001pt;font-size:11pt;font-family:Calibri,sans-serif">Many
thanks,</div>
<div style="margin:0cm 0cm
0.0001pt;font-size:11pt;font-family:Calibri,sans-serif">Alex</div>