<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Good points. <br>
<br>
In this application, as of now, we only send tags within a single
"basic" part of speech (e.g., NN.*, VB.*, JJ.*, RB.*, DT, IN).
I'd like not to be limited to choosing between NN and VBG, for
example, though.<br>
<br>
I guess we could add a feature, within or beside +TNT, for each
of the PTB tags, + or - (or perhaps, better, with a probability).
How does that sound? (This will interact with the type hierarchy
that the FSC tokenizer uses in PET. Actually, maybe not: maybe a
rule per tag in pos.tdl?)<br>
<br>
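A rough picture of the per-tag feature idea, as a Python sketch (illustrative only). The tag inventory, the <code>tag_features</code> name, and the 0.1 threshold are all assumptions made for this example; none of them is defined by PET or the ERG.<br>
<pre>
```python
# Hypothetical sketch: one +/- feature per PTB tag, derived from tagger
# probabilities. The tag list and the threshold are assumptions.
PTB_TAGS = ("NN", "NNS", "NNP", "NNPS", "VB", "VBD", "VBG", "VBN",
            "VBP", "VBZ", "JJ", "RB", "DT", "IN")


def tag_features(tag_probs, threshold=0.1):
    """Map each PTB tag to "+" if its probability reaches the threshold, else "-"."""
    return {
        tag: "+" if tag_probs.get(tag, 0.0) >= threshold else "-"
        for tag in PTB_TAGS
    }


# A token the tagger considers mostly NN but possibly VBG.
features = tag_features({"NN": 0.7, "VBG": 0.3})
print(features["NN"], features["VBG"], features["JJ"])
```
</pre>
With these assumed values, a token tagged NN at 0.7 and VBG at 0.3 carries "+" for both tags, so neither reading has to be ruled out up front.<br>
<br>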
Any further thoughts on multi-token lexemes would be most
sincerely appreciated. (I'm assuming that they would be in a
different cell/context of the chart.)<br>
<br>
This is working satisfactorily (preliminarily).<br>
<br>
Thanks MUCH!<br>
Paul<br>
<br>
<br>
On 9/18/2013 11:28 AM, Bec Dridan wrote:<br>
</div>
<blockquote
cite="mid:CAKRPO=N6a0+i-pffDVXcmh4onKiAcK-xaxrT-aGWVaYhw6RLwg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>Hi Paul,<br>
<br>
</div>
<div>People more expert in the chart mapping rules and the
grammar might want to chime in, but broadly speaking, your
rule looks like it will work. There are a couple of ways you may
run into issues:<br>
<br>
</div>
<div> * if you input multiple tags for the same token, rules get
complicated<br>
</div>
<div> * you may get unexpected results when the ERG native token
is a multi-token entry (like "for example")<br>
</div>
<div> * as I said before, sometimes the mapping between PTB and
ERG types is not what you'd expect<br>
<br>
</div>
<div>But if you are limiting the places and tags where you try
to restrict, you should be able to come up with a workable
solution this way, I think.<br>
<br>
</div>
<div>Rebecca<br>
</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Wed, Sep 18, 2013 at 5:04 PM, Paul
Haley <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:paul@haleyai.com" target="_blank">paul@haleyai.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>Your comments have been quite helpful in getting me
headed in what appears to be the right direction...<br>
<br>
I now think the default-LE setting (whether none or only for
gaps) has little bearing as long as a tag is provided (at
least, that is what I am observing in the behavior).<br>
<br>
I have modified lfr.tdl as below and confirmed that I no
longer get the native verbal LE for "array" when it is tagged
with any of NN, NNS, NNPS, or NNP (it looks like I need to send
$ instead of S for two of those, though).<br>
<br>
What do you think? Do I need a bunch of the latter?<br>
<br>
Thanks again!<br>
Paul<br>
<br>
<br>
#|
<div class="im"><br>
generic_non_ne+native_lfr := lexical_filtering_rule
&amp;<br>
[ +CONTEXT &lt; [ SYNSEM.PHON.ONSET con_or_voc ] &gt;,<br>
+INPUT &lt; [ SYNSEM.PHON.ONSET unk_onset,
ORTH.CLASS non_ne ] &gt;,<br>
+OUTPUT &lt; &gt;,<br>
+POSITION "I1@C1" ].<br>
</div>
|#<br>
<br>
exclude_verbal_given_nominal_lfr :=
lexical_filtering_rule &amp;<br>
[ +CONTEXT &lt; [ +TNT.+TAGS &lt; ^N.*$ &gt; ] &gt;,<br>
+INPUT &lt; [ SYNSEM basic_verb_synsem ] &gt;,
<div class="im"><br>
+OUTPUT &lt; &gt;,<br>
+POSITION "I1@C1" ].<br>
<br>
<br>
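The intended effect of exclude_verbal_given_nominal_lfr can be pictured with the Python sketch below (illustrative only). It assumes the one-element +TAGS pattern matches only a token that carries a single tag, and it stands in for basic_verb_synsem with a plain "verb" category label; the function and the data layout are hypothetical and are not PET code.<br>
<pre>
```python
import re

# Same pattern as in the rule's +TAGS constraint.
NOMINAL_TAG = re.compile(r"^N.*$")


def exclude_verbal_given_nominal(entries, tags):
    """Drop verbal entries when the token has exactly one tag and it is nominal.

    entries: list of (name, category) pairs for one chart cell (assumed layout).
    tags: list of PTB tag strings supplied for the token.
    """
    if len(tags) == 1 and NOMINAL_TAG.match(tags[0]):
        return [(name, cat) for name, cat in entries if cat != "verb"]
    # No tag, several tags, or a non-nominal tag: leave the cell untouched.
    return list(entries)


chart = [("array_n1", "noun"), ("array_v1", "verb")]
print(exclude_verbal_given_nominal(chart, ["NN"]))
print(exclude_verbal_given_nominal(chart, ["NN", "VBG"]))
```
</pre>
Under these assumptions a single NN tag removes array_v1, while a token with two tags passes through unfiltered.<br>
<br>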
</div>
<div>
<div class="h5"> On 9/18/2013 10:50 AM, Bec Dridan
wrote:<br>
</div>
</div>
</div>
<div>
<div class="h5">
<blockquote type="cite">
<div dir="ltr">
<div>
<div>Hi Paul,<br>
</div>
<div><br>
DEFAULT_LES controls when we use the default
generics rather than, or possibly alongside,
the native entry.<br>
The options mean, as far as I understand them:<br>
<br>
NO_DEFAULT_LES: if there is no native entry,
do nothing, ignore tags, parse will fail.<br>
DEFAULT_LES_ALL: always create a generic entry
from any input POS tags (although these can be
filtered out later)<br>
DEFAULT_LES_POSGAPS_LEXGAPS: create a generic
entry from any input POS tags only where there
was no native entry available<br>
<br>
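The three settings, as described above, can be sketched roughly as follows (Python, illustrative only). The function name, the dictionary lexicon, and the <code>generic_</code> naming are assumptions made for this example; PET's actual behaviour may differ in detail.<br>
<pre>
```python
def lexical_candidates(token, tags, native_lexicon, mode):
    """Return candidate entries for one token under a DEFAULT_LES-style mode.

    native_lexicon: dict from surface form to a list of native entry names.
    tags: POS tags supplied with the token.
    """
    native = list(native_lexicon.get(token, []))
    generics = ["generic_" + tag for tag in tags]  # hypothetical naming
    if mode == "NO_DEFAULT_LES":
        # Tags are ignored; an unknown token yields nothing.
        return native
    if mode == "DEFAULT_LES_ALL":
        # Generics are always created (they may be filtered out later).
        return native + generics
    if mode == "DEFAULT_LES_POSGAPS_LEXGAPS":
        # Generics are created only where no native entry exists.
        return native if native else generics
    raise ValueError("unknown mode: " + mode)


lexicon = {"array": ["array_n1", "array_v1"]}
for mode in ("NO_DEFAULT_LES", "DEFAULT_LES_ALL", "DEFAULT_LES_POSGAPS_LEXGAPS"):
    print(mode, lexical_candidates("array", ["NN"], lexicon, mode))
```
</pre>
In every mode the native entries for "array" come back unchanged, including array_v1, which is the point made next: the settings only govern generics.<br>
<br>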
</div>
<div>None of them have anything to do with
restricting native entries.<br>
<br>
</div>
Restricting lexical entries the way you want is
generally called supertagging, although the term
"supertag" also refers to the fact that the tags
generally used in this manner are more
fine-grained than standard POS tags.
Unfortunately, that's not in the mainstream PET
release so far, because it is not that
straightforward. There are several development
implementations around that might do what you
want, but they would all need to be configured
to your particular setup. For one thing, the
mapping from PTB tags isn't always clear-cut:
the ERG lexical entries don't always align
exactly with the PTB distinctions, and so most
(all?) work has been based on restricting by
tags related to the lexical entries. As far as
I know, there are no current implementations that
can restrict by PTB POS tags, although others
might know?<br>
<br>
</div>
Rebecca<br>
<div><br>
<br>
<br>
<br>
<br>
</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Wed, Sep 18, 2013 at
4:12 PM, Paul Haley <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:paul@haleyai.com"
target="_blank">paul@haleyai.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div>I should correct my prior message... <br>
<br>
It is not that the native LEs are taking
precedence, but that native LEs that are
not consistent with the input PoS are
still being added to the chart. <br>
<br>
For example, if I pass in "array" with
"NN", I'm still getting array_v1 in the
chart. I want array_n1 in the chart. So,
what I'm after is pruning the native LEs
to those that are consistent with the
input PoS (or living with the generics in
the case of no natives).<br>
<br>
Does that sound like what you called
super-tagging?<span><font color="#888888"><br>
<br>
Paul</font></span>
<div>
<div><br>
<br>
On 9/18/2013 10:04 AM, Paul Haley
wrote:<br>
</div>
</div>
</div>
<div>
<div>
<blockquote type="cite">
<div>I had that fear, too! Which is
why I asked.<br>
<br>
I gave it a try with no default
LEs. To my surprise, the native
lexical entries are still taking
precedence! (So I must be missing
something.)<br>
<br>
On 9/18/2013 9:42 AM, Bec Dridan
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>Hi Paul,<br>
<br>
</div>
The POS input to PET is only
designed for unknown word
handling (i.e., when there are no
corresponding ERG LEs, as you
noticed). It sounds like what
you are after is more like
supertagging, restricting the
lexical types used according
to some tags on the input?
I've played around a bit with
different methods to do that,
but none of them are currently
in the main branch of PET. <br>
<br>
</div>
What you propose with the
filtering rule will, I think,
force the grammar to use generic
types everywhere, rather than
use what's in the lexicon. I
very much doubt that is what you
want to do?<br>
<br>
</div>
Rebecca<br>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Wed,
Sep 18, 2013 at 3:26 PM, Paul
Haley <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:paul@haleyai.com"
target="_blank">paul@haleyai.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000"
bgcolor="#FFFFFF">
<div>Hello,<br>
<br>
I may be making some
conceptual progress on
this...<br>
<br>
I went back to the chart
mapping tutorial (<a
moz-do-not-send="true"
href="http://moin.delph-in.net/Chart_Mapping"
target="_blank">http://moin.delph-in.net/Chart_Mapping</a>)
and found myself looking
at the following lexical
filtering rule from the
ERG's lfr.tdl:<br>
<blockquote> ;; throw out
generic whenever a
native entry is
available, unless the
token is<br>
;; a named entity (which
now includes names
activated because of
mixed case or<br>
;; non-sentence-initial
capitalization).<br>
;;<br>
generic_non_ne+native_lfr
:=
lexical_filtering_rule
&amp;<br>
[ +CONTEXT &lt; [
SYNSEM.PHON.ONSET
con_or_voc ] &gt;,<br>
+INPUT &lt; [
SYNSEM.PHON.ONSET
unk_onset, ORTH.CLASS
non_ne ] &gt;,<br>
+OUTPUT &lt; &gt;,<br>
+POSITION "I1@C1" ].<br>
<br>
</blockquote>
Is it the case that I want
the +CONTEXT and +INPUT to
be exactly reversed with
NO_DEFAULT_LES or
DEFAULT_LES_POSGAPS_LEXGAPS?<br>
<br>
Thank you,<br>
Paul
<div>
<div><br>
<br>
On 9/17/2013 4:54 PM,
Paul Haley wrote:<br>
</div>
</div>
</div>
<div>
<div>
<blockquote type="cite">Hi,
<br>
<br>
It seems that when I
send FSC w/ TNT tags
for some but not all
tokens I get ERG LEs
that do not satisfy
the provided tags when
using any of
NO_DEFAULT_LES,
DEFAULT_LES_ALL, or
DEFAULT_LES_POSGAPS_LEXGAPS.
It does respect these
tags when there are no
corresponding ERG LEs,
however, which is
good. <br>
<br>
Is there a way that I
can get PET w/ the ERG
to respect the TNT
tags when provided but
otherwise use the ERG
LEs? <br>
<br>
Thank you, <br>
Paul <br>
<br>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>