[pet] passing in some but not all tags

Paul Haley paul at haleyai.com
Wed Sep 18 17:41:18 CEST 2013


Good points.

In this application, as of now, we only send tags within a single 
"basic" part of speech (e.g., NN.*, VB.*, JJ.*, RB.*, DT, IN). I'd like 
not to be limited to choosing between NN and VBG, for example, though.

I guess we could add a feature, within or besides +TNT, for each of the 
PTB tags, + or - (or perhaps, better, with a probability). How does 
that sound?  (This will interact with the type hierarchy that the FSC 
tokenizer uses in PET.  Actually, maybe not: maybe a rule per tag in pos.tdl?)
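
To make the probability idea concrete, here is a rough sketch in Python rather than TDL. It is only a toy model of the pruning I have in mind, not anything PET or the ERG provides: the function names, the category labels, and the 0.1 threshold are all made up for illustration. The tag patterns follow the groupings mentioned above, and array_n1/array_v1 stand in for native LEs.

```python
import re

# Hypothetical mapping from coarse categories to PTB tag patterns,
# following the groupings mentioned above (NN.*, VB.*, JJ.*, RB.*, DT, IN).
COARSE_PATTERNS = {
    "noun": re.compile(r"^NN.*$"),
    "verb": re.compile(r"^VB.*$"),
    "adjective": re.compile(r"^JJ.*$"),
    "adverb": re.compile(r"^RB.*$"),
    "determiner": re.compile(r"^DT$"),
    "preposition": re.compile(r"^IN$"),
}


def allowed_categories(tag_probs, threshold=0.1):
    """Return the coarse categories whose summed tag probability
    reaches the (assumed) threshold."""
    totals = {}
    for tag, prob in tag_probs.items():
        for category, pattern in COARSE_PATTERNS.items():
            if pattern.match(tag):
                totals[category] = totals.get(category, 0.0) + prob
    return {c for c, p in totals.items() if p >= threshold}


def prune(entries, tag_probs, threshold=0.1):
    """Keep entries whose category is allowed by the tags.
    If no tags were sent for the token, keep every entry."""
    if not tag_probs:
        return list(entries)
    allowed = allowed_categories(tag_probs, threshold)
    return [(name, cat) for name, cat in entries if cat in allowed]


# Toy stand-ins for the native lexical entries of "array".
entries = [("array_n1", "noun"), ("array_v1", "verb")]

print(prune(entries, {"NN": 0.95, "VB": 0.05}))
print(prune(entries, {"NN": 0.7, "VB": 0.3}))
print(prune(entries, {}))
```

With NN at 0.95 and VB at 0.05 only array_n1 survives, with NN at 0.7 and VB at 0.3 both entries are kept, and a token sent without tags keeps all of its entries.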

Any further thoughts on multi-token lexemes would be most sincerely 
appreciated.  (I'm assuming that they would be in a different 
cell/context of the chart.)

This is working satisfactorily (preliminarily).

Thanks MUCH!
Paul


On 9/18/2013 11:28 AM, Bec Dridan wrote:
> Hi Paul,
>
> People more expert in the chart mapping rules and the grammar might 
> want to chime in, but broadly speaking, your rule looks like it will 
> work. There are a couple of ways you may run into issues:
>
>  * if you input multiple tags for the same token, rules get complicated
>  * you may get unexpected results when the ERG native token is a 
> multi-token entry (like "for example")
>  * as I said before, sometimes the mapping between PTB and ERG types 
> is not what you'd expect
>
> But if you are limiting the places and tags where you try and 
> restrict, you should be able to come up with a workable solution this 
> way, I think.
>
> Rebecca
>
>
>
> On Wed, Sep 18, 2013 at 5:04 PM, Paul Haley <paul at haleyai.com 
> <mailto:paul at haleyai.com>> wrote:
>
>     Your comments have been quite helpful in getting me headed in what
>     appears to be the right direction...
>
>     I now think the default-LEs setting (whether none or only for gaps)
>     has little bearing as long as a tag is provided (at least that is
>     what I am observing in the behavior).
>
>     I have modified lfr.tdl as below and confirmed that I no longer
>     get the native verbal LE for "array" when any of NN, NNS, NNPS,
>     or NNP is provided (it looks like I need to send $ instead of S
>     for two of those, though).
>
>     What do you think?  Do I need a bunch of the latter?
>
>     Thanks again!
>     Paul
>
>
>     #|
>
>     generic_non_ne+native_lfr := lexical_filtering_rule &
>     [ +CONTEXT < [ SYNSEM.PHON.ONSET con_or_voc ] >,
>       +INPUT < [ SYNSEM.PHON.ONSET unk_onset, ORTH.CLASS non_ne ] >,
>       +OUTPUT < >,
>       +POSITION "I1@C1" ].
>     |#
>
>     exclude_verbal_given_nominal_lfr := lexical_filtering_rule &
>     [ +CONTEXT < [ +TNT.+TAGS < ^N.*$ > ] >,
>       +INPUT < [ SYNSEM basic_verb_synsem ] >,
>       +OUTPUT < >,
>       +POSITION "I1@C1" ].
>
>
>     On 9/18/2013 10:50 AM, Bec Dridan wrote:
>>     Hi Paul,
>>
>>     DEFAULT_LES controls when we use the default generics rather
>>     than, or possibly alongside, the native entry.
>>     The options mean, as far as I understand them:
>>
>>     NO_DEFAULT_LES: if there is no native entry, do nothing, ignore
>>     tags, parse will fail.
>>     DEFAULT_LES_ALL: always create a generic entry from any input POS
>>     tags (although these can be filtered out later)
>>     DEFAULT_LES_POSGAPS_LEXGAPS: create a generic entry from any
>>     input POS tags only where there was no native entry available
>>
>>     None of them have anything to do with restricting native entries.
>>
>>     Restricting lexical entries the way you want is generally called
>>     supertagging, although the term "supertag" also refers to the
>>     fact that the tags generally used in this manner are more
>>     fine-grained than standard POS tags. Unfortunately, that's not in
>>     the mainstream PET release so far, because it is not that
>>     straightforward. There are several development implementations
>>     around that might do what you want, but they would all need to be
>>     configured to your particular set up. For one thing, the mapping
>>     from PTB tags isn't always clear-cut - the ERG lexical entries
>>     don't always align exactly with the PTB distinctions and so most
>>     (all?) work has been based on restricting by tags related to the
>>     lexical entries.  As far as I know, there are no current
>>     implementations that can restrict by PTB POS tags, although
>>     others might know?
>>
>>     Rebecca
>>
>>
>>     On Wed, Sep 18, 2013 at 4:12 PM, Paul Haley <paul at haleyai.com
>>     <mailto:paul at haleyai.com>> wrote:
>>
>>         I should correct my prior...
>>
>>         It is not that the native LEs are taking precedence, but that
>>         native LEs that are not consistent with the input PoS are
>>         still being added to the chart.
>>
>>         For example, if I pass in "array" with "NN", I'm still
>>         getting array_v1 in the chart.  I want array_n1 in the
>>         chart.  So, what I'm after is pruning the native LEs to those
>>         that are consistent with the input PoS (or living with the
>>         generics in the case of no natives).
>>
>>         Does that sound like what you called super-tagging?
>>
>>         Paul
>>
>>
>>         On 9/18/2013 10:04 AM, Paul Haley wrote:
>>>         I had that fear, too!  Which is why I asked.
>>>
>>>         I gave it a try with no default LEs.  To my surprise, the
>>>         native lexical entries are still taking precedence!  (So I
>>>         must be missing something.)
>>>
>>>         On 9/18/2013 9:42 AM, Bec Dridan wrote:
>>>>         Hi Paul,
>>>>
>>>>         The POS input to PET is only designed for unknown word
>>>>         handling (i.e., when there are no corresponding ERG LEs, as
>>>>         you noticed).  It sounds like what you are after is more
>>>>         like supertagging, restricting the lexical types used
>>>>         according to some tags on the input? I've played around a
>>>>         bit with different methods to do that, but none of them are
>>>>         currently in the main branch of PET.
>>>>
>>>>         What you propose with the filtering rule will, I think,
>>>>         force the grammar to use generic types everywhere, rather
>>>>         than use what's in the lexicon. I very much doubt that is
>>>>         what you want to do?
>>>>
>>>>         Rebecca
>>>>
>>>>
>>>>         On Wed, Sep 18, 2013 at 3:26 PM, Paul Haley
>>>>         <paul at haleyai.com <mailto:paul at haleyai.com>> wrote:
>>>>
>>>>             Hello,
>>>>
>>>>             I may be making some conceptual progress on this...
>>>>
>>>>             I went back to the chart mapping tutorial
>>>>             (http://moin.delph-in.net/Chart_Mapping) and found
>>>>             myself looking at the following lexical filtering rule
>>>>             from the ERG's lfr.tdl:
>>>>
>>>>                 ;; throw out generic whenever a native entry is
>>>>                 ;; available, unless the token is a named entity
>>>>                 ;; (which now includes names activated because of
>>>>                 ;; mixed case or non-sentence-initial capitalization).
>>>>                 ;;
>>>>                 generic_non_ne+native_lfr := lexical_filtering_rule &
>>>>                 [ +CONTEXT < [ SYNSEM.PHON.ONSET con_or_voc ] >,
>>>>                   +INPUT < [ SYNSEM.PHON.ONSET unk_onset,
>>>>                              ORTH.CLASS non_ne ] >,
>>>>                   +OUTPUT < >,
>>>>                   +POSITION "I1@C1" ].
>>>>
>>>>             Is it the case that I want the +CONTEXT and +INPUT to
>>>>             be exactly reversed with NO_DEFAULT_LES or
>>>>             DEFAULT_LES_POSGAPS_LEXGAPS?
>>>>
>>>>             Thank you,
>>>>             Paul
>>>>
>>>>
>>>>             On 9/17/2013 4:54 PM, Paul Haley wrote:
>>>>>             Hi,
>>>>>
>>>>>             It seems that when I send FSC w/ TNT tags for some but
>>>>>             not all tokens I get ERG LEs that do not satisfy the
>>>>>             provided tags when using any of NO_DEFAULT_LES,
>>>>>             DEFAULT_LES_ALL, or DEFAULT_LES_POSGAPS_LEXGAPS. It
>>>>>             does respect these tags when there are no
>>>>>             corresponding ERG LEs, however, which is good.
>>>>>
>>>>>             Is there a way that I can get PET w/ the ERG to
>>>>>             respect the TNT tags when provided but otherwise use
>>>>>             the ERG LEs?
>>>>>
>>>>>             Thank you,
>>>>>             Paul
>>>>>
>>>>
>>>>
>>>
>>
>>
>
>
