[pet] passing in some but not all tags
Paul Haley
paul at haleyai.com
Wed Sep 18 17:41:18 CEST 2013
Good points.
In this application, as of now, we only send tags within a single
"basic" part of speech (e.g., NN.*, VB.*, JJ.*, RB.*, DT, IN). I'd like
not to be limited to choosing between NN and VBG, for example, though.
I guess we could add a feature, within or beside +TNT, for each of the
PTB tags, + or - (or perhaps, better, with a probability). How does
that sound? (This will interact with the type hierarchy that the FSC
tokenizer uses in PET. Actually, maybe not: maybe a rule per tag in pos.tdl?)
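
To make the per-tag idea concrete, here is a rough sketch in Python (not
TDL, and not something PET does today). The function name
licensed_classes and the 0.2 threshold are made up for illustration; the
only things taken from this thread are the PTB tag patterns listed above.

```python
import re

# Coarse "basic" parts of speech and the PTB tag patterns listed above.
BASIC_POS = {
    "noun": r"^NN.*$",
    "verb": r"^VB.*$",
    "adjective": r"^JJ.*$",
    "adverb": r"^RB.*$",
    "determiner": r"^DT$",
    "preposition": r"^IN$",
}


def licensed_classes(tag_probs, threshold=0.2):
    """Return the basic POS classes whose summed tag probability
    reaches the (hypothetical) threshold."""
    totals = {}
    for tag, prob in tag_probs.items():
        for pos, pattern in BASIC_POS.items():
            if re.match(pattern, tag):
                totals[pos] = totals.get(pos, 0.0) + prob
    return {pos for pos, total in totals.items() if total >= threshold}


# A token that is probably a noun but plausibly a gerund keeps both classes.
both = licensed_classes({"NN": 0.7, "VBG": 0.3})
# With a confident tagger, only the nominal class survives.
noun_only = licensed_classes({"NN": 0.95, "VBG": 0.05})
print(sorted(both), sorted(noun_only))
```

With probabilities instead of a hard + or -, a token would not have to
commit to NN versus VBG up front; the cut-off decides how much ambiguity
reaches the chart.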
Any further thoughts on multi-token lexemes would be most sincerely
appreciated. (I'm assuming that they would be in a different
cell/context of the chart.)
This is working satisfactorily (preliminarily).
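
For anyone following along, the effect I am seeing can be modelled
roughly as below. This is a Python sketch of the behaviour only, not how
PET implements lexical filtering; LexicalEntry, is_verbal and
filter_entries are hypothetical names, and the entries are the "array"
example from earlier in the thread.

```python
import re
from dataclasses import dataclass

# Same pattern as in the rule quoted below: any tag starting with N.
NOMINAL_TAG = re.compile(r"^N.*$")


@dataclass(frozen=True)
class LexicalEntry:
    name: str
    is_verbal: bool  # stands in for "SYNSEM is a basic_verb_synsem"


def filter_entries(entries, tags):
    """Drop verbal entries when any tag on the token is nominal;
    leave the entries untouched when no nominal tag was sent."""
    if any(NOMINAL_TAG.match(tag) for tag in tags):
        return [entry for entry in entries if not entry.is_verbal]
    return list(entries)


entries = [LexicalEntry("array_n1", False), LexicalEntry("array_v1", True)]
tagged = [entry.name for entry in filter_entries(entries, ["NN"])]
untagged = [entry.name for entry in filter_entries(entries, [])]
print(tagged, untagged)
```

Tokens sent without a tag keep all of their native entries, which is the
"respect the tags when provided, otherwise use the ERG LEs" behaviour I
was after.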
Thanks MUCH!
Paul
On 9/18/2013 11:28 AM, Bec Dridan wrote:
> Hi Paul,
>
> People more expert in the chart mapping rules and the grammar might
> want to chime in, but broadly speaking, your rule looks like it will
> work. There are a couple of ways you may run into issues:
>
> * if you input multiple tags for the same token, rules get complicated
> * you may get unexpected results when the ERG native token is a
> multi-token entry (like "for example")
> * as I said before, sometimes the mapping between PTB and ERG types
> is not what you'd expect
>
> But if you are limiting the places and tags where you try and
> restrict, you should be able to come up with a workable solution this
> way, I think.
>
> Rebecca
>
>
>
> On Wed, Sep 18, 2013 at 5:04 PM, Paul Haley <paul at haleyai.com> wrote:
>
> Your comments have been quite helpful in getting me headed in what
> appears to be the right direction...
>
> I now think default LEs (whether none or only for gaps) have little
> bearing as long as a tag is provided (at least that is what I
> am observing in the behavior).
>
> I have modified lfr.tdl as below and confirmed that I no longer
> get the native verbal LE for "array" when any of NN, NNS,
> NNPS, or NNP is provided (it looks like I need to send $ instead
> of S for two of those, though).
>
> What do you think? Do I need a bunch of the latter?
>
> Thanks again!
> Paul
>
>
> #|
>
> generic_non_ne+native_lfr := lexical_filtering_rule &
> [ +CONTEXT < [ SYNSEM.PHON.ONSET con_or_voc ] >,
> +INPUT < [ SYNSEM.PHON.ONSET unk_onset, ORTH.CLASS non_ne ] >,
> +OUTPUT < >,
> +POSITION "I1@C1" ].
> |#
>
> exclude_verbal_given_nominal_lfr := lexical_filtering_rule &
> [ +CONTEXT < [ +TNT.+TAGS < ^N.*$ > ] >,
> +INPUT < [ SYNSEM basic_verb_synsem ] >,
> +OUTPUT < >,
> +POSITION "I1@C1" ].
>
>
> On 9/18/2013 10:50 AM, Bec Dridan wrote:
>> Hi Paul,
>>
>> DEFAULT_LES controls when we use the default generics rather
>> than, or possibly alongside, the native entry.
>> The options mean, as far as I understand them:
>>
>> NO_DEFAULT_LES: if there is no native entry, do nothing, ignore
>> tags, parse will fail.
>> DEFAULT_LES_ALL: always create a generic entry from any input POS
>> tags (although these can be filtered out later)
>> DEFAULT_LES_POSGAPS_LEXGAPS: create a generic entry from any
>> input POS tags only where there was no native entry available
>>
>> None of them have anything to do with restricting native entries.
>>
>> Restricting lexical entries the way you want is generally called
>> supertagging, although the term "supertag" also refers to the
>> fact that the tags generally used in this manner are more
>> fine-grained than standard POS tags. Unfortunately, that's not in
>> the mainstream PET release so far, because it is not that
>> straightforward. There are several development implementations
>> around that might do what you want, but they would all need to be
>> configured to your particular setup. For one thing, the mapping
>> from PTB tags isn't always clear-cut - the ERG lexical entries
>> don't always align exactly with the PTB distinctions and so most
>> (all?) work has been based on restricting by tags related to the
>> lexical entries. As far as I know, there are no current
>> implementations that can restrict by PTB POS tags, although
>> others might know?
>>
>> Rebecca
>>
>> On Wed, Sep 18, 2013 at 4:12 PM, Paul Haley <paul at haleyai.com> wrote:
>>
>> I should correct my prior message...
>>
>> It is not that the native LEs are taking precedence, but that
>> native LEs that are not consistent with the input PoS are
>> still being added to the chart.
>>
>> For example, if I pass in "array" with "NN", I'm still
>> getting array_v1 in the chart. I want array_n1 in the
>> chart. So, what I'm after is pruning the native LEs to those
>> that are consistent with the input PoS (or living with the
>> generics in the case of no natives).
>>
>> Does that sound like what you called super-tagging?
>>
>> Paul
>>
>>
>> On 9/18/2013 10:04 AM, Paul Haley wrote:
>>> I had that fear, too! Which is why I asked.
>>>
>>> I gave it a try with no default LEs. To my surprise, the
>>> native lexical entries are still taking precedence! (So I
>>> must be missing something.)
>>>
>>> On 9/18/2013 9:42 AM, Bec Dridan wrote:
>>>> Hi Paul,
>>>>
>>>> The POS input to PET is only designed for unknown word
>>>> handling (i.e., when there are no corresponding ERG LEs, as
>>>> you noticed). It sounds like what you are after is more
>>>> like supertagging, restricting the lexical types used
>>>> according to some tags on the input? I've played around a
>>>> bit with different methods to do that, but none of them are
>>>> currently in the main branch of PET.
>>>>
>>>> What you propose with the filtering rule will, I think,
>>>> force the grammar to use generic types everywhere, rather
>>>> than use what's in the lexicon. I very much doubt that is
>>>> what you want to do?
>>>>
>>>> Rebecca
>>>>
>>>>
>>>> On Wed, Sep 18, 2013 at 3:26 PM, Paul Haley <paul at haleyai.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I may be making some conceptual progress on this...
>>>>
>>>> I went back to the chart mapping tutorial
>>>> (http://moin.delph-in.net/Chart_Mapping) and found
>>>> myself looking at the following lexical filtering rule
>>>> from the ERG's lfr.tdl:
>>>>
>>>> ;; throw out generic whenever a native entry is
>>>> available, unless the token is
>>>> ;; a named entity (which now includes names
>>>> activated because of mixed case or
>>>> ;; non-sentence-initial capitalization).
>>>> ;;
>>>> generic_non_ne+native_lfr := lexical_filtering_rule &
>>>> [ +CONTEXT < [ SYNSEM.PHON.ONSET con_or_voc ] >,
>>>> +INPUT < [ SYNSEM.PHON.ONSET unk_onset,
>>>> ORTH.CLASS non_ne ] >,
>>>> +OUTPUT < >,
>>>> +POSITION "I1@C1" ].
>>>>
>>>> Is it the case that I want the +CONTEXT and +INPUT to
>>>> be exactly reversed with NO_DEFAULT_LES or
>>>> DEFAULT_LES_POSGAPS_LEXGAPS?
>>>>
>>>> Thank you,
>>>> Paul
>>>>
>>>>
>>>> On 9/17/2013 4:54 PM, Paul Haley wrote:
>>>>> Hi,
>>>>>
>>>>> It seems that when I send FSC w/ TNT tags for some but
>>>>> not all tokens I get ERG LEs that do not satisfy the
>>>>> provided tags when using any of NO_DEFAULT_LES,
>>>>> DEFAULT_LES_ALL, or DEFAULT_LES_POSGAPS_LEXGAPS. It
>>>>> does respect these tags when there are no
>>>>> corresponding ERG LEs, however, which is good.
>>>>>
>>>>> Is there a way that I can get PET w/ the ERG to
>>>>> respect the TNT tags when provided but otherwise use
>>>>> the ERG LEs?
>>>>>
>>>>> Thank you,
>>>>> Paul
>>>>>
>>>>
>>>>
>>>
>>
>>
>
>