[developers] developers Digest, Vol 66, Issue 3
Rebecca Dridan
bec.dridan at gmail.com
Thu Jul 8 15:42:52 CEST 2010
Hi Antonio,
Just to chime in on this part:
>
> Also, a key reason for using PIC is the ability it provides
> to constrain the POS tags of input words that are nevertheless
> (ambiguously) known to the grammar/lexicon.
>
This sounds like supertagging (parser restriction)? If that's the case,
it is quite easily done using FSC and chart mapping, and I could give you
scripts that would make most of the related grammar changes
automatically. It doesn't solve your other problems, but this point
shouldn't be a deal-breaker.
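
For illustration, here is roughly what I have in mind: each FSC edge
carries the tagger's POS assignment inside the token feature structure,
and token-mapping rules can then filter lexical hypotheses against it.
The fragment below is only a sketch; the feature geometry (+FORM, +FROM,
+TO, +POS with +TAGS/+PRBS) follows ERG-style token structures, and the
Portuguese word and tag are made up, so the details would need adapting
to your grammar's token type:

<fsc version="1.0">
  <chart id="s1">
    <lattice init="v0" final="v1">
      <edge source="v0" target="v1">
        <fs type="token">
          <f name="+FORM"><str>canta</str></f>
          <f name="+FROM"><str>0</str></f>
          <f name="+TO"><str>5</str></f>
          <f name="+POS">
            <fs type="pos">
              <f name="+TAGS" org="list"><str>V</str></f>
              <f name="+PRBS" org="list"><str>1.0</str></f>
            </fs>
          </f>
        </fs>
      </edge>
    </lattice>
  </chart>
</fsc>

A token-mapping rule can then discard generic (or native) lexical
entries whose category is incompatible with +POS.+TAGS, which gives you
the same restriction you now get through PIC.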
Rebecca
>> and how much effort would be involved in making the transition?
>
> Given our funding conditions: unaffordable.
>
> Given our medium- to long-term goal, i.e. to get as fast as possible
> to a set of materials for Portuguese with the size and maturity of
> those existing for English: redoing what took us several years of work
> is also unaffordable
>
>
>> i understand you are about to experiment with migrating to token
>> mapping for independent reasons,
>
> not necessarily: we are living (for almost one year now) with our
> patch for the HCONS problem in MRSs, and waiting for a principled
> solution that would permit simply removing that patch
>
> best,
>
> --Ant.
>
>
>
>> and i wholeheartedly believe you stand to gain
>> from these revisions. in doing so, are there interface aspects that
>> you believe cannot be accommodated within the assumptions of my ‘pure’
>> vision above?
>>
>> best wishes, oe
>>
>> On 8 July 2010, at 12.17, Antonio Branco
>> <Antonio.Branco at di.fc.ul.pt> wrote:
>>
>>>
>>> Dear Uli,
>>>
>>> Please note that in the case of the Portuguese grammar,
>>> interfacing the deep grammar with our pre-processing
>>> tools (POS tagger, lemmatizer, morphological analyzer,
>>> NER) is done exclusively via PIC, so having PET
>>> without PIC would prevent our grammar from running
>>> on it.
>>>
>>> All the best,
>>>
>>>
>>> --Ant.
>>>
>>>
>>> P.S.: Francisco has already sent Bernd a detailed description
>>> of the problem with PET that we reported in Paris, together with
>>> our grammar, so that you will be able to reproduce it on your side.
>>> For the record, he'll be submitting a ticket
>>> in the PET bug tracker as well.
>>>
>>> developers-request at emmtee.net wrote:
>>> ----------
>>>> Message: 1
>>>> Date: Wed, 07 Jul 2010 18:34:40 +0200
>>>> From: Ulrich Schaefer <ulrich.schaefer at dfki.de>
>>>> Subject: Re: [developers] [pet] [delph-in] Poll to identify actively
>>>> used functionality in PET
>>>> To: pet at delph-in.net, developers at delph-in.net
>>>> Message-ID: <4C34ACA0.4050304 at dfki.de>
>>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>> From my / Heart of Gold's point of view, it would be OK to have a
>>>> PET without PIC and SMAF. REPP isn't used in any of the hybrid
>>>> workflows AFAIR.
>>>> It implies that we will probably lose the integrations for Italian
>>>> and Norwegian (no harm, as those grammars are no longer actively
>>>> developed, I guess) and maybe also Greek, but we would hopefully
>>>> gain cleaner and more uniform configuration settings.
>>>> I hope we will also be able to integrate Spanish with FreeLing via
>>>> FSC soon.
>>>> I would also like to replace the current stdin/stderr communication
>>>> in PetModule by XML-RPC as soon as possible.
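>>>> (To sketch the idea: the endpoint, port, and method name below are
>>>> invented for illustration, not an existing interface.
>>>>
>>>> # hypothetical XML-RPC client for a cheap/PET analysis server
>>>> import xmlrpc.client
>>>>
>>>> # assume the server listens on this (made-up) port
>>>> pet = xmlrpc.client.ServerProxy("http://localhost:4712/")
>>>>
>>>> # send one FSC document, get the analysis results back
>>>> with open("input.fsc") as f:
>>>>     results = pet.analyze(f.read())
>>>>
>>>> The point is to call the parser like that, rather than writing to
>>>> the process's stdin and scraping its output streams.)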
>>>> -Uli
>>>> On 07.07.2010 12:56, Stephan Oepen wrote:
>>>>> i guess you're asking about FSR (aka FSPP) and REPP? the latter
>>>>> now supersedes FSPP in the LKB, and for all i know the existing
>>>>> FSPP support in PET (based on ECL) is not Unicode-enabled and
>>>>> builds on the deprecated SMAF. hence, no practical loss purging
>>>>> that from PET now, i'd think? REPP, on the other hand, should be
>>>>> natively supported in PET, in my view. i seem to recall that you
>>>>> had a C++ implementation of REPP? woodley has a C
>>>>> implementation. maybe sometime this fall we could jointly look at
>>>>> the choices (and remaining limitations: i believe none of the
>>>>> existing implementations is perfect in terms of characterization
>>>>> corner cases), and then add native REPP support to PET?
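>>>>>
>>>>> (for concreteness, a minimal REPP fragment, with made-up rules:
>>>>> `!' lines are rewrite rules with tab-separated pattern and
>>>>> replacement, `:' declares the tokenization pattern, `;' starts a
>>>>> comment:
>>>>>
>>>>> ;; normalize typographic quotes, then break tokens at whitespace
>>>>> !“	"
>>>>> !”	"
>>>>> :[ \t]+
>>>>>
>>>>> tracking characterization through such rewrites is exactly where
>>>>> the corner cases live.)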
>>>>>
>>>>> as for FSC, there is pretty good documentation on the wiki now,
>>>>> and it seems the format is reasonably stable. i am inclined to
>>>>> preserve YY format, as the non-XML alternative for inputting a
>>>>> PoS-annotated token lattice.
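>>>>>
>>>>> (for reference, a PoS-annotated YY token is a one-line tuple,
>>>>> roughly: id, start and end vertex, characterization span, lattice
>>>>> path, surface form, inflection position and rule, and then tag and
>>>>> probability pairs, e.g.
>>>>>
>>>>> (42, 0, 1, <0:11>, 1, "Tokenization", 0, "null", "NNP" 0.7677 "NN" 0.2323)
>>>>>
>>>>> with one such tuple per token, juxtaposed into a lattice.)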
>>>>>
>>>>> finally, i see your point about efficiency losses in
>>>>> -default-les=all mode when combined with a very large number of
>>>>> generics (i.e. one per LE type); i'd think lexical
>>>>> instantiation can be optimized to alleviate these concerns. i
>>>>> personally find the limitations in the old generics mode so severe
>>>>> that i can't imagine going back to that mode. but if there were
>>>>> active users who'd be badly affected by its removal prior to
>>>>> optimizing -default-les=all further, i have no opinion on when
>>>>> best to ditch the old mode.
>>>>>
>>>>> best, oe
>>>>>
>>>>> On 7 July 2010, at 02.03, Rebecca Dridan <bec.dridan at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I couldn't attend the PetRoadMap discussion - is there a
>>>>>> summary of the discussion on the wiki, or at least of what
>>>>>> decisions were made?
>>>>>>
>>>>>>> Input formats we'd like to discard:
>>>>>>>
>>>>>>> - pic / pic_counts
>>>>>>> - yy_counts
>>>>>>> - smaf
>>>>>>> - fsr
>>>>>>>
>>>>>> In particular, what is the plan for inputs? FSC seemed to do
>>>>>> everything I had needed from PIC, but at the time it was
>>>>>> undocumented, experimental code. Will FSC be the default input
>>>>>> format when annotation beyond POS tags is needed?
>>>>>>
>>>>>>> -default-les=traditional determine default les by posmapping
>>>>>>> for all lexical gaps
>>>>>> Does this mean that we can either hypothesise every generic entry
>>>>>> for every token (and then filter them), or not use generic
>>>>>> entries at all? I found this to be a major efficiency issue when
>>>>>> large numbers of generic entries were available. I don't have a
>>>>>> problem with defaulting to the current "all" setting, but I think
>>>>>> there are still possible configurations where one would like to
>>>>>> react only when lexical gaps are found.
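>>>>>>
>>>>>> (Concretely, I mean the difference between invocations along the
>>>>>> lines of
>>>>>>
>>>>>> cheap -tok=fsc -default-les=traditional grammar.grm < input.fsc
>>>>>> cheap -tok=fsc -default-les=all grammar.grm < input.fsc
>>>>>>
>>>>>> where the first posits generics only for tokens with no native
>>>>>> lexical entry, and the second posits every compatible generic for
>>>>>> every token and filters afterwards; the grammar and input names
>>>>>> are placeholders, and I may have the option spellings wrong.)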
>>>>>>
>>>>>>> Because these are the only modules that require the inclusion of
>>>>>>> ECL,
>>>>>>> support for ECL in PET will also be removed.
>>>>>> I celebrate the removal of ECL, but will there be any way of
>>>>>> doing more than whitespace tokenisation natively in PET, or was
>>>>>> the decision made that PET will always be run in conjunction with
>>>>>> an LKB pre-processing step?
>>>>>>
>>>>>> Rebecca
>>>>>>