[developers] developers Digest, Vol 66, Issue 3

Thu Jul 8 12:17:59 CEST 2010

Dear Uli,

Please note that in the case of the Portuguese grammar,
interfacing the deep grammar with our pre-processing
tools (POS tagger, lemmatizer, morphological analyzer,
NER) is done exclusively via PIC, so having PET
without PIC would prevent our grammar from running
on it.

All the best,

--Ant.

P.S.: Francisco has already sent Bernd a detailed description
of the problem with PET we reported in Paris, together with
our grammar, so that you will be able to reproduce it on your side.
For the record, he'll be submitting a ticket
in the PET bug tracker as well.

developers-request at emmtee.net wrote:
----------
> 
> Message: 1
> Date: Wed, 07 Jul 2010 18:34:40 +0200
> From: Ulrich Schaefer <ulrich.schaefer at dfki.de>
> Subject: Re: [developers] [pet] [delph-in] Poll to identify actively
> 	used functionality in PET
> To: pet at delph-in.net, developers at delph-in.net
> Message-ID: <4C34ACA0.4050304 at dfki.de>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> From my / Heart of Gold's point of view, it would be OK to have a PET 
> without PIC and SMAF. REPP isn't used in any of the hybrid workflows AFAIR.
> It implies that we will probably lose integrations for Italian and 
> Norwegian (no harm as the grammars are no longer actively developed I 
> guess) and maybe also Greek, but we would gain cleaner and uniform 
> configuration settings, hopefully.
> I hope we will also be able to integrate Spanish with FreeLing via FSC soon.
> I would also like to replace the current stdin/stderr communication in 
> PetModule by XML-RPC as soon as possible.
> 
> -Uli
> 
> 
> 
> Am 07.07.2010 12:56, schrieb Stephan Oepen:
>> i guess you're asking about FSR (aka FSPP) and REPP?  the latter now 
>> supersedes FSPP in the LKB, and for all i know the existing FSPP 
>> support in PET (based on ECL) is not Unicode-enabled and builds on the 
>> deprecated SMAF.  hence, no practical loss purging that from PET now, 
>> i'd think?  REPP, on the other hand, should be natively supported in 
>> PET, in my view.  i seem to recall that you had a C++ implementation 
>> of REPP?  woodley has a C implementation.  maybe sometime this fall we 
>> could jointly look at the choices (and remaining limitations: i 
>> believe none of the existing implementations is perfect in terms of 
>> characterization corner cases), and then add native REPP support to PET?
>>
>> as for FSC, there is pretty good documentation on the wiki now, and it 
>> seems the format is reasonably stable.  i am inclined to preserve YY 
>> format, as the non-XML alternative to inputting a PoS-annotated token 
>> lattice.
>>
>> finally, i see your point about efficiency losses in -default-les=all 
>> mode when combined with a very large number of generics (i.e. one per 
>> LE type); personally, i'd think lexical instantiation can be optimized 
>> to alleviate these concerns.  i personally find the limitations in the 
>> old generics mode so severe that i can't imagine going back to that 
>> mode.  but if there were active users who'd be badly affected by its 
>> removal prior to optimizing -default-les=all further, i have no 
>> opinion on when best to ditch the old mode.
>>
>> best, oe
>>
>>
>>
>> On 7. juli 2010, at 02.03, Rebecca Dridan <bec.dridan at gmail.com> wrote:
>>
>>> I couldn't attend the PetRoadMap discussion - is there any summary of 
>>> the discussion, or at least what decisions were made on the wiki?
>>>
>>>> Input formats we'd like to discard:
>>>>
>>>> - pic / pic_counts
>>>> - yy_counts
>>>> - smaf
>>>> - fsr
>>>>
>>> Particularly, what is the plan for inputs? FSC seemed to do 
>>> everything I had needed from PIC, but at the time it was 
>>> undocumented, experimental code. Will FSC be the default input format 
>>> when annotation beyond POS tags is needed?
>>>
>>>> -default-les=traditional  determine default les by posmapping for all
>>>>                          lexical gaps
>>> Does this mean that we can either hypothesise every generic entry for 
>>> every token (and then filter them), or not use generic entries at 
>>> all? I found this to be a major efficiency issue when large numbers 
>>> of generic entries were available. I don't have a problem with 
>>> defaulting to the current "all" setting, but I think there are still 
>>> possible configurations where one would like to react only when 
>>> lexical gaps were found.
>>>
>>>> Because these are the only modules that require the inclusion of ECL,
>>>> support for ECL in PET will also be removed.
>>> I celebrate the removal of ECL, but will there be any way of doing 
>>> more than white space tokenisation natively in PET, or was the 
>>> decision made that PET will always be run in conjunction with an LKB 
>>> pre-processing step?
>>>
>>> Rebecca
>>>
>