[pet] [delph-in] Poll to identify actively used functionality in PET
Stephan Oepen
oe at ifi.uio.no
Wed Jul 7 12:56:00 CEST 2010
i guess you're asking about FSR (aka FSPP) and REPP? the latter now
supersedes FSPP in the LKB, and for all i know the existing FSPP
support in PET (based on ECL) is not Unicode-enabled and builds on the
deprecated SMAF. hence, no practical loss purging that from PET now,
i'd think? REPP, on the other hand, should be natively supported in
PET, in my view. i seem to recall that you had a C++ implementation
of REPP? woodley has a C implementation. maybe sometime this fall we
could jointly look at the choices (and remaining limitations: i
believe none of the existing implementations is perfect in terms of
characterization corner cases), and then add native REPP support to PET?
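at its core, REPP is an ordered cascade of regular-expression rewrite rules applied to the input string before tokenization. a toy sketch of that core idea (rule syntax simplified; this is none of the implementations mentioned above, and real REPP adds iterative groups, module includes, and characterization, i.e. character-offset tracking, which is exactly where the corner cases live):

```python
import re

# apply an ordered cascade of (pattern, replacement) rewrite rules,
# REPP-style; each rule sees the output of the previous one
def apply_repp(rules, text):
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

# hypothetical rules: collapse whitespace, then pad punctuation so a
# later whitespace tokenizer splits it off
rules = [
    (r"\s+", " "),
    (r"([.,!?])", r" \1"),
]

print(apply_repp(rules, "Kim   arrived, didn't she?"))
# -> Kim arrived , didn't she ?
```

the hard part a native PET implementation would have to get right is not the rewriting itself but keeping the mapping from output characters back to input offsets exact through every rule application.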
as for FSC, there is pretty good documentation on the wiki now, and it
seems the format is reasonably stable. i am inclined to preserve YY
format, as the non-XML alternative to inputting a PoS-annotated token
lattice.
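for concreteness, a YY token carries an id, chart start/end vertices, a character-offset link, the surface form, and optional PoS tag/probability pairs. a small sketch of emitting one such token; the field layout here is my approximation of the documented format, so consult the wiki for the authoritative definition:

```python
# emit one token in (approximate) YY lattice syntax:
# (id, start, end, <from:to>, path, "form", ipos, "lrule", "TAG" prob ...)
# the exact field inventory is taken from memory of the wiki page, not
# from PET source; treat it as illustrative only
def yy_token(tid, start, end, cfrom, cto, form, tags):
    tag_str = " ".join(f'"{t}" {p:.4f}' for t, p in tags)
    return (f'({tid}, {start}, {end}, <{cfrom}:{cto}>, 1, '
            f'"{form}", 0, "null", {tag_str})')

print(yy_token(42, 0, 1, 0, 11, "Tokenization",
               [("NNP", 0.7677), ("NN", 0.2323)]))
# -> (42, 0, 1, <0:11>, 1, "Tokenization", 0, "null", "NNP" 0.7677 "NN" 0.2323)
```

FSC expresses the same kind of lattice as XML, which is more verbose but extensible; YY remains attractive precisely as the compact non-XML alternative.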
finally, i see your point about efficiency losses in -default-les=all
mode when combined with a very large number of generics (i.e. one per
LE type); personally, i'd think lexical instantiation can be optimized
to alleviate these concerns. i find the limitations in the
old generics mode so severe that i can't imagine going back to that
mode. but if there were active users who'd be badly affected by its
removal prior to optimizing -default-les=all further, i have no
opinion on when best to ditch the old mode.
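to make the two regimes concrete, here is a toy illustration (all names invented; this is not PET code): the old mode instantiates generics only at lexical gaps, while -default-les=all hypothesizes a generic for every token and relies on later filtering, which is where the cost of one generic per LE type shows up.

```python
# toy lexical instantiation under the two regimes discussed:
#   mode="traditional": generics only for tokens with no native entry
#   mode="all":         generics hypothesized for every token
def instantiate(tokens, lexicon, generics, mode="all"):
    chart = []
    for tok, pos in tokens:
        native = lexicon.get(tok, [])
        chart.extend((tok, le) for le in native)
        if mode == "all" or not native:
            # with one generic per LE type this loop is the efficiency
            # concern; PoS-based pruning here would cut most candidates
            # before parsing proper
            chart.extend((tok, g) for g, gpos in generics if gpos == pos)
    return chart

lexicon = {"dog": ["n_-_c_le"]}
generics = [("generic_noun", "NN"), ("generic_verb", "VB")]
tokens = [("dog", "NN"), ("blargs", "NN")]

print(instantiate(tokens, lexicon, generics, mode="all"))
print(instantiate(tokens, lexicon, generics, mode="traditional"))
```

in "all" mode, "dog" gets both its native entry and a generic; in "traditional" mode, only the unknown "blargs" triggers a generic.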
best, oe
On 7 July 2010, at 02:03, Rebecca Dridan <bec.dridan at gmail.com> wrote:
> I couldn't attend the PetRoadMap discussion - is there any summary
> of the discussion, or at least what decisions were made on the wiki?
>
>> Input formats we'd like to discard:
>>
>> - pic / pic_counts
>> - yy_counts
>> - smaf
>> - fsr
>>
>
> Particularly, what is the plan for inputs? FSC seemed to do
> everything I had needed from PIC, but at the time it was
> undocumented, experimental code. Will FSC be the default input
> format when annotation beyond POS tags is needed?
>
>>
>> -default-les=traditional: determine default les by PoS mapping for
>> all lexical gaps
>
> Does this mean that we can either hypothesise every generic entry
> for every token (and then filter them), or not use generic entries
> at all? I found this to be a major efficiency issue when large
> numbers of generic entries were available. I don't have a problem
> with defaulting to the current "all" setting, but I think there are
> still possible configurations where one would like to react only
> when lexical gaps were found.
>
>>
>> Because these are the only modules that require the inclusion of ECL,
>> support for ECL in PET will also be removed.
> I celebrate the removal of ECL, but will there be any way of doing
> more than white space tokenisation natively in PET, or was the
> decision made that PET will always be run in conjunction with an LKB
> pre-processing step?
>
> Rebecca
>