[pet] [delph-in] Poll to identify actively used functionality in PET
Stephan Oepen
oe at ifi.uio.no
Wed Jul 7 12:56:00 CEST 2010
i guess you're asking about FSR (aka FSPP) and REPP? the latter now
supersedes FSPP in the LKB, and for all i know the existing FSPP
support in PET (based on ECL) is not Unicode-enabled and builds on the
deprecated SMAF. hence, no practical loss purging that from PET now,
i'd think? REPP, on the other hand, should be natively supported in
PET, in my view. i seem to recall that you had a C++ implementation
of REPP? woodley has a C implementation. maybe sometime this fall we
could jointly look at the choices (and remaining limitations: i
believe none of the existing implementations is perfect in terms of
characterization corner cases), and then add native REPP support to PET?
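at its core, REPP is an ordered cascade of regular-expression rewrite rules applied to the input string before tokenization. a toy sketch of that core idea (rule syntax simplified; this is none of the implementations mentioned above, and real REPP adds iterative groups, module includes, and characterization, i.e. character-offset tracking, which is exactly where the corner cases live):

```python
import re

# apply an ordered cascade of (pattern, replacement) rewrite rules,
# REPP-style; each rule sees the output of the previous one
def apply_repp(rules, text):
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

# hypothetical rules: collapse whitespace, then pad punctuation so a
# later whitespace tokenizer splits it off
rules = [
    (r"\s+", " "),
    (r"([.,!?])", r" \1"),
]

print(apply_repp(rules, "Kim   arrived, didn't she?"))
# -> Kim arrived , didn't she ?
```

the hard part a native PET implementation would have to get right is not the rewriting itself but keeping the mapping from output characters back to input offsets exact through every rule application.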
as for FSC, there is pretty good documentation on the wiki now, and it
seems the format is reasonably stable. i am inclined to preserve YY
format, as the non-XML alternative to inputting a PoS-annotated token
lattice.
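for concreteness, a YY token carries an id, chart start/end vertices, a character-offset link, the surface form, and optional PoS tag/probability pairs. a small sketch of emitting one such token; the field layout here is my approximation of the documented format, so consult the wiki for the authoritative definition:

```python
# emit one token in (approximate) YY lattice syntax:
# (id, start, end, <from:to>, path, "form", ipos, "lrule", "TAG" prob ...)
# the exact field inventory is taken from memory of the wiki page, not
# from PET source; treat it as illustrative only
def yy_token(tid, start, end, cfrom, cto, form, tags):
    tag_str = " ".join(f'"{t}" {p:.4f}' for t, p in tags)
    return (f'({tid}, {start}, {end}, <{cfrom}:{cto}>, 1, '
            f'"{form}", 0, "null", {tag_str})')

print(yy_token(42, 0, 1, 0, 11, "Tokenization",
               [("NNP", 0.7677), ("NN", 0.2323)]))
# -> (42, 0, 1, <0:11>, 1, "Tokenization", 0, "null", "NNP" 0.7677 "NN" 0.2323)
```

FSC expresses the same kind of lattice as XML, which is more verbose but extensible; YY remains attractive precisely as the compact non-XML alternative.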
finally, i see your point about efficiency losses in -default-les=all
mode when combined with a very large number of generics (i.e. one per
LE type); personally, i'd think lexical instantiation can be optimized
to alleviate these concerns. i find the limitations in the
old generics mode so severe that i can't imagine going back to that
mode. but if there were active users who'd be badly affected by its
removal prior to optimizing -default-les=all further, i have no
opinion on when best to ditch the old mode.
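to make the two regimes concrete, here is a toy illustration (all names invented; this is not PET code): the old mode instantiates generics only at lexical gaps, while -default-les=all hypothesizes a generic for every token and relies on later filtering, which is where the cost of one generic per LE type shows up.

```python
# toy lexical instantiation under the two regimes discussed:
#   mode="traditional": generics only for tokens with no native entry
#   mode="all":         generics hypothesized for every token
def instantiate(tokens, lexicon, generics, mode="all"):
    chart = []
    for tok, pos in tokens:
        native = lexicon.get(tok, [])
        chart.extend((tok, le) for le in native)
        if mode == "all" or not native:
            # with one generic per LE type this loop is the efficiency
            # concern; PoS-based pruning here would cut most candidates
            # before parsing proper
            chart.extend((tok, g) for g, gpos in generics if gpos == pos)
    return chart

lexicon = {"dog": ["n_-_c_le"]}
generics = [("generic_noun", "NN"), ("generic_verb", "VB")]
tokens = [("dog", "NN"), ("blargs", "NN")]

print(instantiate(tokens, lexicon, generics, mode="all"))
print(instantiate(tokens, lexicon, generics, mode="traditional"))
```

in "all" mode, "dog" gets both its native entry and a generic; in "traditional" mode, only the unknown "blargs" triggers a generic.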
best, oe
On 7 July 2010, at 02:03, Rebecca Dridan <bec.dridan at gmail.com> wrote:
> I couldn't attend the PetRoadMap discussion - is there any summary
> of the discussion, or at least what decisions were made on the wiki?
>
>> Input formats we'd like to discard:
>>
>> - pic / pic_counts
>> - yy_counts
>> - smaf
>> - fsr
>>
>
> Particularly, what is the plan for inputs? FSC seemed to do
> everything I had needed from PIC, but at the time it was
> undocumented, experimental code. Will FSC be the default input
> format when annotation beyond POS tags is needed?
>
>>
>> -default-les=traditional: determine default les by PoS mapping for
>> all lexical gaps
>
> Does this mean that we can either hypothesise every generic entry
> for every token (and then filter them), or not use generic entries
> at all? I found this to be a major efficiency issue when large
> numbers of generic entries were available. I don't have a problem
> with defaulting to the current "all" setting, but I think there are
> still possible configurations where one would like to react only
> when lexical gaps were found.
>
>>
>> Because these are the only modules that require the inclusion of ECL,
>> support for ECL in PET will also be removed.
> I celebrate the removal of ECL, but will there be any way of doing
> more than white space tokenisation natively in PET, or was the
> decision made that PET will always be run in conjunction with an LKB
> pre-processing step?
>
> Rebecca
>