[developers] developers Digest, Vol 66, Issue 3

Rebecca Dridan bec.dridan at gmail.com
Thu Jul 8 15:42:52 CEST 2010


Hi Antonio,

Just to chime in on this part:
>
> Also a key reason for using PIC is the ability it provides
> to constrain the POS tag of input words that are nevertheless
> (ambiguously) known to the grammar/lexicon.
>

This sounds like supertagging (parser restriction)? If that's the case, 
it is quite easily done using FSC and chartmapping, and I could give you 
scripts that could make most of the related grammar changes 
automatically. It doesn't solve your other problems, but this point 
shouldn't be a deal-breaker.

Rebecca







>> and how much effort would be involved in making the transition? 
>
> Given our funding conditions: unaffordable.
>
> Given our medium to long term goal, i.e. to get as fast as possible
> to a set of materials for Portuguese with size and maturity of those
> existing for English: redoing what took us several years
> is also unaffordable
>
>
>> i understand you are about to experiment with migrating to token 
>> mapping for independent reasons, 
>
> not necessarily: we are living (for almost one year now) with our
> patch for the HCONS problem in MRSs and waiting for a principled
> solution that permits just removing that patch
>
> best,
>
> --Ant.
>
>
>
>> and i wholeheartedly believe you stand to gain
>> from these revisions.  in doing so, are there interface aspects that 
>> you believe cannot be accommodated within the assumptions of my ‘pure’ 
>> vision above?
>>
>> best wishes, oe
>>
>>
>>
>>
>>
>> On 8. juli 2010, at 12.17, António Branco 
>> <Antonio.Branco at di.fc.ul.pt> wrote:
>>
>>>
>>>
>>>
>>> Dear Uli,
>>>
>>> Please note that in the case of the Portuguese grammar,
>>> interfacing the deep grammar with our pre-processing
>>> tools (POS tagger, lemmatizer, morphological analyzer,
>>> NER) is done exclusively via PIC, so having PET
>>> without PIC would keep our grammar from running
>>> on it.
>>>
>>> All the best,
>>>
>>>
>>> --Ant.
>>>
>>>
>>> P.S.: Francisco has already sent to Bernd a detailed description
>>> of the problem with PET we reported in Paris, together with
>>> our grammar so that you will be able to reproduce it on your side.
>>> For the record, he'll be submitting a ticket
>>> in the PET bug tracker as well.
>>>
>>>
>>>
>>>
>>> developers-request at emmtee.net wrote:
>>> ----------
>>>> Message: 1
>>>> Date: Wed, 07 Jul 2010 18:34:40 +0200
>>>> From: Ulrich Schaefer <ulrich.schaefer at dfki.de>
>>>> Subject: Re: [developers] [pet] [delph-in] Poll to identify actively
>>>>    used functionality in PET
>>>> To: pet at delph-in.net, developers at delph-in.net
>>>> Message-ID: <4C34ACA0.4050304 at dfki.de>
>>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>>
>>>> From my / Heart of Gold's point of view, it would be OK to have a 
>>>> PET without PIC and SMAF. REPP isn't used in any of the hybrid 
>>>> workflows AFAIR.
>>>> It implies that we will probably lose integrations for Italian and 
>>>> Norwegian (no harm as the grammars are no longer actively developed 
>>>> I guess) and maybe also Greek, but we would gain cleaner and 
>>>> uniform configuration settings, hopefully.
>>>> I hope we will also be able to integrate Spanish with FreeLing via 
>>>> FSC soon.
>>>> I would also like to replace the current stdin/stderr communication 
>>>> in PetModule by XML-RPC as soon as possible.
>>>> -Uli
>>>> On 07.07.2010 12:56, Stephan Oepen wrote:
>>>>> i guess you're asking about FSR (aka FSPP) and REPP?  the latter 
>>>>> now supersedes FSPP in the LKB, and for all i know the existing 
>>>>> FSPP support in PET (based on ECL) is not Unicode-enabled and 
>>>>> builds on the deprecated SMAF.  hence, no practical loss purging 
>>>>> that from PET now, i'd think?  REPP, on the other hand, should be 
>>>>> natively supported in PET, in my view.  i seem to recall that you 
>>>>> had a C++ implementation of REPP?  woodley has a C 
>>>>> implementation.  maybe sometime this fall we could jointly look at 
>>>>> the choices (and remaining limitations: i believe none of the 
>>>>> existing implementations is perfect in terms of characterization 
>>>>> corner cases), and then add native REPP support to PET?
>>>>>
>>>>> as for FSC, there is pretty good documentation on the wiki now, 
>>>>> and it seems the format is reasonably stable.  i am inclined to 
>>>>> preserve YY format, as the non-XML alternative to inputting a 
>>>>> PoS-annotated token lattice.
>>>>>
>>>>> finally, i see your point about efficiency losses in 
>>>>> -default-les=all mode when combined with a very large number of 
>>>>> generics (i.e. one per LE type); personally, i'd think lexical 
>>>>> instantiation can be optimized to alleviate these concerns.  i 
>>>>> personally find the limitations in the old generics mode so severe 
>>>>> that i can't imagine going back to that mode.  but if there were 
>>>>> active users who'd be badly affected by its removal prior to 
>>>>> optimizing -default-les=all further, i have no opinion on when 
>>>>> best to ditch the old mode.
>>>>>
>>>>> best, oe
>>>>>
>>>>>
>>>>>
>>>>> On 7. juli 2010, at 02.03, Rebecca Dridan <bec.dridan at gmail.com> 
>>>>> wrote:
>>>>>
>>>>>> I couldn't attend the PetRoadMap discussion - is there any 
>>>>>> summary of the discussion, or at least what decisions were made 
>>>>>> on the wiki?
>>>>>>
>>>>>>> Input formats we'd like to discard:
>>>>>>>
>>>>>>> - pic / pic_counts
>>>>>>> - yy_counts
>>>>>>> - smaf
>>>>>>> - fsr
>>>>>>>
>>>>>> Particularly, what is the plan for inputs? FSC seemed to do 
>>>>>> everything I had needed from PIC, but at the time it was 
>>>>>> undocumented, experimental code. Will FSC be the default input 
>>>>>> format when annotation beyond POS tags is needed?
>>>>>>
>>>>>>> -default-les=traditional  determine default les by posmapping for all
>>>>>>>                           lexical gaps
>>>>>> Does this mean that we can either hypothesise every generic entry 
>>>>>> for every token (and then filter them), or not use generic 
>>>>>> entries at all? I found this to be a major efficiency issue when 
>>>>>> large numbers of generic entries were available. I don't have a 
>>>>>> problem with defaulting to the current "all" setting, but I think 
>>>>>> there are still possible configurations where one would like to 
>>>>>> react only when lexical gaps were found.
>>>>>>
>>>>>>> Because these are the only modules that require the inclusion of 
>>>>>>> ECL,
>>>>>>> support for ECL in PET will also be removed.
>>>>>> I celebrate the removal of ECL, but will there be any way of 
>>>>>> doing more than white space tokenisation natively in PET, or was 
>>>>>> the decision made that PET will always be run in conjunction with 
>>>>>> an LKB pre-processing step?
>>>>>>
>>>>>> Rebecca
>>>>>>