[developers] Re: unknown words in pet again

Emily M. Bender ebender at u.washington.edu
Fri Feb 18 20:59:19 CET 2005

Hi Patricia, Stephan, et al,

I do not know anything about the internals of PET, but did
notice something on reading this exchange which might just
be relevant.  In the Matrix, WLINK is associated with particular
relations, as we are working with Minimal Recursion Semantics.
The fs copied below appears to use P&S-94-style recursive
semantic representations (which is why, I guess, you have
WLINK showing up as a feature of cont).  I wonder whether the relevant
code within PET somehow assumes MRS.


On Thu, Feb 17, 2005 at 09:52:25PM -0800, Stephan Oepen wrote:
> hi patricia,
> i thought i would surprise you and actually reply to your message :-).
> i am really sorry i have been less than helpful lately, but i now copy
> your query to a new group of resourceful people: we are in the process
> of consolidating DELPH-IN resources (see `http://www.delph-in.net'), so
> that there now is a mailing list for participating developers.
> > sorry to bother you again about this, but the issue is still
> > pressing for us, and in spite of repeated efforts we don't seem able to
> > find a solution.  I hope you'll be able to open our eyes or put us in
> > touch with somebody who can, without waiting for a pet web
> > site to be set up.
> as you may or may not know, ulrich callmeier has more or less retired
> from active PET development, and bernd kiefer (DFKI) has been the main
> developer since late in 2003.  others (including myself and frederik
> fouvry) continue to make occasional contributions, and ulrich is still
> kibitzing, i think.  bernd is on the brink of releasing a new version
> that has quite a lot of new functionality, including some that i expect
> will solve your problem in the long run.  i expect bernd will follow up
> to this message and point you to a new download site soon.
> > As you may remember, here at CST we have been working with pet using 
> > POS-tags in the input (yy-format) and have some generic rules that are 
> > activated and used in the analysis when there are unknown words in the 
> > input.
> > 
> > The structure of the parse output is fine, but unfortunately we
> > cannot get the orth string represented in the parse result. More
> > specifically, the string should appear in the semantic part of the
> > parse result, which in our grammar is given by "CONT".
> > 
> > We have tried to follow a hint we got from Ulrich Callmeier on 17 Oct
> > 2003, who suggested defining label-path and label-path-tail in
> > danish.set. He suggested that we use mrs_stamp_fs to "have an
> > identifier 'stamped' into its WLINK attribute which we could use to
> > identify the relevant token from the original input string".
> > 
> > Now we have tried to do this, but we just get a glbtypeNNN type in the
> > parse result rather than the integer (or integer list) we expected. In
> > other words, we still do not get a trace of the input word in the
> > output parse.
> > 
> > Perhaps we have defined the WLINK incorrectly:
> > 
> > WLINK is defined in types.tdl:
> >
> > cont := *top* &
> > [ WLINK *cons* ].
> >
> > As we can see in mrs.cpp, WLINK has to be of type *cons* for
> > mrs_stamp_fs to work, but we do not know what the right definition
> > would be. We have looked for an explanation of the *cons* type in the
> > documentation, and for meaningful examples of it in the Matrix and
> > other grammars, without success. Below, you can see that the value of
> > WLINK is set to some glbtype, which seems to show that it has unified
> > with the integer that points at the input word, but the result is not
> > very meaningful, let alone usable.
> > 
> > Parse-output (CONT feature):
> > CONT #8:
> > [FOCUS-CONST #2:kursus
> >  COUNT       #4:all
> >  LOA         ne-list-of-assoc
> >              [FIRST kursusudbud
> >                     [COURSE   #1:wh-nom-obj
> >                               [RESTR    #2:kursus
> >                                WLINK    #3:*cons*
> >                                         [FIRST glbtype159
> >                                          REST  *null*]
> >                                WH-COUNT #4:all]
> >                      TEACHER  nom-obj
> >                               [RESTR lærerstab]
> >                      PROVIDER #5:nom-obj
> >                               [RESTR universitetsorgan
> >                                      [NAME #6:string]
> >                                WLINK #7:*cons*
> >                                      [FIRST glbtype154
> >                                       REST  *null*]]]
> >              REST  list-of-assoc]]
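A note on the *cons* values in the output above: in LKB/Matrix-style grammars a list is a chain of *cons* nodes, each with FIRST (the element) and REST (the remainder of the list), closed off by *null*; declaring WLINK as *cons* only says that the list is non-empty. The following is a minimal sketch in Python of how such a value is read, assuming a hypothetical dict-based stand-in for feature structures (this is not how PET represents them internally):

```python
# Hypothetical toy representation of a feature structure: a node is a dict
# with a "type" entry plus one entry per feature. This is NOT how PET
# stores feature structures; it only illustrates the list encoding.


def make_list(elements):
    """Encode a Python list as a chain of *cons* nodes ending in *null*."""
    chain = {"type": "*null*"}
    for element in reversed(elements):
        chain = {"type": "*cons*", "FIRST": element, "REST": chain}
    return chain


def list_elements(chain):
    """Collect the FIRST values of a *cons* chain up to the final *null*."""
    elements = []
    while chain["type"] == "*cons*":
        elements.append(chain["FIRST"])
        chain = chain["REST"]
    if chain["type"] != "*null*":
        # e.g. a REST value of type *list*: the tail is still unspecified
        raise ValueError(f"list is not closed by *null*: {chain['type']}")
    return elements


# The WLINK value of the COURSE object in the parse output above:
# a one-element list whose only element has type glbtype159.
wlink = make_list([{"type": "glbtype159"}])
print([element["type"] for element in list_elements(wlink)])  # ['glbtype159']
```

Read this way, each WLINK in the output is a well-formed one-element list; what is unexpected is only the type shown for its element.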
> as i said above, i expect there will be a straightforward way of getting
> the desired effect using the XML facilities of the new PET.  but i will
> at least try to give those good old YY facilities some thought:
>   - i presume you have made sure stamp_fs() gets called (in `item.cpp')?
>   - what input (in YY format) did you give to PET?  thus what range of
>     integers did you expect in WLINK?
>   - the above output may actually indicate that things are working for
>     you: lacking the ability to create new types at run-time (in older
>     versions of PET), the stamping actually just puts the integer for
>     the input token identifiers into the type slot of those nodes; the
>     (old) MRS extraction code would then treat these as token ids, but
>     the plain fs printing routine will think those integers are types.
>   - assuming the above is true, it might be sufficient to modify the
>     fs printer to give special treatment to elements inside of WLINK,
>     i.e. look up the feature code for WLINK and special-case it during
>     printing, so that it prints, say, WLINK <47, 11> instead, assuming
>     there were two elements (corresponding to a multi-word, which was
>     why we made it list-valued in the first place).
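The stamping and printing points above can be illustrated with a small model. This is a hypothetical Python sketch, not PET's C++ code: node types are integer codes looked up in a name table, "stamping" overwrites the type code of each FIRST node in WLINK with a token id, and a generic printer therefore shows those ids as whatever type names happen to carry those codes, while a printer that special-cases WLINK shows the ids directly. The type table, the codes, and all function names are assumptions made for illustration.

```python
# Hypothetical toy model of the older-PET behaviour described above: each
# node stores an integer type code, and the printer maps codes to names.
# All names and codes are illustrative; none of this is PET's actual API.

TOP, CONS, NULL, NOM_OBJ = 0, 1, 2, 3
TYPE_NAMES = {TOP: "*top*", CONS: "*cons*", NULL: "*null*", NOM_OBJ: "nom-obj"}
# Pretend the remaining codes belong to automatically generated glb types.
# (Here a glbtype's name simply embeds its code; in a real grammar the
# printed glbtype number need not equal the code.)
TYPE_NAMES.update({code: f"glbtype{code}" for code in range(4, 200)})


def node(type_code, **features):
    return {"type": type_code, "features": features}


def cons_chain(length):
    """Build a *cons*/*null* list of `length` placeholder (*top*) elements."""
    chain = node(NULL)
    for _ in range(length):
        chain = node(CONS, FIRST=node(TOP), REST=chain)
    return chain


def stamp(wlink, token_ids):
    """Write each token id into the type slot of successive FIRST nodes."""
    current = wlink
    for token_id in token_ids:
        current["features"]["FIRST"]["type"] = token_id
        current = current["features"]["REST"]


def print_fs(fs, feature=None, special_case_wlink=True):
    """Render a node; optionally print a WLINK value as <id, id, ...>."""
    if special_case_wlink and feature == "WLINK":
        ids, current = [], fs
        while current["type"] == CONS:
            ids.append(str(current["features"]["FIRST"]["type"]))
            current = current["features"]["REST"]
        return "<" + ", ".join(ids) + ">"
    # The generic path treats every integer as a type code, which is how
    # stamped token ids end up displayed as unrelated glbtype names here.
    name = TYPE_NAMES.get(fs["type"], f"unknown{fs['type']}")
    if not fs["features"]:
        return name
    parts = [f"{feat} {print_fs(val, feat, special_case_wlink)}"
             for feat, val in fs["features"].items()]
    return name + " [" + " ".join(parts) + "]"


# A two-token multi-word: token ids 47 and 11 stamped into WLINK.
wlink = cons_chain(2)
stamp(wlink, [47, 11])
obj = node(NOM_OBJ, WLINK=wlink)
print(print_fs(obj, special_case_wlink=False))
# nom-obj [WLINK *cons* [FIRST glbtype47 REST *cons* [FIRST glbtype11 REST *null*]]]
print(print_fs(obj))  # nom-obj [WLINK <47, 11>]
```

The special case is keyed on the feature name only, so every other part of the structure is still printed by the generic routine.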
> well, i hope the above makes sense and might get you going immediately.
> if not, either wait for the improved version of PET (where input can
> be specified in XML, and arbitrary values can thus be inserted into the
> feature structures retrieved from the lexicon) or send me a copy of the
> grammar, and i will see whether i can make the above work out.
>                                                     ha det bra  -  oe
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> +++ Universitetet i Oslo (ILF); Boks 1102 Blindern; 0317 Oslo; (+47) 2285 7989
> +++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
> +++       --- oe at csli.stanford.edu; oe at hf.uio.no; stephan at oepen.net ---
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
