[developers] Re: unknown words in pet again
oe at csli.Stanford.EDU
Fri Feb 18 06:52:25 CET 2005
i thought i would surprise you and actually reply to your message :-).
i am really sorry i have been less than helpful lately, but i now copy
your query to a new group of resourceful people: we are in the process
of consolidating DELPH-IN resources (see `http://www.delph-in.net'), so
that there now is a mailing list for participating developers.
> sorry about bothering you again with this, but the issue is still
> relevant for us, and in spite of repeated efforts we don't seem able to
> find a solution. I hope you'll be able to open our eyes or get us in
> touch with somebody who can do that without waiting for a pet web
> site to be set up.
as you may or may not know, ulrich callmeier has more or less retired
from active PET development, and bernd kiefer (DFKI) has been the main
developer since late in 2003. others (including myself and frederik
fouvry) continue to make occasional contributions, and ulrich is still
kibitzing, i think. bernd is on the brink of releasing a new version
that has quite a lot of new functionality, including some that i expect
will solve your problem in the long run. i expect bernd will follow up
to this message and point you to a new download site soon.
> As you may remember, here at CST we have been working with pet using
> POS-tags in the input (yy-format) and have some generic rules that are
> activated and used in the analysis when there are unknown words in the
> input.
> The structure of the parse-output is fine, but ... unfortunately we
> cannot get the orth-string represented in the parse-result. More
> specifically, the string should appear in the semantic part of the
> parse-result which in our grammar is given by "CONT".
> We have tried to follow a hint we got from Ulrich Callmeier 17. Oct
> 2003, which suggested to define label-path and label-path-tail in
> danish.set. He suggested that we used mrs_stamp_fs to "have an
> identifier "stamped" into its WLINK attribute which we could use to
> identify the relevant token from the original input string".
> Now we have tried to do this but we just get a glbtypexx in the
> parse-result and not an integer(-list) as we expected. In other words,
> we still do not get a trace of the input-word in the output parse.
> Perhaps we have defined the WLINK incorrectly:
> WLINK is defined in types.tdl:
> cont := *top* &
> [ WLINK *cons* ].
> as we can see in the mrs.cpp file WLINK should be of the type cons to be
> able to use mrs_stamp_fs. But we do not know what the right definition
> would be. We have tried to find an explanation of the *cons* type in the
> documentation, or meaningful examples of it in matrix and other grammars,
> without success. Below, you can see that the value of WLINK is set to
> some glbtype, which seems to show that it has unified with the integer
> that points at the input word, but the result is not very meaningful let
> alone usable.
> Parse-output (CONT feature):
> CONT #8:
> [FOCUS-CONST #2:kursus
>  COUNT #4:all
>  LOA ne-list-of-assoc
>  [FIRST kursusudbud
>   [COURSE #1:wh-nom-obj
>    [RESTR #2:kursus
>     WLINK #3:*cons*
>     [FIRST glbtype159
>      REST *null*]
>     WH-COUNT #4:all]
>    TEACHER nom-obj
>    [RESTR lærerstab]
>    PROVIDER #5:nom-obj
>    [RESTR universitetsorgan
>     [NAME #6:string]
>     WLINK #7:*cons*
>     [FIRST glbtype154
>      REST *null*]]]
>   REST list-of-assoc]]
as said before, i expect there will be a straightforward way of getting
the desired effect using the XML facilities of the new PET. but i will
at least try to give those good old YY facilities some thought:
- i presume you have made sure stamp_fs() gets called (in `item.cpp')?
- what input (in YY format) did you give to PET? thus what range of
  integers did you expect in WLINK?
- the above output may actually indicate that things are working for
  you: lacking the ability to create new types at run-time (in older
  versions of PET), the stamping actually just puts the integer for
  the input token identifiers into the type slot of those nodes; the
  (old) MRS extraction code would then treat these as token ids, but
  the plain fs printing routine will think those integers are types.
- assuming the above is true, it might be sufficient to modify the
  fs printer to give special treatment to elements inside of WLINK,
  i.e. look up the feature code for WLINK and special-case it during
  printing, so that it prints, say, WLINK <47, 11> instead, assuming
  there were two elements (corresponding to a multi-word, which was
  why we made it list-valued in the first place).
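
to make the last point concrete, here is a minimal sketch in python of a
printer that special-cases WLINK. it is a toy model and not PET code: the
Node class, the TYPE_NAMES table, the glbtype fallback naming, and the
helpers stamped_list(), wlink_ids(), and format_fs() are all made up for
illustration. only the idea (token ids sitting in the type slot of the
FIRST elements of a *cons* list) is taken from the discussion above.

```python
from dataclasses import dataclass, field

# hypothetical type table; codes not listed here stand in for glb types
TOP, CONS, NULL, NOM_OBJ, KURSUS = range(5)
TYPE_NAMES = {TOP: "*top*", CONS: "*cons*", NULL: "*null*",
              NOM_OBJ: "nom-obj", KURSUS: "kursus"}


@dataclass
class Node:
    """toy feature structure node: a type slot plus feature arcs."""
    type_code: int
    arcs: dict = field(default_factory=dict)


def type_name(code):
    # a plain printer trusts the type slot, so a stamped token id such as
    # 159 comes out looking like some unrelated type
    return TYPE_NAMES.get(code, f"glbtype{code}")


def stamped_list(token_ids):
    """build a *cons* list whose FIRST nodes carry token ids as 'types'."""
    node = Node(NULL)
    for token_id in reversed(token_ids):
        node = Node(CONS, {"FIRST": Node(token_id), "REST": node})
    return node


def wlink_ids(node):
    """walk FIRST/REST and read the type slots back as plain integers."""
    ids = []
    while node is not None and "FIRST" in node.arcs:
        ids.append(node.arcs["FIRST"].type_code)
        node = node.arcs.get("REST")
    return ids


def format_fs(node, depth=0):
    """print a node; everything under WLINK is shown as a list of ids."""
    lines = [type_name(node.type_code)]
    pad = "  " * (depth + 1)
    for feature, value in node.arcs.items():
        if feature == "WLINK":
            ids = ", ".join(str(i) for i in wlink_ids(value))
            lines.append(f"{pad}WLINK <{ids}>")
        else:
            lines.append(f"{pad}{feature} {format_fs(value, depth + 1)}")
    return "\n".join(lines)


if __name__ == "__main__":
    fs = Node(NOM_OBJ, {"RESTR": Node(KURSUS),
                        "WLINK": stamped_list([47, 11])})
    print(format_fs(fs))
```

in the real printer the check would presumably compare against the
feature code for WLINK rather than the feature name, but the shape of the
special case would be the same.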
well, i hope the above makes sense and might get you going immediately
--- if not either wait for the improved version of PET (where input can
be specified in XML, and arbitrary values can thus be inserted into the
feature structures retrieved from the lexicon) or send me a copy of the
grammar, and i will see whether i can make the above work out.
ha det bra - oe
+++ Universitetet i Oslo (ILF); Boks 1102 Blindern; 0317 Oslo; (+47) 2285 7989
+++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++ --- oe at csli.stanford.edu; oe at hf.uio.no; stephan at oepen.net ---