[developers] developers Digest, Vol 66, Issue 3
Stephan Oepen
oe at ifi.uio.no
Thu Jul 15 13:48:49 CEST 2010
hi again, antonio,

> Most likely as much as a grammar can possibly be, since all that
> is "below/before" configurational syntax (and even bits of this in
> case one includes NER in this realm) is obtained outside/before
> the grammar, and this is what goes in our input (pic) representation:
> - surface form
> - lemma
> - inflection features
> - ner
>
> Also, a key reason for using PIC is the ability it provides
> to constrain the POS tag of input words that are nevertheless
> (ambiguously) known to the grammar/lexicon.

i worry we are thinking of different things here, maybe. in the chart
mapping setup, the goal is to support external preprocessing (i.e. the
analysis steps you mention) better than before. thus the proposed PET
revisions would not entail you change the overall setup or composition
of pre-processing steps. the change is in how relevant information is
put to use inside the parser. all of the above can be input to PET in
FSC (as parts of the token FSs), i think. the main change probably is
in how the grammar is given access to information from preprocessing.

to put CARGs or PREDs on unknown words, for example, there used to be
procedural support (destructively stamping extra information handed to
the parser in PiC into designated paths in lexical entries). this has
the disadvantage that such mechanisms are not transparent and only work
when parsing with PET (and PiC). the new approach is to give a lexical
entry full access to the token FS(s) that activate this entry, simply
by unifying a list of token FSs into a TOKENS path provided on lexical
entries (optionally). this way, the lexical entry is free to pick up
information by coreference, i.e. raise the CARG or PRED from the token
into its KEYREL. derivations recorded in [incr tsdb()] now include the
token FSs as the leaves of derivations, hence when rebuilding these, all
information that contributed to the original AVM (and thus MRS) will be
available again, giving identical results (including characterization).
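
The mechanism described above can be illustrated with a small toy model. This is only a sketch under loud assumptions: plain Python dicts stand in for typed feature structures, copying a value stands in for a coreference, and every name below (the feature names, `GENERIC_NAME`, `instantiate`, `derivation`, `rebuild`) is hypothetical and not part of PET, FSC, or [incr tsdb()]. It shows a lexical entry picking up a CARG from the token that activated it, and the same result being recovered from a derivation whose leaves are the tokens.

```python
# Toy model only: plain dicts stand in for typed feature structures, and
# copying a value stands in for a coreference (reentrancy). None of this is
# PET code; all feature names and helpers below are illustrative assumptions.


def get_path(fs, path):
    """Follow a sequence of feature names (or list indices) into a structure."""
    for step in path:
        fs = fs[step]
    return fs


def set_path(fs, path, value):
    """Set a value at a feature path, creating intermediate dicts as needed."""
    for step in path[:-1]:
        fs = fs.setdefault(step, {})
    fs[path[-1]] = value


# A hypothetical generic entry for unknown proper names. It declares which
# piece of token information should show up at which of its own paths.
GENERIC_NAME = {
    "name": "generic_proper_name",
    "corefs": [
        (("TOKENS", 0, "CARG"), ("SYNSEM", "KEYREL", "CARG")),
    ],
}


def instantiate(entry, tokens):
    """Build the lexical item for `entry` as activated by `tokens`."""
    item = {"TOKENS": [dict(t) for t in tokens]}
    for source, target in entry["corefs"]:
        set_path(item, target, get_path(item, source))
    return item


def derivation(entry, tokens):
    """Record a derivation node whose leaves are the token structures."""
    return {"entry": entry["name"], "leaves": [dict(t) for t in tokens]}


def rebuild(node, lexicon):
    """Re-create the lexical item from a stored derivation alone."""
    return instantiate(lexicon[node["entry"]], node["leaves"])


# A token as it might arrive from external preprocessing (values invented).
token = {"FORM": "Blindern", "LEMMA": "blindern", "POS": "NNP", "CARG": "Blindern"}

item = instantiate(GENERIC_NAME, [token])
stored = derivation(GENERIC_NAME, [token])
rebuilt = rebuild(stored, {GENERIC_NAME["name"]: GENERIC_NAME})
```

In this sketch nothing is stamped into the entry from outside: the entry itself states where token information goes, so `rebuild` needs only the stored derivation and the lexicon to arrive at the same structure as the original `instantiate` call.
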
but of course i understand why you might not want to revise what works
well for you just now. the proposed `legacy' branch will allow users
of PiC (or SMAF or other functionality being refactored in this round)
to continue using PET the way they are used to. however, it would be
helpful for the refactoring to know whether there is functionality that
cannot be replicated in the new (purer) setup. i talked with francisco
a little this week, and we failed to come up with things you currently
do through PiC that would be impossible, or even difficult, to do in the
revised universe, i believe. would you have the time to look at the
PetInput page on the DELPH-IN wiki (and maybe some of its descendants),
to see whether you can anticipate conceptual or technical problems?

several of us have thought intensely about these interface aspects for
the past two years; we believe we have arrived at a general, scalable,
and much improved solution. the proposed merging of branches and code
cleaning is a good opportunity for consolidating the PET code base and
making the existing (small) developer group more productive. we need
to take these steps carefully, of course (in consultation with users);
but backwards compatibility with imperfect, redundant solutions which
have chaotically grown through the past decade should not be our main
guiding principle, in my view.

best wishes - oe
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++ --- oe at ifi.uio.no; stephan at oepen.net; http://www.emmtee.net/oe/ ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++