[developers] processing of lexical rules

Stephan Oepen oe at csli.Stanford.EDU
Tue May 3 00:40:51 CEST 2005


howdy,

> [...] oe needs to look at the active parser though - it's only
> running in passive mode now.

while i was looking (as i had promised ann i would today), i happened
to check in some changes to make the active parser work again.  i have
only tested this on the `toy' grammar that dan and i recently added in
`.../src/data/toy/', however.

i added a field `orthographemicp' to the `rule' structure, recording as
first class data whether or not a rule has an orthographemic effect.  i
set that flag from the presence of a %suffix() or %prefix() annotation
on the rule definition, and would suggest moving away from the clunky,
per-grammar spelling-change-rule-p() predicate.  when teaching, one of
the more frequent errors is students failing to make ORTH re-entrant on
non-orthographemic rules.
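
as a rough illustration of the flag idea (in python rather than the lisp of
the actual code, and not the checked-in implementation), the sketch below sets
an `orthographemicp' field from the presence of a %suffix or %prefix
annotation in the rule text.  the names `Rule', `read_rule', and
`AFFIX_ANNOTATION', as well as the two rule definitions, are made up for the
example.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: a %suffix or %prefix annotation opening a line
# of the rule definition.
AFFIX_ANNOTATION = re.compile(r"^\s*%(?:suffix|prefix)\b", re.MULTILINE)


@dataclass
class Rule:
    """Illustrative stand-in for a rule structure (not the real one)."""
    name: str
    definition: str
    orthographemicp: bool = False


def read_rule(name, definition):
    """Build a Rule, deriving the flag from the annotation alone."""
    flag = AFFIX_ANNOTATION.search(definition) is not None
    return Rule(name, definition, flag)


# Made-up rule texts, for illustration only.
plural = read_rule("plur-noun", "%suffix (* s)\nhypothetical-infl-type.")
singular = read_rule("sing-noun", "hypothetical-lex-rule-type.")
print(plural.orthographemicp, singular.orthographemicp)
```

with the flag stored on the rule itself, no per-grammar predicate is needed to
tell the two kinds of rule apart.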

> I want to remerge with the main branch asap [...]

it seems may is almost over, and june looks pretty short too.  i fear i
will not have a lot of time for actively debugging the parser, but from
what i saw tonight, it looks like you have most of it converted.  there
are a few (new) remaining compiler warnings in `lui.lsp' that should be
resolved, and many of the [incr tsdb()] binary files seem to be missing
in the `tok-and-morph' branch, but generally things look encouraging.

dan is busy moving to an analysis of punctuation by affixation, so that
there may be an(other) eager consumer of the refined set-up.  two known
issues in the old set-up (and in both current PETs) are the following:

  (1) recursive application of orthographemic rules on irregular forms.
  (2) a need for prefixation on multi-word tokens; possibly a separate
      `infl-pos' for prefixation?

i suspect (1) may be solved in the new code already, while (2) seems to
require a slightly augmented specification.  do we ever foresee a need
to prefix at non-initial positions in multi-words?
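
to make the question in (2) concrete, here is a small python sketch under the
assumption that a multi-word entry is a list of tokens, with one position for
suffixation and a separate one for prefixation.  `affix_multiword',
`suffix_pos', and `prefix_pos' are hypothetical names, not part of any
existing specification.

```python
def affix_multiword(tokens, suffix="", prefix="", suffix_pos=-1, prefix_pos=0):
    """Attach a suffix and a prefix at independent token positions.

    Assumed defaults: suffix on the last token, prefix on the first.
    """
    out = list(tokens)  # copy, so the caller's list is left untouched
    out[suffix_pos] = out[suffix_pos] + suffix
    out[prefix_pos] = prefix + out[prefix_pos]
    return out


# Plural suffix on the final token of a two-word entry.
print(affix_multiword(["ice", "cream"], suffix="s"))
# Punctuation treated as affixes: opening bracket first, closing one last.
print(affix_multiword(["ice", "cream"], prefix="(", suffix=")"))
```

if prefixes only ever attach to the first token, `prefix_pos' could stay fixed
at 0 and no extra specification would be needed; a non-initial value is only
needed if the answer to the question above is yes.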

                                                   good night  -  oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (ILN); Boks 1102 Blindern; 0317 Oslo; (+47) 2285 7989
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at csli.stanford.edu; oe at hf.uio.no; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


