[developers] pending/777: generation does not work (pending)

Mon Jan 30 20:55:34 CET 2006

hi again, sara,

> I'm attaching the grammar I had troubles with. It is for both Swedish
> and English, everything is included if loaded with "script". Only the 
> English part (more or less, at least no non-English characters) if
> loaded with "enscript". But the current problems are the same.

> Besides the generation there is also something wrong in the irules
> file, with the letterset !v that I cannot figure out. 

loading `enscript' actually fails (because `conjug' is used but defined
only in `sw.tdl', i believe), but `script' seems to work fine.  the !v
warning messages were initially cryptic, i must agree.  the problem
appears to be sub-rules like (!v er), where !v occurs only on the
left side of the sub-rule.  ann rewrote the morphology code last year,
and it looks like the new code would expect you to write the following:

  %(wild-card (?v aouåeiyäö))

  pres-verb-er := 
  %suffix (* er) (?v er)
  pres-verb-lex-rule &
  [ DTR.SYNSEM.LOCAL.CAT.HEAD.CONJUG pres-er ]. 

the warnings that you see, e.g.

  Warning: legacy mode - \
    treating unmatched letterset !V in PRES-VERB-ER as wild-card

appear to just indicate that the LKB is treating your (!v er) as if it
were the above.  ann, can you comment on the intentions here?  without
thinking hard about it, it seems that having to duplicate a letter set
definition, so that it can be used for only one side of a sub-rule, is
hardly desirable?

but, sara, fortunately those warnings are unrelated to your problem in
generation, and in fact i think you can simplify all your usages of !v,
e.g. instead of the above just say

  pres-verb-er := 
  %suffix (* er)
  [...]

after all, the `*' sub-rule seems to properly subsume (?v er).

regarding your problems with generation, the LKB needs to distinguish
between rules that have orthographemic effects and those that do not.
`user-fns.lsp' provides a predicate spelling-change-rule-p(), which by
default (for silly historical reasons) looks for a feature NEEDS-AFFIX
on rules to identify the ones with orthographemic effects.

your lexical rules

  pres-verb-lex-rule := infl-ltow-rule & sw-verb-lex-rule &
  [ SYNSEM.LOCAL [ CAT.HEAD.VFORM finite,
                   CONT.HOOK.INDEX.E.TENSE pres ] ].

  [...]

  ;;dummy rules for irregular swedish verbs for tense
  pres-verb-irreg := pres-verb-lex-rule &
  [ DTR.SYNSEM.LOCAL.CAT.HEAD.CONJUG pres-irreg ].

  past-verb-irreg := past-verb-lex-rule &
  [ DTR.SYNSEM.LOCAL.CAT.HEAD.CONJUG past-irreg ].

inherit the property of being (seemingly) orthographemic, even though
they have no %suffix() or %prefix() annotation.  the generator code that
applies morphological rules breaks on such rules.  one way of working
around that is to change the definition in `user-fns.lsp' to:

  (defun spelling-change-rule-p (rule)
    (rule-orthographemicp rule))

which means that only rules with %suffix() or %prefix() are considered
to have orthographemic effects.  assuming that your `dummy rules' were
maybe intended to be triggered from an irregulars list (`irregs.tab'),
then you could put some dummy %suffix() annotation on, e.g.

  %suffix (_foo_ _bar_)

in which case you would not need to change spelling-change-rule-p().
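
to make the difference between the two tests concrete, here is a minimal
python sketch.  it is not LKB code: the `Rule' class, its fields, and the
function names are all made up for illustration, and the flag values just
restate what is said above about your rules.  it lists the rules on which
the NEEDS-AFFIX test and the annotation test disagree, which are the ones
that either need a dummy %suffix() or the changed predicate.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Toy stand-in for a lexical rule (hypothetical, not an LKB structure)."""
    name: str
    needs_affix: bool  # models inheriting the NEEDS-AFFIX feature
    affixes: list = field(default_factory=list)  # %suffix()/%prefix() sub-rules


def by_needs_affix(rule):
    """Models the default test: look at the NEEDS-AFFIX feature only."""
    return rule.needs_affix


def by_annotation(rule):
    """Models the alternative test: is there any %suffix()/%prefix() at all?"""
    return bool(rule.affixes)


def suspicious(rules):
    """Rules that look orthographemic by feature but carry no annotation."""
    return [r.name for r in rules if by_needs_affix(r) and not by_annotation(r)]


rules = [
    Rule("pres-verb-er", True, [("*", "er")]),
    Rule("pres-verb-irreg", True),  # no %suffix(), inherits the feature
    Rule("past-verb-irreg", True),  # likewise
]

print(suspicious(rules))
```

a check along these lines at grammar load time is one way a warning for
such rules could be phrased; whether the LKB should do that is a separate
question.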

ann, the function that breaks is morph-generate(), see:

  LKB(10): (generate-from-mrs-internal *generator-input*)
  Error: Non-structure argument NIL passed to ref of structure slot 1
    [condition type: SIMPLE-ERROR]

  Restart actions (select using :continue):
   0: Return to Top Level (an "abort" restart).
   1: Abort entirely from this (lisp) process.
  [1] LKB(11): :bt
  Evaluation stack:

  MORPH-GENERATE <-
    FULL-MORPH-GENERATE <- CONSTRUCT-NEW-MORPH <-
    MRS::APPLY-INSTANTIATED-LEXICAL-RULES <-
    MRS::APPLY-INSTANTIATED-RULES-BASE <-
    MRS::INSTANTIATE-NULL-SEMANTIC-ITEMS <- 
    MRS::COLLECT-LEX-ENTRIES-FROM-MRS <-
    (:INTERNAL GENERATE-FROM-MRS-INTERNAL 0) <- TIME-A-FUNCALL <-
    GENERATE-FROM-MRS-INTERNAL <- [... EXCL::%EVAL ] <- EVAL <-
    TPL:TOP-LEVEL-READ-EVAL-PRINT-LOOP <- TPL:START-INTERACTIVE-TOP-LEVEL

  [1] LKB(12): :cur
  (MORPH-GENERATE "HIMSELF" PRES-VERB-IRREG)
  [1] LKB(13): 

i just checked in a patch to make it more robust (see attachment), but
i guess we should decide on our attitude regarding rules like these.  i
think either we should allow such rules (liberating people from having
to put dummy %suffix() lines on irregular-only rules), or there should
be a warning at grammar load time.  i sort of like the purity of using
the rule-orthographemicp() test instead of the clunky NEEDS-AFFIX, but
on the other hand i believe that some grammars use morphological rules
that are (practically) only triggered from `irregs.tab'.

berthold, i suspect you are in this latter class: what are you doing?

> All *.tdl-files with Swedish characters are now utf-8. Some of the
> other ones are "lisp/scheme code" or ascii, but I cannot figure out
> how to change them into utf-8 files. Don't know if that might have
> something to do with the problems...

it looked like all the files have the right encoding.  a thing you can
safely do in addition is adding emacs(1) headers to files, e.g.

  ;;; Hey, emacs(1), this is -*- Mode: TDL; Coding: utf-8; -*- got it?
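
on the files that show up as ascii: a pure ascii file is already valid
utf-8, so there is nothing to convert.  a minimal python sketch of how
one could check this, assuming you only want to know whether some bytes
decode as utf-8; `is_utf8' is a hypothetical helper name, and the sample
strings are made up for the demonstration.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Sample inputs; for a real file one could pass pathlib.Path(name).read_bytes().
ascii_bytes = "pres-verb-irreg := pres-verb-lex-rule.".encode("ascii")
utf8_bytes = "aouåeiyäö".encode("utf-8")
latin1_bytes = "aouåeiyäö".encode("latin-1")

print(is_utf8(ascii_bytes))   # ASCII is a subset of UTF-8
print(is_utf8(utf8_bytes))
print(is_utf8(latin1_bytes))  # a lone 0xE5 byte is not valid UTF-8
```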

i hope the above will get you going with generation.  it looks like you
have a fair number of semantically empty lexical entries, and creating
some trigger rules at some point might be beneficial.  we still have no
documentation on this bit of LKB magic, i am afraid, but someone should
really write it.  also, there are other ways of making generation more
efficient, and i would be happy to help tune your grammar a bit.  that
would benefit from a set of test sentences, though.  with those, one
could for example construct a set of quick-check paths, or maybe enable
packing in generation; see:

  http://www.informatics.susx.ac.uk/research/nlp/carroll/papers/ijcnlp05.pdf

my apologies, i often tend to underestimate how long it will take me to
complete a thread like this.  i am copying the `developers' list again,
so that you might also get feedback from others.

                                                      all best  -  oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2285 7989
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at csli.stanford.edu; oe at hf.uio.no; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-------------- next part --------------
A non-text attachment was scrubbed...
Name: morph.patch
Type: application/octet-stream
Size: 1745 bytes
Desc: not available
URL: <http://lists.delph-in.net/archives/developers/attachments/20060130/a1b3ef93/attachment.obj>