[developers] algebra code

Wed Sep 7 09:52:44 CEST 2005

I have set things up so the algebra code is now enabled by default if the
grammar has the variables for qeq paths and so on set.  Details are below.  Try
it out or not as you like - I would be interested to know how well the checking
works, especially on relatively clean Matrix grammars.  Let me know if anything
MRS-related breaks - I have peripherally touched some other pieces of code and
may have contrived to cause failures as I did yesterday.

Ann

The algebra code has the following functionality:

1a) allows display of the semantics associated with a node in a parse tree,
using the same sort of notation as in the algebra paper
(`Sement' on the menu associated with nodes in the expanded parse tree)

A sement is displayed in the format:

hook (ltop,index,xarg) - xarg may be omitted in simple grammars
slots - a list of hooks and slot names - 
        e.g., [h5,x6]COMPS1,[h7,x3]SPR1/COMPS2.SPR1
relations (currently one per line)
qeqs (currently one per line)
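As a rough illustration of the display format above, here is a minimal
Python sketch.  It is not the LKB code (which is Lisp); the class names,
the relation and qeq strings and the variable names are all made up for
the example, and only the overall layout (hook, slots, relations, qeqs)
comes from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Hook:
    ltop: str
    index: str
    xarg: Optional[str] = None  # may be omitted in simple grammars

    def __str__(self):
        parts = [self.ltop, self.index]
        if self.xarg:
            parts.append(self.xarg)
        return "[" + ",".join(parts) + "]"


@dataclass
class Sement:
    hook: Hook
    slots: List[Tuple[Hook, str]] = field(default_factory=list)  # (hook, slot name)
    rels: List[str] = field(default_factory=list)
    qeqs: List[str] = field(default_factory=list)

    def display(self):
        lines = [str(self.hook)]
        lines.append(",".join(f"{hook}{name}" for hook, name in self.slots))
        lines.extend(self.rels)  # one relation per line
        lines.extend(self.qeqs)  # one qeq per line
        return "\n".join(lines)


# Hypothetical sement; the relation and qeq notation is invented here.
example = Sement(
    hook=Hook("h1", "e2"),
    slots=[(Hook("h5", "x6"), "COMPS1"), (Hook("h7", "x3"), "SPR1/COMPS2.SPR1")],
    rels=["_try_v(e2,x3,h8)"],
    qeqs=["h8 qeq h5"],
)
print(example.display())
```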

1b) allows display of the semantics associated with a rule: `Rule sement'

Thus, in order to see the sements of the daughters of a node, one clicks on
each daughter in the parse tree and selects `Sement', and selects
`Rule sement' on the node itself.  The C-CONT of the rule is treated
as though it were an extra daughter.

2) checks a node to see whether its construction obeys the algebra
(`Check algebra' on the menu associated with nodes in the expanded parse tree)

3) allows a whole analysis to be checked to see if all nodes in it obey
the algebra.  The function mrs::check-algebra-on-chart is suitable for being
made the value of lkb::*do-something-with-parse* and is thus invoked by
batch checking.  Note that this does not just apply to edges that
make up a successful parse - all edges are checked.

Details:

The sements are extracted from the FSs associated with a node in much
the same way that an MRS is extracted from the FS for the parse for a
sentence.  Most of the extraction code is shared.  However, the
identification of slots is new.  Various parameters have to be set to
allow identification of slots in addition to the usual mrsglobals.

The code to identify slots walks over a feature structure, finding all
paths in the syntax part of the FS that lead to a hook type.  This
hook is taken to correspond to a slot and the path name is used to
generate the slot name.  The slot finding code ignores all the paths
in *algebra-ignore-paths* and also all paths containing a feature in
*algebra-ignore-feats*.

*algebra-ignore-feats* '(arg-s)

*algebra-ignore-paths*  '((c-cont) (synsem local cont)
                          (nh-dtr) (hd-dtr) (dtr))
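The walk described above might look roughly like the following Python
sketch.  This is an illustration only, not the LKB implementation: the
FS is modelled as nested dicts, a hook node is marked with a made-up
"TYPE" key, and an ignore path is assumed to prune everything below it.
The coindexation condition on postulating a slot is not modelled.

```python
# Values copied from the settings above; the representation is hypothetical.
IGNORE_FEATS = {"arg-s"}
IGNORE_PATHS = {
    ("c-cont",),
    ("synsem", "local", "cont"),
    ("nh-dtr",),
    ("hd-dtr",),
    ("dtr",),
}


def find_hook_paths(fs, path=()):
    """Yield every feature path from the root of fs to a hook-typed node."""
    if fs.get("TYPE") == "hook":
        yield path
        return
    for feat, value in fs.items():
        if feat == "TYPE" or not isinstance(value, dict):
            continue
        new_path = path + (feat,)
        # Assumption: an ignored feature or path prunes the whole subtree.
        if feat in IGNORE_FEATS or new_path in IGNORE_PATHS:
            continue
        yield from find_hook_paths(value, new_path)


def hook_node():
    return {"TYPE": "hook"}


# Toy feature structure, invented for the example.
toy_fs = {
    "synsem": {
        "local": {
            "cont": {"hook": hook_node()},  # the sement's own hook: ignored
            "cat": {
                "val": {
                    "spr": {"first": {"local": {"cont": {"hook": hook_node()}}}}
                }
            },
        }
    },
    "arg-s": {"first": {"local": {"cont": {"hook": hook_node()}}}},  # ignored feature
    "c-cont": {"hook": hook_node()},  # ignored path
}

slot_paths = list(find_hook_paths(toy_fs))
print(slot_paths)
```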

A slot is only postulated if some element in its hook is coindexed
with an element in the rels or in the hook of the sement (this may be
too restrictive).
All paths that lead to a hook are stored but some names are filtered.
A slot gets named after the list of features on the path to the hook.
Slot naming is controlled by *non-slot-features* - these features
in the path to a slot are ignored when constructing the name.

*non-slot-features* '(cont hook synsem local cat val head)

List features are interpreted so that we get COMPS1 in a slot name.

In the case of multiple paths leading to the same hook, we distinguish
between uninteresting cases and interesting cases, such as control.

e.g. assume `try' shares its SPR hook with the COMP1 SPR1 hook (because
the whole SPR is shared).  So if the paths to the hook are 
((SPR FIRST SEM HOOK)(COMPS REST FIRST SPR FIRST SEM HOOK)), the slot is named
SPR1/COMPS2.SPR1

Slot naming is only for UI purposes and does not play a role in checking.
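The naming scheme can be sketched in Python as follows.  This is an
illustration under assumptions, not the actual code: the `.' and `/'
separators are inferred from the example name above, FIRST/REST are
assumed to be the list features, and SEM is added to the ignored
features here (it is not in the *non-slot-features* value shown above)
purely so that the example paths yield the name given in the text.

```python
NON_SLOT_FEATURES = {"cont", "hook", "synsem", "local", "cat", "val", "head"}
ASSUMED_EXTRA = {"sem"}  # assumption made for this example only


def path_to_name(path, skip):
    """Turn one feature path into a name such as COMPS2.SPR1."""
    parts = []
    rests = 0
    for feat in path:
        f = feat.lower()
        if f == "rest":
            rests += 1  # each REST moves one element further down the list
        elif f == "first":
            if parts:
                parts[-1] += str(rests + 1)  # 1-based list position
            rests = 0
        elif f not in skip:
            parts.append(feat.upper())
    return ".".join(parts)


def slot_name(paths, skip=NON_SLOT_FEATURES | ASSUMED_EXTRA):
    """Join the names of all paths that lead to the same hook."""
    return "/".join(path_to_name(p, skip) for p in paths)


name = slot_name([
    ("SPR", "FIRST", "SEM", "HOOK"),
    ("COMPS", "REST", "FIRST", "SPR", "FIRST", "SEM", "HOOK"),
])
print(name)
```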

Algebra checking is done on a per node (per edge) basis on the edges
as they are seen by a bottom up parser (i.e., chart edge FSs rather than
the fully instantiated FSs).  The daughter sements and the rule sement
are extracted.  Semantically vacuous sements (i.e., things with no rels
and no slots) are ignored, except in the case where there are
no non-vacuous daughters (e.g., expletive `it' by itself).  The
daughters are combined in all possible permutations and combinations
since the algebra assumes binary branching.  A daughter may be
considered as a head dtr if it has one or more slots and these are
instantiated by the hook of the postulated non-head.  This process
should lead to a new sement, but may lead to several.  The node is
considered to pass the check if one of the sements generated is
mrs-equalp to the actual sement for the node.  Note that the check
allows variables to be underspecified, but does not allow relations to
be underspecified.  This causes a problem with the current version of
the ERG (see below).
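A much simplified Python sketch of this check is given here, as an
illustration and not as the actual implementation.  Assumptions: hooks
are plain tuples of variable names, a slot is filled when it is
identical to the non-head's hook, the non-head's remaining slots are
passed up, and plain equality stands in for mrs-equalp (so variable
underspecification is not modelled).  All names are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Sement:
    hook: tuple
    slots: frozenset = frozenset()  # hooks of the unfilled slots
    rels: frozenset = frozenset()

    def vacuous(self):
        return not self.rels and not self.slots


def combine(head, non_head):
    """Fill the head's slot that matches the non-head's hook."""
    return Sement(
        head.hook,
        (head.slots - {non_head.hook}) | non_head.slots,
        head.rels | non_head.rels,
    )


def reductions(sements):
    """Yield every sement obtainable by repeated binary combination."""
    if len(sements) == 1:
        yield sements[0]
        return
    for i, j in permutations(range(len(sements)), 2):
        head, non_head = sements[i], sements[j]
        if non_head.hook in head.slots:
            rest = [s for k, s in enumerate(sements) if k not in (i, j)]
            yield from reductions(rest + [combine(head, non_head)])


def obeys_algebra(daughters, rule_sement, actual):
    parts = list(daughters) + [rule_sement]  # rule sement as an extra daughter
    contentful = [s for s in parts if not s.vacuous()]
    # If everything is vacuous, fall back to the daughters themselves.
    candidates = contentful or list(daughters)
    return any(result == actual for result in reductions(candidates))


# Toy example with invented relations.
dog = Sement(("h1", "x2"), rels=frozenset({"dog(x2)"}))
barks = Sement(("h3", "e4"), slots=frozenset({("h1", "x2")}),
               rels=frozenset({"bark(e4,x2)"}))
rule = Sement(("h3", "e4"))  # vacuous rule sement, ignored
mother = Sement(("h3", "e4"), rels=frozenset({"dog(x2)", "bark(e4,x2)"}))
print(obeys_algebra([dog, barks], rule, mother))
```

In this sketch a mother whose relations differ from those of the
combined daughters fails the check, which mirrors the point above that
relations may not be underspecified.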

Interactive checking produces a window with indications of the results
- some attempt is made to show the mismatches in the case of problems.
This display should be tidied up.  Batch checking writes a list of
edge numbers with problems to the output file - the assumption is that
these can be interactively checked if desired (note that a tree can be
displayed from a node in the chart display window).

mrscomp grammar:

The checked-in version of the mrscomp grammar works fairly well with
the algebra.  I have not checked it exhaustively, but what I have
checked validates.  I would like to know of non-validating situations.

ERG (version of Sept 5):

In order for the algebra code to work, the following needs to be added to
synsem_min1:

  LOCAL.CONT.HOOK.INDEX #index,
  --SIND #index 

Without this, hooks are not identified on lexical entries etc. because
the type expansion has not been done.

With this addition and the default values for the global variables
mentioned above from the main mrsglobals.lisp, the display of the
algebra works for the examples I have tried.

Checking does not succeed in all cases.  A problem that occurs
throughout is that the message relations are often specialised in a
manner that cannot be determined from the sements extracted from the
grammar rules and the daughters.  Hence no S node validates if there
is a full stop and various other nodes fail to validate.  This is a
genuine violation of the algebra as it stands.  I have also seen a
problem that appears to be connected with XARG on extracted nodes.
Further checking awaited.

September 7 2005