[matrix] Message-free Matrix

Tue Jan 30 21:18:10 CET 2007

Dear Matrix community,

We are writing to announce the release of a new message-free version 
of the Grammar Matrix, as agreed at the last DELPH-IN annual meeting
in Norway.  The new Matrix is available from
http://www.delph-in.net/matrix/customize/matrix.cgi, as of 1/22/07.
This message details the new representation of illocutionary force (or
'sentence force'), how we believe such information should canonically
be provided in the grammars, and some advice on migrating to the new
system.  We are particularly interested in feedback as we make this
switch.  Please let us know if there are phenomena in the language
you are working on which appear to be incompatible with the analyses
outlined below.

1. New representations
----------------------

We have removed the feature MSG, the type message and all of its
subtypes.  In their place, we have posited a new feature SF (which is
a feature of the type 'event'), which takes an atomic feature
structure of type 'iforce' as its value.

iforce := avm.
prop-or-ques := iforce.
prop := prop-or-ques.
ques := prop-or-ques.
comm := iforce.

event := event-or-ref-index &
  [ E tam,
    SF iforce ].

The type prop-or-ques represents underspecification between
propositions and questions.  This type is useful on the one hand with
verbs like 'know' which select for a clausal complement that expresses
either a proposition or a question (but not a command), and on the
other hand in the analysis of so-called 'intonation questions', that
is, sentences which, from their syntactic form only, could be used to
express either questions or propositions.

An associated change has to do with clausal arguments of various
predicates.  In the past, these ARGn positions ('holes') have been
associated directly with the labels of the message relations belonging
to the embedded clause.  The message's MARG was then related to the
LTOP (typically the label of the embedded verb, or of any scopal
modifiers of that verb) of the embedded clause through a qeq handle
constraint.  This left room for quantifiers to take scope in the lower
clause.  Now that there are no messages, we introduce a qeq between
the ARGn position of the embedding predicate and the LTOP of the
embedded clause.

2. Sentence force in compositional semantics
--------------------------------------------

The move from messages to SF simplifies things in some ways while
adding complexity in others.  The primary source of difficulty is
the fact that, as a feature of events, the value of SF can't be changed
as trees are constructed, only refined (further specified).  Our
goals are:

-- To have every clause which is syntactically marked as
a proposition be [SF prop],
-- To have every clause which is syntactically marked as 
a question be [SF ques],
-- To have every clause which is syntactically ambiguous
between a question reading and a proposition reading be
[SF prop-or-ques],
-- To have every clause which is syntactically marked as
a command be [SF comm], and
-- To allow clause-embedding predicates to further constrain
an [SF prop-or-ques] clause to [SF prop] or [SF ques], as
appropriate.

In a language where verbal inflection _unambiguously_ signaled
sentence force, it would be possible to have the lexical rules
constrain the verb's INDEX.SF appropriately.  In most cases, however,
the verb forms (or other morphosyntactic properties, such as
"inversion", see below) canonically used to mark each clause type also
have other uses.  Thus in general, we expect to delay the choice of SF
value until the clause is more nearly complete.  The strategy we adopt
in the Matrix is to associate the satisfaction of the SUBJ requirement
with some constraint (either prop-or-ques or comm) on the value of
SF. Strictly speaking, this does not mean that we are constraining the
SF only when the clause is complete, as there are situations in which
complements attach outside the subject.  Nonetheless, it seems like a
reasonable place to constrain the value, given that we expect the
choice between prop-or-ques and comm at least to be clear at the level
of the rule handling the subject requirement (either realizing the
subject or discharging the subject requirement without any overt
realization).
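
As a rough illustration of this strategy, the two kinds of subject
rule might be constrained along the following lines.  This is a sketch
only, not the actual definitions in matrix.tdl: the type names
my-decl-head-subj-phrase and my-imp-head-opt-subj-phrase are made up,
and the supertypes basic-head-subj-phrase and
basic-head-opt-subj-phrase and the feature path are assumed for the
sake of the example.

; Hypothetical: subject realized overtly; force left underspecified
; between proposition and question.
my-decl-head-subj-phrase := basic-head-subj-phrase &
   [ SYNSEM.LOCAL.CONT.HOOK.INDEX.SF prop-or-ques ].

; Hypothetical: subject requirement discharged without overt
; realization, as in an imperative, for instance.
my-imp-head-opt-subj-phrase := basic-head-opt-subj-phrase &
   [ SYNSEM.LOCAL.CONT.HOOK.INDEX.SF comm ].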

In addition, the SF value may be further constrained above the level
of the subject rule, for example, by a sentence-final or
sentence-initial question particle (e.g., French 'est-ce que' or
Japanese 'ka'), a similar particle marking the clause as expressing a
proposition (e.g., Japanese 'yo'), or a non-branching construction
sensitive to syntactic properties of the clause.  The question
particles (and similar) can be analyzed like complementizers which
constrain the INDEX.SF value of their complement, while also 'passing
up' that index (the type raise-sem-lex-item may be helpful).  An
example of the kind of non-branching constructions we have in mind is
the yesno rule in the English Resource Grammar (adopted for similar
yes-no question marking strategies in the Matrix customization
system).  The yesno rule takes a daughter which is [HEAD.INV +] (i.e.,
headed by an inverted auxiliary, in the ERG), and constrains the
mother to be [C-CONT.HOOK.INDEX.SF ques].
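
For the particle case, a minimal sketch might look as follows.  The
type name my-ques-particle-lex and the COMPS path are assumptions made
for illustration; only raise-sem-lex-item is taken from the text
above.

; Hypothetical question particle: selects a clausal complement and
; constrains that complement's SF to ques.  The supertype
; raise-sem-lex-item is assumed to take care of passing the
; complement's index up.
my-ques-particle-lex := raise-sem-lex-item &
   [ SYNSEM.LOCAL.CAT.VAL.COMPS
        < [ LOCAL.CONT.HOOK.INDEX.SF ques ] > ].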

If that were all, though, the rule would spin: the mother would still
satisfy the constraints on the daughter, so the rule could reapply to
its own output indefinitely.  Furthermore, we need to make
sure that the inverted clauses are not interpreted as propositions (in
this context --- there are other contexts which require inversion,
which is why the lexical rule producing the inverted auxiliary doesn't
constrain the SF value right away).  To handle both of these issues,
we take advantage of the feature [MC luk] ('mainclause').  The
inverted auxiliary and the constituents built from it are [MC na] ---
eligible to be neither main nor subordinate clauses.  The yesno rule
takes a daughter that is [MC na] and produces a constituent which is
[MC +].
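
Putting these pieces together, a non-branching rule of this kind could
be sketched as below.  This is not the ERG or Matrix definition: the
type name my-yesno-phrase, the supertype unary-phrase, and the full
feature paths are assumptions, with only the constraints [HEAD.INV +],
[MC na], [MC +] and [C-CONT.HOOK.INDEX.SF ques] taken from the
description above.

; Hypothetical non-branching yes-no question rule.
my-yesno-phrase := unary-phrase &
   [ SYNSEM.LOCAL.CAT.MC +,
     C-CONT.HOOK.INDEX.SF ques,
     ARGS < [ SYNSEM.LOCAL.CAT [ HEAD.INV +,
                                 MC na ] ] > ].

Because the mother is [MC +] while the daughter must be [MC na], the
output of such a rule cannot serve as its own input.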

Finally, we turn to the clause-embedding verbs.  Since the INDEX value
of the embedded clause is 'visible' to the matrix verb, the matrix
verb can constrain the embedded clause's SF value.  This allows verbs
like 'ask' to subcategorize only for questions, 'claim' for
propositions, and 'know' for prop-or-ques.  In some languages (e.g.,
Zulu, apparently), there is no syntactic difference between embedded
propositions and embedded yes-no questions.  In this case, the
selectional constraints from the embedding verb will have the correct
effect of specializing the SF of the embedded clause from prop-or-ques
to prop or ques, as appropriate.
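
Such selectional constraints might be stated roughly as follows,
assuming a hypothetical supertype my-clausal-comp-verb-lex for verbs
that take a single clausal complement on COMPS.  The type names and
the feature path are illustrative and are not Matrix types.

; Hypothetical: verbs like 'ask' select an embedded question.
my-ques-comp-verb-lex := my-clausal-comp-verb-lex &
   [ SYNSEM.LOCAL.CAT.VAL.COMPS
        < [ LOCAL.CONT.HOOK.INDEX.SF ques ] > ].

; Hypothetical: verbs like 'claim' select an embedded proposition.
my-prop-comp-verb-lex := my-clausal-comp-verb-lex &
   [ SYNSEM.LOCAL.CAT.VAL.COMPS
        < [ LOCAL.CONT.HOOK.INDEX.SF prop ] > ].

; Hypothetical: verbs like 'know' allow either, but not a command.
my-prop-or-ques-comp-verb-lex := my-clausal-comp-verb-lex &
   [ SYNSEM.LOCAL.CAT.VAL.COMPS
        < [ LOCAL.CONT.HOOK.INDEX.SF prop-or-ques ] > ].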

3. Adding in appropriate handle constraints
-------------------------------------------

As mentioned above, the removal of messages requires a shift in how
the handle constraints which allow quantifiers to scope in embedded
clauses are introduced.  In the Matrix, we have made a small reorganization
of the lexical types to allow clause-embedding verbs to introduce
handle constraints (removing no-hcons-lex-item from their parentage) and
added the qeqs into the linking types.  Here is an example:

; That Kim sleeps surprises Sandy.

clausal-first-arg-trans-lex-item := basic-two-arg &
   [ ARG-ST < [ LOCAL.CONT.HOOK.LTOP #larg ],
              [ LOCAL.CONT.HOOK.INDEX ref-ind & #ind ] >,
     SYNSEM [ LOCAL.CONT.HCONS <! qeq & [ HARG #harg,
                                          LARG #larg ] !>,
              LKEYS.KEYREL [ ARG1 #harg,
                             ARG2 #ind ] ] ].

We are introducing such qeqs for raising and control predicates
as well, with the exception of raising verbs like 'do' which do
not introduce their own relation.  This is an improvement in
consistency over the message-based analysis, where we did not have
messages for the complements of raising verbs.

trans-first-arg-raising-lex-item := basic-two-arg &
   [ ARG-ST < [ LOCAL.CONT.HOOK.INDEX #ind ],
              [ LOCAL.CONT.HOOK.XARG #ind ] > ].

trans-first-arg-raising-lex-item-1 := trans-first-arg-raising-lex-item &
   [ ARG-ST < [ ],
              [ LOCAL.CONT.HOOK.LTOP #larg ] >,
     SYNSEM [ LOCAL.CONT.HCONS <! qeq &
                                  [ HARG #harg,
                                    LARG #larg ] !>,
              LKEYS.KEYREL event-relation &
                           [ ARG1 #harg ] ] ].

trans-first-arg-raising-lex-item-2 := trans-first-arg-raising-lex-item &
                                      raise-sem-lex-item.

4. Advice on migrating to the new system
----------------------------------------

If you have recently configured a starter grammar (before 1/22/07, but
within the past 2 months or so), it would probably work well to upload
your choices file and rerun the script, then integrate the new starter
grammar with your grammar as you have developed it so far.  This should
typically involve:

  * Making a back-up copy of your grammar before you begin.
  * Replacing your old matrix.tdl with the new matrix.tdl.
  * Using ediff (within emacs) or another diff utility to discover
    the changes to my_language.tdl (your language-specific types
    file) and roots.tdl.  Changes may originate in the Matrix update
    or in your own grammar development, so consider each difference
    carefully and edit the files appropriately.

For those of you who have been working with the Matrix longer, the
integration may be more difficult (though we are happy to report that
Scott Drellishak discovered relatively few changes of import between a
version of the Matrix from 2004 and the current version in updating a
small grammar for Armenian).  The cleanest way to get an up-to-date
version of the Matrix is to run the configuration script.  To get
a starter grammar, you minimally need to fill out the language and word
order sections, as well as the two noun and two verb entries in the
basic lexicon section.

The configuration page is here:

http://www.delph-in.net/matrix/customize/matrix.cgi

Note that if you are updating from an older version of the Matrix,
you will need to add the file head-types.tdl (and adapt the script
file lkb/script appropriately).  Note also that the current Matrix
uses the :+ extension to TDL syntax, which is not yet supported
in PET.
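
For readers who have not met it, :+ adds constraints to a type that
has already been defined elsewhere, rather than defining a new type.
A made-up example, assuming a head type noun is already defined (the
feature CASE and the type case are illustrative only):

; Hypothetical: introduce a new type, then add a feature to the
; existing type noun without restating its definition.
case := *top*.
noun :+ [ CASE case ].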

No matter which version of the Matrix you are migrating from,
we encourage you to use this list (matrix at delph-in.net) for assistance.

-- Emily & Dan