Generating in ERG

Fri May 21 20:31:59 CEST 2004

Hi LKB-listers,

Four months ago, I asked a question on this list about generation in
the LKB. I didn't reply to your request for more detail, since I had
to leave the project for 3 months. But here it comes. First a short
version and then a long version, in case you are interested in all (or
rather a little bit of) the nitty-gritty background detail. You do not
have to read the long version to respond to the short one.

THE SHORT:

I need to pipe structures from a non-LKB, non-HPSG (non-the most)
program into the ERG for generation, as part of a
machine-translation system. I don't know which of the different MRS
formats would be best (or what the real differences between them are),
but I'm working in Prolog (and/or Perl), doing MT, and I'm not that
interested in the scoping bit.

I do not know Lisp (above the 'phrase-book level').

THE LONG:

This is the final part of my PhD-project, and having played around
with the LKB/ERG (and a lot of other things, delaying my project quite
a bit), I am starting to get an inkling of a suspicion that the task
is too demanding for the LKB (and me!). So please feel free to say: 'Can't
be done! Forget it! (Twit!)' if you feel so.

I'm trying to build a machine translation system on top of the tagger
here at the University of Southern Denmark (http://visl.sdu.dk). The
VISL tagger is a Constraint Grammar (not Constraint-Based Grammar!)
tagger. It gives not only POS, morphology, etc., but also syntactic
role (see the homepage for a demo in Danish, English and other
languages). The VISL tagger is built to handle any Danish (or English,
etc.) sentence as input, and actually works with an error rate of only
a few percent. I pipe its output into a program that extracts the
dependency structure (demo on the homepage, Danish only). That, in
turn, I am going to pipe into a Shake'n'Bake-type transfer and then
generate English sentences using the ERG.

My success criterion is broad coverage (every sentence gets translated)
rather than quality (correctness, grammaticality or even
comprehensibility) or speed. I want to show that it can be done, rather
than actually produce a 'product'.

I've chosen the ERG because it is in an HPSG-style notation, which I am
rather familiar with, and because it is the grammar I know of with the
broadest lexical and grammatical coverage. I plan to make some kind of
'lexicon' transfer so that words (or valencies) not in the ERG lexicon
are guessed at on the basis of the valency of the Danish word. (I
have pairwise lists of Danish and English lemmas (or ORTHs in
LKB lingo) to handle that bit of the problem.)

My qualms with the LKB/ERG are the following (these are not meant as
criticisms, but as imperfections that may be crucial for me):
* It is slow. On my (old) computer it takes a long time to generate
sentences, view structures, index for generation, etc. Exploring and
developing the system (using the ERG) is 10% writing and 90% waiting.
* It is not sufficiently complex. It may not actually be capable of
generating the constructions I feed into it. An example (the only
sentence I've tested it on): it analyzes 'During 1993
they will market workstations, working from a common standard.' (with
extra lexical items), but cannot generate it.
* It is too complex (the transfer structure is too deep). I intended it
as a kind of Shake'n'Bake lexical transfer; it may be more like a full
conceptual-representation interlingua.
* It is too complex ("I'm too stupid"): the ERG (and the MRSs it
produces) seems to use notations that go well beyond the literature
(AC-02:Impl.TFSG, ACet.al.-99:MRS-Intro, Pollard/Sag-94, Sag/Wasow,
Sag/Ginzburg etc.).

Any suggestions as to whether and how my project may be successful (or just an
answer to the question I started with) are welcome.

Soren Harder
PhD-student, Univ. of Southern Denmark,
sharder at language.sdu.dk