[developers] Parsing with ACE

Petter Haugereid petterha at gmail.com
Fri Jun 26 09:07:51 CEST 2015


Thanks a lot! I am now an ACE user.

On Thu, Jun 25, 2015 at 12:19 AM, Woodley Packard <sweaglesw at sweaglesw.org>
wrote:

> Hi Petter,
>
> I notice "format error: unknown type `+'." in the grammar loading log.
> There’s nothing to say where that’s coming from, but in fact it refers to
> line 53 of rpp/lkb.rpp where a rule starts with '+' when ACE ungenerously
> believes it ought to start with '!'.
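
Since the log message gives no file or line number, one way to track down such a rule is to search the REPP files for lines that begin with '+'. What follows is only a minimal sketch: `list_plus_rules` is a hypothetical helper name (not part of ACE), and it assumes the offending rules are exactly the lines whose first character is '+'.

```shell
# Hypothetical helper (not part of ACE): print "LINENO:text" for every
# line of the given file whose first character is '+'.
list_plus_rules() {
    grep -n '^[+]' "$1"
}
```

Calling it as `list_plus_rules rpp/lkb.rpp` from the grammar directory should list the candidate lines, which can then be compared against the line mentioned above.
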
>
> The next problem I found is that lexemes have no TOKENS feature.  This
> feature is introduced on the type `word' by a type addendum in tmt.tdl, but
> lexemes do not inherit from `word'.  With a token-aware workflow, the
> output of the token mapping phase is unified into the TOKENS feature of
> lexemes; when that feature is missing / not appropriate, it is an
> unexpected situation.
>
> Additionally, the token mapping rule "generic_name_tmr" is defaulting all
> tokens to [ +TRAIT: generic_trait ], which means they are incompatible with
> native lexical entries.  Since there are no POS tags, the generic lexical
> entries are also incompatible, so you get no lexemes and no parse.
>
> Finally, the tiny-lex.tdl lexicon has a start-of-string lexeme whose
> orthography is "START" rather than "^", which makes it unable to match the
> "^" introduced by the REPP rules.
>
> I took the liberty of changing tmt.tdl to introduce TOKENS and the
> accompanying constraints on word-or-lexrule instead of word, commenting out
> generic_name_tmr, and rewriting START to ^ in tiny-lex.tdl.  With these
> changes I can parse "Jon sover" and get a plausible-looking MRS out.
>
> I hope that is helpful advice,
> -Woodley
>
> > On Jun 24, 2015, at 5:36 AM, Petter Haugereid <petterha at gmail.com> wrote:
> >
> > Hi,
> >
> > I am trying to load my Norwegian grammar into ACE, but I run into some
> > issues when I try to parse a sentence.
> >
> > Loading the grammar seems to go fine (the config file is based on that
> > of Jacy):
> >
> > petter@tor:~/tools/ace-0.9.21$ ./ace -G norsyg.dat -g ../../logon/petter/norsyg/ace/config.tdl
> > reading configuration       from `../../logon/petter/norsyg/ace/config.tdl'
> > reading instance            from `../../logon/petter/norsyg/ace/../pet/qc.tdl'
> > reading types               from `../../logon/petter/norsyg/ace/../mtr.tdl'
> > grammar version             Norsyg (1206)
> > format error: unknown type `+'.
> > reading grammar             from `../../logon/petter/norsyg/ace/../norwegian.tdl'
> > reading lexical-filtering-rulefrom `../../logon/petter/norsyg/ace/../lfr.tdl'
> > reading types               from `../../logon/petter/norsyg/ace/../matrix.tdl'
> > reading types               from `../../logon/petter/norsyg/ace/../nor.tdl'
> > reading types               from `../../logon/petter/norsyg/ace/../infl-codes.tdl'
> > reading types               from `../../logon/petter/norsyg/ace/../tmt.tdl'
> > reading types               from `../../logon/petter/norsyg/ace/../unknown.tdl'
> > reading lexical entries     from `../../logon/petter/norsyg/ace/../tiny-lex.tdl'
> > reading token-mapping-rule  from `../../logon/petter/norsyg/ace/../tmr/prelude.tdl'
> > reading token-mapping-rule  from `../../logon/petter/norsyg/ace/../tmr/pos.tdl'
> > reading token-mapping-rule  from `../../logon/petter/norsyg/ace/../tmr/pos-ipa.tdl'
> > reading token-mapping-rule  from `../../logon/petter/norsyg/ace/../tmr/finis.tdl'
> > reading generic-lex-entry   from `../../logon/petter/norsyg/ace/../gle.tdl'
> > reading rules               from `../../logon/petter/norsyg/ace/../rules.tdl'
> > reading lexical rules       from `../../logon/petter/norsyg/ace/../tiny-irules.tdl'
> > reading instance            from `../../logon/petter/norsyg/ace/../labels.tdl'
> > reading instance            from `../../logon/petter/norsyg/ace/../roots.tdl'
> > checking for glbs...        0.53 sec
> > processing constraints...   0.67 sec
> > processing rules            35 ms
> > processing lex-rules        0 ms
> > reading irregular forms     from ../irregs.tab
> > processing lexicon...       1 ms
> > simple lexemes              0 / 3 = 0.00%
> > 3336 types (1501 glb), 3 lexemes, 77 rules, 1 orules, 983 instances, 722 strings, 234 features
> > loading maxent model        0 ms
> > reading tree labels         from `../../logon/petter/norsyg/ace/../labels.tdl'
> > loading tree-node-labels
> > rule filter...              83.3% blocked (39.1% ss)
> > rule filter...              83.3% blocked (39.0% ss)
> > rule filter...              83.3% blocked (39.0% ss)
> > rf-transitive closure...    1 ms
> > loaded grammar in 2.41391s
> >  types: 33.9M rules: 8.4M lex-info: 500
> >  miscellaneous: 62K lex-dgs: 71K miscellaneous: 13.7M sem-index: 85K stochastic-model: 0 latmap rules: 18K
> >  ... freezing 55.8M to file map 0x6000000000
> >
> >
> > But when I try to parse the sentence "Jon sover", I get an error message:
> >
> > petter@tor:~/tools/ace-0.9.21$ ./ace -g norsyg.dat -Tf1
> > Jon sover
> > ERROR: toklist or toklast missing on a token
> > NOTE: lexemes do not span position 0 `^'!
> > NOTE: post reduction gap
> > SKIP: Jon sover
> > NOTE: ignoring `Jon sover'
> >
> > It should be noted that I use REPP to add "^ " at the beginning of every
> > input string, so the string the grammar attempts to parse is "^ Jon sover".
> > ("^" has a lexical entry.)
> > I don't quite understand the meaning of the ERROR message. I have tried
> > to find out if there are any TOKENS features that are missing in the
> > grammar, but I don't know what is expected of the grammar. I am attaching a
> > stripped down version of the grammar in case anyone would like to try to
> > find out what goes wrong. (The config file is in ace/.)
> >
> > Best regards,
> >
> > Petter
> > <norsyg_2015-06-24.tgz>
>
>