[developers] Fwd: Question on using PET/ACE for parsing

Tue Jun 12 00:39:29 CEST 2018

Hello developers,

See the forwarded message below for a question about parsing unknowns using
the ERG, asked by Johnny Wei at the University of Massachusetts, Amherst.

Johnny: others on the list are more qualified to talk about parsing unknown
tokens using ACE or PET with the ERG, but I'll attempt a response:

Parsing unknowns with DELPH-IN grammars is generally the task of mapping
tokens that have no matching lexical entry (i.e., "lexical gaps") to some
generic lexical entry. To avoid the explosion of ambiguity caused by
attempting every generic lexical entry for every gap, filters are used to
block some generic entries. One such filter, which it seems you are aware
of, uses the TNT POS tags assigned to the unknown token. These tags can
be assigned by a trained POS tagger that PET or ACE invokes
during the parsing process, or they can be passed in via structured input
to the parser (e.g., "yy-tokens"). In both of these cases, the POS tag is
paired with the input token. What your language model is outputting looks
like predicate symbols, and I'm not sure how to use those to directly
influence the parser, but others on this list might. Also see this wiki
page for more information: http://moin.delph-in.net/PetInput
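
To make the structured-input option concrete, here is a minimal Python
sketch that pairs each token with candidate POS tags and builds a YY-style
token string. The field layout (id, start and end vertices, character span,
path, form, ipos, lexical rule, then tag/probability pairs) is my reading of
the PetInput page above and should be checked against it. The helper names
yy_token and yy_input, the example word "blorf", and the tag probabilities
are made up for illustration.

```python
def yy_token(tok_id, start, end, char_from, char_to, form, pos_tags):
    """Format one token in a YY-style layout (assumed, see the PetInput wiki):
    (id, start, end, <from:to>, path, "form", ipos, "lrule", "TAG" prob ...)
    """
    # Escape backslashes and double quotes so the form stays one quoted field.
    safe_form = form.replace("\\", "\\\\").replace('"', '\\"')
    fields = [
        str(tok_id),
        str(start),
        str(end),
        f"<{char_from}:{char_to}>",
        "1",                # path: a single tokenization path is assumed
        f'"{safe_form}"',
        "0",                # ipos: no inflection position is supplied
        '"null"',           # lrule: no lexical rule is supplied
    ]
    # POS tags are space-separated "TAG" probability pairs in the last field.
    tags = " ".join(f'"{tag}" {prob:.4f}' for tag, prob in pos_tags)
    if tags:
        fields.append(tags)
    return "(" + ", ".join(fields) + ")"


def yy_input(tagged_tokens):
    """Build one YY-style string from (form, [(tag, prob), ...]) pairs.

    Character offsets assume the tokens were separated by single spaces.
    """
    parts = []
    offset = 0
    for i, (form, pos_tags) in enumerate(tagged_tokens):
        end_offset = offset + len(form)
        parts.append(
            yy_token(i + 1, i, i + 1, offset, end_offset, form, pos_tags)
        )
        offset = end_offset + 1  # skip the assumed single space
    return " ".join(parts)


example = yy_input([
    ("The", [("DT", 1.0)]),
    ("blorf", [("JJ", 0.6), ("NN", 0.4)]),  # hypothetical unknown word
])
print(example)
```

The resulting string would be given to PET or ACE running in its YY input
mode in place of a raw sentence; the wiki page describes how to enable
that mode for each parser.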

There are also other methods of robust parsing, such as a PCFG backoff
("csaw"), but maybe these are not what you're looking for right now.

Also note that, in addition to PET and ACE, the LKB system can parse using
DELPH-IN grammars, and it has a bit more robust support for unknown tokens
(e.g., regarding morphological inflection of unknowns), although its
Lisp-based implementation can make it tricky to interface with external
programs, and it tends to run a bit slower than the so-called "efficient
implementations" (but work is being done on improving the Lisp code's
performance).

I hope this helps!

---------- Forwarded message ----------
From: Michael Wayne Goodman <goodman.m.w at gmail.com>
Date: Mon, Jun 11, 2018 at 1:39 PM
Subject: Fwd: Question on using PET/ACE for parsing
To: goodmami at uw.edu

---------- Forwarded message ----------
From: Johnny Wei <jwei at umass.edu>
Subject: Question on using PET/ACE for parsing
Date: Mon, 11 Jun 2018 14:46:45 -0400
To: goodman.m.w at gmail.com

Dear Michael,

My name is Johnny Wei, an undergraduate from the University of
Massachusetts, Amherst. Deep grammars are very interesting to me, and I am
looking to use the ERG with PET/ACE for parsing language model output. I
have a few questions on parsing I was wondering whether you could answer.
The questions are below and I really would appreciate your help!

To my understanding there are two ways that the ERG handles unknown words:
using TNT POS tags and regex matching for certain classes. Is this correct?
The way I have my language model set up is that it can generate a
"JJ_u_unknown" or "_generic_proper_ne" for each of the unknown word
classes. To parse, what would be the easiest way to proceed? For some of
the generic classes, I have been able to replace them with some word such
as card_ne -> 9, but I do not know of an easy way to incorporate the
part-of-speech unknown words.

Again, I really appreciate your help. If anything is not clear please let
me know, thanks!

-- 
Johnny Wei

-- 
Michael Wayne Goodman