[developers] Fwd: PET/ERG/MRS and more

Dan Flickinger danf at stanford.edu
Wed Apr 8 19:58:22 CEST 2015


[I forgot to cc developers in my reply to Ping, and thanks, Stephan, for the in-parallel response.  I'll check later today to see if our stories match :).]

----- Forwarded Message -----
From: "Dan Flickinger" <danf at stanford.edu>
To: "Ping Xue" <ping.xue at boeing.com>
Sent: Wednesday, April 8, 2015 9:13:53 AM
Subject: Re: PET/ERG/MRS and more

Hi Ping -

It's good to hear that you are making use of the PET/ERG machinery.  Regarding unknown-word handling, I have always used the -default-les=all setting, so I doubt that any other values would be relevant for the ERG.  But you should also add the "-tagger" option, since the unknown word machinery depends in part on getting part-of-speech tags.  And you should also be using the PET options "-repp" (for regular expression preprocessing) and "-cm" (for chart mapping), to enable the full range of text-normalizing steps that the ERG expects from preprocessing.  You probably already know that you should also use the "-packing" option (for parsing efficiency).  So the standard invocation that I use for PET with the ERG is as follows:

cheap -cm -repp -tagger -default-les=all -packing english.grm

Depending on your computing resources, you might also want to specify limits on memory and time that PET uses for any one sentence, by setting appropriate values for these additional options:
-memlimit=2048 (2 gigabytes for parse chart construction) 
-timeout=60 (60 CPU seconds)
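
For anyone scripting this, the invocation and the two optional limits above could be assembled along the following lines. This is only a sketch: the helper name build_cheap_command and its parameters are made up for illustration, and the only flags it uses are the ones quoted in this message. It builds and prints the command line; it does not run cheap.

```python
import shlex

# Flags copied from the message above. The helper name and its
# parameters are hypothetical, chosen only for this illustration.
BASE_OPTIONS = ["-cm", "-repp", "-tagger", "-default-les=all", "-packing"]


def build_cheap_command(grammar="english.grm", memlimit_mb=None, timeout_s=None):
    """Return the argument list for a single cheap invocation."""
    args = ["cheap"] + BASE_OPTIONS
    # The per-sentence limits are optional; add them only when requested.
    if memlimit_mb is not None:
        args.append(f"-memlimit={memlimit_mb}")
    if timeout_s is not None:
        args.append(f"-timeout={timeout_s}")
    # The grammar file comes last, as in the invocation quoted above.
    args.append(grammar)
    return args


if __name__ == "__main__":
    # Print the command line with the limits suggested above.
    print(shlex.join(build_cheap_command(memlimit_mb=2048, timeout_s=60)))
```

Keeping the arguments as a list rather than a single string means the same value can later be handed to subprocess without any shell quoting concerns.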

If you can describe the kinds of additional constraints on unknown word handling that you would like to use, I'll be glad to point you to ways to accommodate them.  But our recent experience with quite a few different text genres suggests that the built-in settings go pretty far, so perhaps with the above settings, you won't need to do a great deal more for unknown words.

As for documentation on the notation for ERG outputs, I would suggest starting with the links on this page:
http://moin.delph-in.net/ErgTop
and in particular the link to Semantics, where you'll see documentation that is slowly but steadily growing.  The symbols used in the derivation trees are documented at the link for "Syntactic and Lexical Rules".  And you can see examples of the lexical types and rules in the annotated corpus examples on the "Linguistic Type Database" pages.

You may also find useful discussions elsewhere on the DELPH-IN wiki pages, starting at 
moin.delph-in.net

I hope these help.  And yes, I will look forward to catching up with you next time I am up in Seattle, perhaps in early June.

Cheers,

 Dan


----- Original Message -----
From: "Ping Xue" <ping.xue at boeing.com>
To: "Dan Flickinger" <danf at stanford.edu>
Sent: Tuesday, April 7, 2015 3:13:36 PM
Subject: PET/ERG/MRS and more

Hi Dan,

Hope this finds you well. As I might have mentioned to you before, we (Boeing Research) have been using the PET/ERG system to support a collaborative research project with IBM UK. I am looking at the parse trees (and the MRS representations) generated by the PET/ERG system, trying to do some grammar engineering to cover our special data. One of the things is that we have to allow "the unknown word handling". We used the command-line option -default-les=all. I wonder what the other allowable values are for this option. While allowing "unknown word handling", we hope to constrain it in some way. I would appreciate it if you could give me some insights or point me to some documentation. I would also appreciate detailed documentation about the notations/symbols/abbreviated expressions that are used in the output of the PET/ERG system, namely notations used in the parse trees, MRS representations, etc. generated by the system. I can guess quite a bit but not all.

I am currently still with Boeing trying to finish the projects before I can leave Boeing. I haven't forgotten what we talked about before the end of last year. In case you would come to Seattle sometime, please let me know. I will be very interested to know more about your language arts courses and the current state of the system.

Best regards,

Ping

Ping Xue
Boeing Research & Technology
PO Box 3707 MC 7L-43
Seattle, WA 98124-2207
(425) 373-2861




