[developers] doc-strings in TDL for DELPH-IN

Dan Flickinger danf at stanford.edu
Tue Aug 28 00:22:45 CEST 2018

Hi all,

Thanks to the developers who have coordinated efforts so smoothly for these doc strings in TDL.  I am happy to report that for each of the following platforms, I have been able to successfully (compile and) load and run the trunk version of the ERG suitably modified to use the new doc-string format for comments on all of the 1250 leaf lexical types in the grammar:

LKB and LKB_FOS -- Using the latest versions available for download from the DELPH-IN LkbInstallation pages, both still require the ERG-supplied patch to the function read-tdl-type-comment() in erg/lkb/patches.lsp, but with this patch, the grammar with new doc-strings loads and runs fine.  It would be nice to have developer-approved versions of this function instead of the patch, since any other grammar employing these new doc-strings will also currently need to include this patched version of the function.

PET -- Using the latest `main' SVN branch, the updated grammar compiles, loads, and runs fine, with one surprising caveat: for some reason, two of the three files containing the new doc-strings (letypes.tdl and auxverbs.tdl) will not compile unless they include a final commented-out line with a particular number of characters.  See the note at the end of each of these files; it would be nice to chase down and correct this hiccup, though it might not be urgent.  Other grammar developers should monitor the behavior with their grammars in the meantime.

ACE -- Using the `trunk' SVN version, the updated grammar compiles, loads, and runs fine.  It would be good to now update the precompiled ACE binary on the ACE home page, so the ERG (and possibly other updated grammars) will work.

I haven't yet checked to see if the latest PyDelphin is happy with this version, but will soon, unless Mike or Angie get there first (once I check in the ERG changes).  I also don't know whether `agree' is ready to accept the new doc-strings.

Next steps:

  1.  I will check in the updated `trunk' ERG, and hope that the ACE binary on that home page will be updated soon, for those who might be using the trunk ERG but not compiling their own ACE.
  2.  It would be good to have the $LOGONROOT/uio/bin/linux.x86.64 binaries for `flop' and `cheap' updated to be consistent with the `main' branch of PET, since the existing ones cannot compile what I will check into the trunk ERG (soon to be stable version "2018").

I'll hold off for a day on checking in the new ERG, in case anyone can foresee a reason for a different sequence of events to get us to a happier future consistent state of the world that embraces the new doc-strings.


Thanks, Dan,

I'll do my part to update the wiki with the preferred syntax and add support to PyDelphin. Regarding the syntax description, it would be rather complicated to enforce one docstring per type in the production rules if the docstring is not in a fixed position, so I'll let the grammar accept multiple per type and just add a note for implementers that only the first one must be preserved (with the handling of additional docstrings left undefined).

And, Woodley, good catch on the regex bug. Those patterns should be negative lookahead assertions. I think the following works:

    DocString := /"""([^"\\]|\\.|"(?!")|""(?!"))*"""/

Lookahead assertions can slow down regex searches, so this pattern is intended to be illustrative; a non-regex parser is fine as long as it also allows escaped characters (including quotes) and up to two unescaped quotes not followed by a third quote. Also, if it's not clear from the pattern, newlines are acceptable within the triple-quoted strings.
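
To make the difference concrete, here is a minimal Python sketch comparing the earlier pattern (with the consuming `[^"]` classes) against the lookahead version on a string with a quote followed by an escaped quote. The names OLD and NEW are just labels for this illustration, and Python's `re` module stands in for whatever regex engine an implementer actually uses.

```python
import re

# Earlier proposal: a lone quote must be followed by a non-quote,
# and that following character is consumed as part of the match.
OLD = re.compile(r'"""([^"\\]|\\.|"[^"]|""[^"])*"""')

# Revised proposal: negative lookahead, so the character after the
# quote(s) is checked but not consumed.
NEW = re.compile(r'"""([^"\\]|\\.|"(?!")|""(?!"))*"""')

text = r'"""hello"\"""not done yet"""'

old_match = OLD.match(text).group(0)
new_match = NEW.match(text).group(0)

print(old_match)  # """hello"\"""   (terminates early)
print(new_match)  # """hello"\"""not done yet"""
```

In the old pattern the `"[^"]` alternative swallows the backslash, so the escaped quote is then read as the start of the closing delimiter; with the lookahead the backslash is left for the `\\.` alternative to handle.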

Hello docstringers,

I have added to ACE the ability to detect and ignore triple-quoted strings anywhere within a TDL statement.  I will leave it to others to determine and police legal placement.  The (very lightly tested) update is available in the ACE SVN trunk for those who wish to test it.  I will be happy to make a binary release soon if bugs are not uncovered.

I have one nit to pick with the proposed regular expression for doc strings.  The following docstring would be treated as terminating early, since the backslash is gobbled up without being interpreted:

"""hello"\"""not done yet"""

This one is legal in Python (and handled properly by ACE :-)).
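
For anyone writing a non-regex reader, the intended rule can be sketched as a small scanner: a backslash always consumes the next character, and the first unescaped run of three quotes closes the string. This is only an illustrative Python sketch under those assumptions, not the ACE code; the function name scan_docstring is made up for the example.

```python
import ast


def scan_docstring(text, start=0):
    """Return the index just past the closing triple quote of the docstring at `start`."""
    if not text.startswith('"""', start):
        raise ValueError("no docstring at this position")
    i = start + 3
    while i < len(text):
        if text[i] == "\\":
            i += 2          # escaped character (possibly a quote): skip both
        elif text.startswith('"""', i):
            return i + 3    # first unescaped triple quote closes the docstring
        else:
            i += 1          # ordinary character, or a quote not starting a triple
    raise ValueError("unterminated docstring")


example = r'"""hello"\"""not done yet"""'
end = scan_docstring(example + " rest of the TDL statement")
print(end == len(example))  # True

# Python's own parser agrees that the whole thing is one string literal.
print(ast.literal_eval(example))  # hello""""not done yet
```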


DocString    := /"""([^"\\]|\\.|"[^"]|""[^"])*"""/ Spacing

