[developers] lkb/ace difference on generation
Glenn Slayden
glenn at thai-language.com
Fri Feb 20 23:46:46 CET 2015
Indeed, concurring with David, I propose we codify some standards for the DELPH-IN JRF (Joint Reference Formalism) regarding these pesky trivialities.
At issue here is not only the line-feed format…
DOS \r\n
Unix \n
Mac (deprecated) \r
mixture/broken \n\r, etc.
…but also the matter of BOM <http://en.wikipedia.org/wiki/Byte_order_mark> (byte-order mark), versus lack thereof, in UTF-8 and/or wide-character (Unicode, UTF-16, UTF-32) files.
Ideally, we would certify our tools as properly producing, accepting and interpreting any of the above, in combination, for all of the various textual grammar definition (TDL) and administrative/control files (VPM, SEMI, SET).
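As a rough illustration of what such a check might involve, here is a minimal Python sketch that reports the BOM (if any) and the line-ending styles found in a file's bytes. The function name `sniff` and the labels "dos", "unix" and "mac" are made up for this example and are not part of any DELPH-IN tool. It assumes that a file without a BOM is UTF-8.

```python
import codecs
import re

# BOM signatures. The UTF-32 entries come first because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Labels are hypothetical, chosen to match the list above.
STYLE = {"\r\n": "dos", "\n": "unix", "\r": "mac"}


def sniff(data: bytes):
    """Return (encoding named by the BOM or None, set of line-ending styles)."""
    bom = None
    for signature, name in BOMS:
        if data.startswith(signature):
            bom = name
            data = data[len(signature):]
            break
    # Assumption: no BOM means UTF-8. Other encodings raise UnicodeDecodeError.
    text = data.decode(bom or "utf-8")
    # The alternation tries "\r\n" first, so a DOS ending is never counted
    # as a "mac" ending followed by a "unix" ending.
    styles = {STYLE[m.group()] for m in re.finditer(r"\r\n|\r|\n", text)}
    return bom, styles
```

A result with more than one style in the set indicates a mixed file; a broken sequence such as \n\r shows up as both "unix" and "mac".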
Glenn
From: developers-bounces at emmtee.net [mailto:developers-bounces at emmtee.net] On Behalf Of David Brodbeck
Sent: Friday, February 20, 2015 1:15 PM
To: Woodley Packard
Cc: developers; Sanghoun Song
Subject: Re: [developers] lkb/ace difference on generation
This is a good detail for students to learn, anyway, because they *will* encounter compilers and other systems later that are picky about line endings. I've seen more than one student beat their head against a problem for hours that I was able to solve for them with dos2unix.
On Thu, Feb 19, 2015 at 9:07 PM, Woodley Packard <sweaglesw at sweaglesw.org> wrote:
Hi Emily,
Thanks Sanghoun for checking the behavior at your end.
The problem, as Sanghoun guessed, is that the VPM is not kicking in. You can see that in the MRS that came out of parsing, the variables have types like "ref-ind" and "event" rather than "x" and "e", with the exception of the LTOP, which gets "h". On the way back in for generation, "ref-ind" etc. are types that the system knows about, but "h" is not. The reason the LTOP got "h" is that it is manufactured (using the "handle-type" configuration parameter) instead of read off the AVM. The configuration setting is correct; you normally want "h" rather than "handle", but it wasn't felicitous when the VPM was broken.
I think if you look carefully, you will find an error message in the grammar compilation log that looks like:
vpm: syntax error on line 10, 1/1 context but 1/-1 rule
Unfortunately, in ACE 0.9.17 that was a non-fatal error, so it is easy to overlook. In ACE 0.9.19 it is fatal, so it was easy for Sanghoun to spot. The reason for the syntax error is the DOS line endings. Probably the system should be a bit more flexible about accepting non-UNIX line endings (I guess LKB didn't have any trouble), but that's how it is for right now.
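For illustration only, one way a reader of line-oriented files could be made tolerant of non-UNIX line endings is to split on all three conventions at once. This is a Python sketch, not how ACE (or the LKB) actually reads VPM files, and the name `logical_lines` is hypothetical.

```python
import re

# Match "\r\n" before a lone "\r" or "\n" so a DOS ending counts once.
_EOL = re.compile(r"\r\n|\r|\n")


def logical_lines(text: str):
    """Split text into lines, accepting DOS, Unix and old Mac line endings."""
    lines = _EOL.split(text)
    # A terminator at the very end of the file leaves one empty trailing
    # piece; drop it so the final line is not followed by a phantom line.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

With this approach a stray carriage return never ends up as part of a line's content. A broken \n\r sequence is treated as two terminators and so yields an extra empty line.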
Hope that helps,
-Woodley
On Feb 19, 2015, at 7:19 PM, Sanghoun Song <sanghoun at uw.edu> wrote:
Dear Emily,
On my machine, I could get the generation results as follows.
$ echo "mʔ-piri-ɣʔe-n" | ace -g ckt.dat | ace -g ckt.dat -e
NOTE: 1 readings, added 41 / 34 edges to chart (34 fully instantiated, 2 actives used, 2 passives used) RAM: 539k
NOTE: parsed 1 / 1 sentences, avg 539k, time 0.02247s
Mʔ-piri-n
Mʔ-piri-ɣʔe-n
Mʔ-piri-ɣʔi-n
NOTE: 322 passive, 119 active edges in final generation chart; built 579 passives total. [3 results]
NOTE: generated 1 / 1 sentences, avg 2311k, time 0.35266s
NOTE: transfer did 0 successful unifies and 0 failed ones
One reason might be the difference in ACE version. Your ckt.dat was compiled by 0.9.17, and my ACE engine is ver. 0.9.19.
Another reason I can think of is the carriage return in semi.vpm. When I first tried to compile the data file on my machine, I couldn't, because semi.vpm had Windows carriage returns. I ran the following command, and then I could use the grammar.
$ dos2unix semi.vpm
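Where dos2unix is not installed, a rough stand-in can be sketched in Python. This is only an approximation for the common case: it rewrites CRLF as LF and leaves lone carriage returns and any BOM untouched. The function names `crlf_to_lf` and `convert_file` are hypothetical.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CRLF pair with a single LF; lone CRs are left alone."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> bool:
    """Rewrite the file in place; return True if anything was changed."""
    p = Path(path)
    original = p.read_bytes()  # bytes, so no newline translation happens
    converted = crlf_to_lf(original)
    if converted == original:
        return False
    p.write_bytes(converted)
    return True
```

Calling `convert_file("semi.vpm")` would then stand in for the command above. Working on bytes keeps the file's encoding untouched, but it assumes a byte-oriented encoding such as UTF-8; UTF-16 or UTF-32 files would have to be decoded first.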
Sanghoun