[developers] DOS/*nix issue with irregular morphological forms in the LKB
Stephan Oepen
oe at ifi.uio.no
Sat Mar 12 13:33:19 CET 2016
that is an interesting constellation, indeed :-).
when checking out via SVN on windows, the line-ending conventions are
helpfully converted: the file is natively unix-style (LF), but the SVN
client in your set-up turns it into windows-style (CRLF). when the LKB
runs in a un*x environment but reads data from the windows filesystem,
each line keeps a trailing carriage return, and this problem arises.
we could mark the file as binary in SVN, to prevent the conversion.
but probably it would be better and more robust to add something like
(string-right-trim '(#\Return) ...)
to the code that reads those strings from ‘irregs.tab’. since you have
the testing environment readily available, i would like to leave it to
you to actually put that into the LKB code.
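A minimal sketch of that suggestion, in Common Lisp like the snippet above. The names `strip-trailing-return` and `read-irregs-lines` are hypothetical and not part of the LKB. The sketch assumes the file is consumed one line at a time with `read-line`; the actual LKB reader for ‘irregs.tab’ may be structured differently.

```lisp
;;; Hypothetical sketch: neither function below exists in the LKB.
;;; A CRLF file read on a un*x system leaves #\Return at the end of
;;; each line returned by READ-LINE; trimming it restores the LF view.

(defun strip-trailing-return (line)
  "Return LINE without any trailing #\\Return characters."
  (string-right-trim '(#\Return) line))

(defun read-irregs-lines (pathname)
  "Hypothetical reader: return the lines of PATHNAME as a list of
strings, each with trailing carriage returns removed."
  (with-open-file (stream pathname :direction :input)
    (loop for line = (read-line stream nil nil)
          while line
          collect (strip-trailing-return line))))

;; Example: a stem carrying a spurious ^M is cleaned up.
;; (strip-trailing-return (format nil "sleep~C" #\Return)) => "sleep"
```

Because `string-right-trim` only removes characters from the end of the string, a file that is already in LF form passes through unchanged, so the same code handles both conventions.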
cheers, oe
On Sat, Mar 12, 2016 at 1:05 PM, Ann Copestake <aac10 at cl.cam.ac.uk> wrote:
> bit of a blast from the past, but I thought it worth recording, since I
> might even get round to doing the fix one day
>
> If one uses the extremely useful UbuntuLKB/VirtualBox under Windows with an
> ERG (and presumably other grammars) checked out under Windows (in my case via
> TortoiseSVN), one should be aware that reading in the irregs.tab file
> may not work properly because of the different line-ending conventions. The
> effect is that a spurious ^M character gets tacked onto the end of the stem
> when morphologically analysing e.g. slept, so irregular forms are not correctly
> recognised. That is, the symptom is that one can't parse sentences with a
> morphologically irregular form. The work-around is to save the file in
> *nix format. The solution is to check for this in the LKB when reading the
> irregs.tab file, which is anyway in a stupid format, but I guess there's no
> enthusiasm for changing that now.
>
> Ann
>