[developers] DOS/*nix issue with irregular morphological forms in the LKB

Ann Copestake aac10 at cl.cam.ac.uk
Sat Mar 12 13:53:07 CET 2016

regardless of svn it can happen (I assume) when someone creates an 
irregs.tab file under Windows

and sure, I will look at it at some point.  I am tempted to make the " 
(that's what I meant by the horrible format - the requirement to have 
these at the beginning and
end of the file) and read it in more robustly, though.

On 12/03/2016 12:33, Stephan Oepen wrote:
> that is an interesting constellation, indeed :-).
> when downloading via SVN to windows, the line-ending conventions are
> helpfully updated: the file is unix-style (LF) natively, but that is
> padded to windows-style (CRLF) by the SVN client in your set-up.  with
> the LKB running in a un*x environment while reading data from the
> windows filesystem, this problem arises.
> we could mark the file as binary in SVN, to prevent the conversion.
> but probably it would be better and more robust to add something like
>    (string-right-trim '(#\Return) ...)
> to the code that reads those strings from ‘irregs.tab’.  since
> you have the testing environment readily available, i would like to
> defer to you to actually put that into the LKB code.
> cheers, oe
> On Sat, Mar 12, 2016 at 1:05 PM, Ann Copestake <aac10 at cl.cam.ac.uk> wrote:
>> bit of a blast from the past, but I thought it worth recording, since I
>> might even get round to doing the fix one day
>> If one uses the extremely useful UbuntuLKB/Virtual box under Windows with an
>> ERG (and presumably other grammars) downloaded from Windows (in my case via
>> Tortoise svn), one should be aware that reading in of the irregs.tab file
>> may not work properly because of the different line-ending conventions.  The
>> effect is that a spurious ^M character gets tacked onto the end of the stem
>> when morph analysing e.g. slept, and so irregular forms are not correctly
>> recognised, i.e., the symptom is that one can't parse sentences with a
>> morphologically irregular form.  The work-around is to save the file in
>> *nix format.  The solution is to check for this in the LKB when reading the
>> irregs.tab file, which is anyway in a stupid format, but I guess there's no
>> enthusiasm for changing that now.
>> Ann
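
A minimal sketch of the string-right-trim suggestion quoted above, assuming the file is read one line at a time with read-line. The function names strip-trailing-return and read-lines-robustly are hypothetical and are not taken from the LKB source; the sample line in the usage note is made up and is not meant to reflect the real irregs.tab layout.

```lisp
;;; Illustrative only: hypothetical helpers, not the actual LKB reader code.

(defun strip-trailing-return (line)
  "Return LINE without any carriage-return characters at its end."
  ;; A CRLF file read on *nix leaves #\Return as the last character
  ;; of every line, which is the spurious ^M described in the thread.
  (string-right-trim '(#\Return) line))

(defun read-lines-robustly (stream)
  "Collect all lines of STREAM, stripping a trailing CR from each one."
  (loop for line = (read-line stream nil nil)
        while line
        collect (strip-trailing-return line)))
```

For instance, (with-input-from-string (s (format nil "slept X sleep~C~%" #\Return)) (read-lines-robustly s)) yields a one-element list whose string ends in "sleep" with no ^M attached.  Trimming at read time has the advantage that it works however the file reached the disk, whether through an SVN client that converts line endings or through an editor under Windows.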
