[developers] DOS/*nix issue with irregular morphological forms in the LKB
Ann Copestake
aac10 at cl.cam.ac.uk
Sat Mar 12 13:53:07 CET 2016
regardless of svn, it can happen (I assume) when someone creates an
irregs.tab file under Windows

and sure, I will look at it at some point. I am tempted to make the "
characters optional (that's what I meant by the horrible format - the
requirement to have these at the beginning and end of the file) and
read it in more robustly, though.

On 12/03/2016 12:33, Stephan Oepen wrote:
> that is an interesting constellation, indeed :-).
>
> when downloading via SVN to windows, the line-ending conventions are
> helpfully updated: the file is unix-style (LF) natively, but that is
> converted to windows-style (CRLF) by the SVN client in your set-up. with
> the LKB running in a un*x environment while reading data from the
> windows filesystem, this problem arises.
>
> we could mark the file as binary in SVN, to prevent the conversion.
> but probably it would be better and more robust to add something like
>
> (string-right-trim '(#\Return) ...)
>
> to the code that reads those strings from ‘irregs.tab’. since
> you have the testing environment readily available, i would like to
> defer to you to actually put that into the LKB code.
>
> cheers, oe
>
>
> On Sat, Mar 12, 2016 at 1:05 PM, Ann Copestake <aac10 at cl.cam.ac.uk> wrote:
>> bit of a blast from the past, but I thought it worth recording, since I
>> might even get round to doing the fix one day
>>
>> If one uses the extremely useful UbuntuLKB/Virtual box under Windows with an
>> ERG (and presumably other grammars) downloaded from Windows (in my case via
>> Tortoise svn), one should be aware that reading in of the irregs.tab file
>> may not work properly because of the different line-ending conventions. The
>> effect is that a spurious ^M character gets tacked onto the end of the stem
>> when morphologically analysing, e.g., slept, so irregular forms are not
>> correctly recognised, i.e., the symptom is that one can't parse sentences with a
>> morphologically irregular form. The work-around is to save the file in
>> *nix format. The solution is to check for this in the LKB when reading the
>> irregs.tab file, which is anyway in a stupid format, but I guess there's no
>> enthusiasm for changing that now.
>>
>> Ann
>>
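
For illustration, the symptom and both proposed fixes (trimming a trailing
carriage return, and treating the enclosing " lines as optional) can be
sketched outside the LKB. This is a minimal Python sketch, not the LKB's Lisp
reader. The three-field line layout (form, rule, stem), the rule name
PAST_VERB, and the function name read_entries are assumptions made for the
example; the thread only says that the stem comes last and that the file
starts and ends with a " character.

```python
def read_entries(text, trim_cr):
    """Parse irregs.tab-style text into (form, rule, stem) tuples.

    Assumed layout (hypothetical, not confirmed by the thread): one entry
    per line, three space-separated fields, with the stem as the last field.
    """
    entries = []
    for line in text.split("\n"):
        if trim_cr:
            # Python counterpart of (string-right-trim '(#\Return) ...)
            line = line.rstrip("\r")
        # Skip blank lines and lines holding only the " delimiter, so the
        # opening and closing quotes are accepted but not required.
        if line.strip() in ("", '"'):
            continue
        form, rule, stem = line.split(" ")
        entries.append((form, rule, stem))
    return entries


# A file saved with Windows (CRLF) line endings; the rule name is made up.
crlf_text = '"\r\nslept PAST_VERB sleep\r\n"\r\n'

naive = read_entries(crlf_text, trim_cr=False)
fixed = read_entries(crlf_text, trim_cr=True)

print(repr(naive[0][2]))  # stem still carries the trailing carriage return
print(repr(fixed[0][2]))  # stem is clean
```

Without the trim, the stem no longer compares equal to the lexicon's stem,
which matches the reported failure to recognise irregular forms.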