[developers] DOS/*nix issue with irregular morphological forms in the LKB
Ann Copestake
aac10 at cl.cam.ac.uk
Sat Mar 12 13:05:59 CET 2016
This is a bit of a blast from the past, but I thought it worth recording,
since I might even get round to doing the fix one day.
If one uses the extremely useful UbuntuLKB/VirtualBox under Windows
with an ERG (and presumably other grammars) downloaded from Windows (in
my case via TortoiseSVN), one should be aware that reading in the
irregs.tab file may not work properly because of the different
line-ending conventions. The effect is that a spurious ^M character
(a carriage return) gets tacked onto the end of the stem when
morphologically analysing, e.g., slept, so irregular forms are not
correctly recognised. That is, the symptom is that one can't parse
sentences containing a morphologically irregular form. The work-around
is to save the file in *nix format. The
solution is to check for this in the LKB when reading the irregs.tab
file, which is anyway in a stupid format, but I guess there's no
enthusiasm for changing that now.
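
A minimal sketch of both the work-around and the check, written in
Python for illustration rather than in the LKB's own Lisp. It assumes
each entry in irregs.tab is one line of three whitespace-separated
fields (inflected form, rule name, stem) and that comment lines start
with ';'. The function names are hypothetical, not part of the LKB.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Work-around: convert DOS (CRLF) line endings to *nix (LF)."""
    return data.replace(b"\r\n", b"\n")


def parse_irreg_line(line: str):
    """Check at read time: return (form, rule, stem) for an entry line.

    Returns None for anything that is not an entry (blank lines,
    ';' comments, lines without exactly three fields). The field
    layout is an assumption made for this sketch.
    """
    # Drop LF, CRLF, or a stray CR so no ^M ends up on the stem.
    line = line.rstrip("\r\n")
    if line.lstrip().startswith(";"):
        return None
    fields = line.split()
    if len(fields) != 3:
        return None
    return tuple(fields)
```

With this, a DOS-format line such as "slept PAST_VERB sleep\r\n" gives
the stem "sleep", whereas stripping only the "\n" and splitting on
spaces would leave "sleep\r", which is the failure described above.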
Ann