[developers] DOS/*nix issue with irregular morphological forms in the LKB

Ann Copestake aac10 at cl.cam.ac.uk
Sat Mar 12 13:05:59 CET 2016


This is a bit of a blast from the past, but I thought it worth
recording, since I might even get round to doing the fix one day.

If one uses the extremely useful UbuntuLKB/VirtualBox setup under
Windows with an ERG (and presumably other grammars) downloaded under
Windows (in my case via TortoiseSVN), one should be aware that reading
in the irregs.tab file may not work properly because of the different
line-ending conventions: the file can end up with DOS (CRLF) line
endings rather than *nix (LF) ones.  The effect is that a spurious ^M
(carriage return) character gets tacked onto the end of the stem when
morphologically analysing e.g. "slept", so irregular forms are not
correctly recognised.  That is, the symptom is that one can't parse
sentences containing a morphologically irregular form.  The
work-around is to save the file in *nix format.  The proper solution
is to check for this in the LKB when reading the irregs.tab file,
which is anyway in a stupid format, but I guess there's no enthusiasm
for changing that now.
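
For anyone who would rather script the work-around than re-save the
file by hand, a minimal Python sketch follows.  It is not part of the
LKB, and the function name to_unix_line_endings is made up for
illustration.  It assumes only that the file is small enough to read
into memory, and it rewrites the file in place, replacing every CRLF
with LF.

```python
from pathlib import Path


def to_unix_line_endings(path):
    """Rewrite the file at `path` in place with LF line endings.

    Hypothetical helper, not part of the LKB. Works on raw bytes so
    that the file's character encoding is left untouched.
    Returns the number of CRLF sequences that were replaced.
    """
    p = Path(path)
    data = p.read_bytes()
    count = data.count(b"\r\n")
    if count:
        # Only rewrite the file when there is something to fix.
        p.write_bytes(data.replace(b"\r\n", b"\n"))
    return count
```

Calling to_unix_line_endings("irregs.tab") from the grammar directory,
before loading the grammar, should be enough.  A return value of 0
means the file already had *nix line endings and was left alone.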

Ann


