[developers] SBCL port
Ben Waldron
bmw20 at cl.cam.ac.uk
Sat Oct 28 18:45:35 CEST 2006
I can now report that the ERG is able to run happily under SBCL. The
memory problem was caused by the CL-PPCRE regex library which is used in
the preprocessor and MT components. Interestingly, the LKB/ERG runs
considerably faster and requires far less memory when running under SBCL
than it does under Allegro. The standard SBCL distribution comes with
512MB dynamic space for its heap, which suffices for the LKB/ERG when
using the fix to the regex code; but it's likely a good idea to compile
your own copy of SBCL with a larger dynamic space. Details below.
=
The CL-PPCRE regex library comes with a parameter
ppcre::*regex-char-code-limit* which provides "[t]he upper exclusive
bound on the char-codes of characters which can occur in character
classes. Change this value BEFORE creating scanners if you don't need
the full Unicode support of LW, ACL, or CLISP." The default is the value
of CHAR-CODE-LIMIT, which depends on the Lisp implementation running;
for Allegro it is 65536 (16-bit Unicode), for SBCL it is 1114112 (full
Unicode). When creating a regex scanner, character classes are mapped to
arrays of this size. Clearly, arrays of size 1114112 quickly eat up
memory! To enable us to use the regex library under SBCL I've checked in
the following fix: set ppcre::*regex-char-code-limit* to 65536 under
SBCL by default. In the preprocessor code I go further and set it to a
minimum sensible value; but the MT code will use my new default (oe, you
will probably want to fix this sometime).
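As a rough sketch of the kind of setting described above (not the exact code that was checked in), assuming CL-PPCRE is already loaded and that its PPCRE package nickname is available. The value 256 and the example regex in the second form are purely illustrative and are not the values used in the preprocessor:

```lisp
;; Sketch only: assumes CL-PPCRE has been loaded beforehand.

;; Global default under SBCL: cap character-class tables at 16-bit
;; char-codes instead of SBCL's CHAR-CODE-LIMIT of 1114112.
;; 1114112 / 65536 = 17, so each table becomes 17 times smaller.
#+:sbcl
(setf ppcre::*regex-char-code-limit* 65536)

;; Local override: the limit must be in effect BEFORE the scanner is
;; created. 256 is a hypothetical value chosen for illustration.
(defparameter *example-scanner*
  (let ((ppcre::*regex-char-code-limit* 256))
    (ppcre:create-scanner "[a-z]+")))
```

Binding the variable with LET keeps the smaller limit confined to the scanners built inside that form, which is one way for a component such as the preprocessor to pick a tighter bound without changing the default seen by other code.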
We are currently using version 1.2.3 of CL-PPCRE. The most recent
version is 1.2.18. This incorporates a number of bug fixes (none of
which appear to directly affect our code). I would like to upgrade the
LKB to the latest version. Does anyone have any reason to want to keep
the current version? (tsdb tells me it is safe to upgrade.)
When running the LKB/ERG over the CSLI test suite I get the following
figures: Allegro takes 647.0 CPU seconds (652.3 real time) and requires
458MB memory; SBCL takes 488.2 seconds (490.4 real time) and requires
196MB memory. That is, under SBCL run time is reduced by roughly 25% and
memory usage by 57.2% (!).
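For anyone wanting to recompute the reductions from the raw figures above, a small sketch follows. The helper name PERCENT-REDUCTION is made up for this example and is not part of the LKB:

```lisp
;; Hypothetical helper: percentage by which NEW is smaller than OLD.
(defun percent-reduction (old new)
  (* 100.0 (/ (- old new) old)))

(percent-reduction 647.0 488.2)  ; CPU seconds, roughly 24.5
(percent-reduction 652.3 490.4)  ; real time, roughly 24.8
(percent-reduction 458 196)      ; memory in MB, roughly 57.2
```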
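To create a custom build of SBCL with more dynamic space, alter
def!constant dynamic-space-end in the SBCL source code and recompile.
E.g. (note that the diff below runs from the modified file to the saved
.OLD copy, so the "-" line carries the new, larger value):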
--- sbcl-0.9.17/src/compiler/x86/parms.lisp 2006-10-28 18:35:05.000000000 +0200
+++ sbcl-0.9.17/src/compiler/x86/parms.lisp.OLD 2006-10-28 18:34:52.000000000 +0200
@@ -175,7 +175,7 @@
(def!constant static-space-end #x07fff000)
(def!constant dynamic-space-start #x09000000)
- (def!constant dynamic-space-end #x49000000)
+ (def!constant dynamic-space-end #x29000000)
(def!constant linkage-table-space-start #x70000000)
(def!constant linkage-table-space-end #x7ffff000))
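To see what the two dynamic-space-end values in the diff amount to, the size of the dynamic space can be worked out from the constants. This is only a sketch; the helper name DYNAMIC-SPACE-MB is invented for illustration and is not part of SBCL:

```lisp
;; Hypothetical helper: size in MB of the region between START and END.
(defun dynamic-space-mb (start end)
  (/ (- end start) (* 1024 1024)))

(dynamic-space-mb #x09000000 #x29000000)  ; => 512  (value on the "+" line)
(dynamic-space-mb #x09000000 #x49000000)  ; => 1024 (value on the "-" line)
```

So with dynamic-space-start left at #x09000000, moving dynamic-space-end from #x29000000 to #x49000000 doubles the heap from 512MB to 1GB.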
- Ben
Ben Waldron wrote:
> the ERG causes the Lisp process to die (below) due to a memory
> problem. I think this can probably be fixed by creating a custom build
> of the SBCL compiler (apparently one should: edit DYNAMIC-SPACE-END in
> src/compiler/.../params.lsp) -- but I've not tried this.
> ===
>
> * (read-script-file-aux "/home/bmw20/erg/lkb/script")
>
> ; in: LAMBDA NIL
> ; (SETF LKB::*TRANSLATE-GRID* '(:EN :EN))
> ; ==>
> ; (SETQ LKB::*TRANSLATE-GRID* '(:EN :EN))
> ;
> ; caught WARNING:
> ; undefined variable: *TRANSLATE-GRID*
>
> ;
> ; caught WARNING:
> ; This variable is undefined:
> ; *TRANSLATE-GRID*
> ;
> ; compilation unit finished
> ; caught 2 WARNING conditions
> STYLE-WARNING: redefining ESTABLISH-LINEAR-PRECEDENCE in DEFUN
> STYLE-WARNING: redefining SPELLING-CHANGE-RULE-P in DEFUN
> STYLE-WARNING: redefining REDUNDANCY-RULE-P in DEFUN
> STYLE-WARNING: redefining HIDE-IN-TYPE-HIERARCHY-P in DEFUN
> STYLE-WARNING: redefining MAKE-UNKNOWN-WORD-SENSE-UNIFICATIONS in DEFUN
> STYLE-WARNING: redefining INSTANTIATE-GENERIC-LEXICAL-ENTRY in DEFUN
> STYLE-WARNING: redefining MAKE-ORTH-TDFS in DEFUN
> STYLE-WARNING: redefining SET-TEMPORARY-LEXICON-FILENAMES in DEFUN
> STYLE-WARNING: redefining DAG-INFLECTED-P in DEFUN
> STYLE-WARNING: redefining BOOL-VALUE-TRUE in DEFUN
> STYLE-WARNING: redefining BOOL-VALUE-FALSE in DEFUN
> STYLE-WARNING: redefining DETERMINE-ARGUMENT-OPTIONALITY in DEFUN
> STYLE-WARNING: redefining DETERMINE-DERIVED-FORMS in DEFUN
> STYLE-WARNING: redefining LUI-CHART-EDGE-NAME in DEFUN
> STYLE-WARNING: redefining CHAIN-DOWN-MARGS in DEFUN
> STYLE-WARNING: redefining IDIOM-REL-P in DEFUN
> set-coding-system(): ignoring request for UTF8 on this Lisp
> implementation.
> Reading in type file fundamentals.tdl
> Reading in type file lextypes.tdl
> Reading in type file syntax.tdl
> Reading in type file lexrules.tdl
> Reading in type file auxverbs.tdl
> Reading in type file mtr.tdl
> Checking type hierarchy
> Checking for unique greatest lower bounds
> Expanding constraints
> Making constraints well formed
> Expanding defaults
> Type file checked successfully
> Computing display ordering
> Reading in cached leaf types
> Cached leaf types read
> Reading in cached lexicon (main)
> Cached lexicon read
> Reading in rules file constructions.tdl
> Reading in lexical rules file inflr.tdl
> Reading in lexical rules file inflr-pnct.tdl
> Reading in root file roots.tdl
> Reading in lexical rules file lexrinst.tdl
> Reading in parse node file parse-nodes.tdl
> Evaluation took:
> 13.495 seconds of real time
> 12.141154 seconds of user run time
> 0.781881 seconds of system run time
> 0 page faults and
> 272,134,856 bytes consed.
> read-vpm(): reading variable property mapping `semi.vpm'.
> read-transfer-rules(): reading file `idioms.mtr'.
> read-transfer-rules(): reading file `generation.mtr'.
> read-transfer-rules(): reading file `trigger.mtr'.
> Argh! gc_find_freeish_pages failed (restart_page), nbytes=4456456.
> Gen  Boxed Unboxed   LB   LUB !move     Alloc  Waste      Trig    WP GCs Mem-age
>   0:    20       0 2178     0     0   8986536  16472   2000000     0   0  0.0000
>   1:    28       2    0 13068    17  53589904  59504   2000000     8   0  0.4999
>   2: 15185     415   54 71893    21 358134904 457608 360134904 15152   1  0.0000
>   3:  1315     196   44     0    74   6299280  70000   2000000   751   0  0.0000
>   4:  6343     767 1839   354   132  37900176 204912   2000000  7388   0  0.0000
>   5:     0       0    0     0     0         0      0   2000000     0   0  0.0000
>   6:  6071       0    0     0     0  24866816      0   2000000  5884   0  0.0000
> Total bytes allocated=489777616
> fatal error encountered in SBCL pid 32736(tid 3085371072):
>
>
> The system is too badly corrupted or confused to continue at the Lisp
> level. If the system had been compiled with the SB-LDB feature, we'd drop
> into the LDB low-level debugger now. But there's no LDB in this build, so
> we can't really do anything but just exit, sorry.