[developers] preprocessor implementations

Woodley Packard sweaglesw at sweaglesw.org
Thu Nov 27 19:40:29 CET 2008


Richard,

I worked on a project a few years ago that included a C implementation 
of FSPP as a component.  It doesn't support the (new?) '+' and '-' rule 
types, and I haven't tested it against the most recent ERG preprocessor 
files, but it works quite nicely on the latest version I have handy.  
I'm not sure exactly how it compares against the Lisp or Perl 
implementations in terms of speed.  In a quick test I benchmarked it at 
around 35,000 words per second on a modern machine.  It doesn't use lex 
or compile the rules into machine code -- it just uses the standard C 
regex library.  Here's a link if you are interested:

http://sweaglesw.org/fspp/
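
For readers wondering what "just uses the standard C regex library" can look like in practice, here is a minimal sketch using POSIX regcomp/regexec. It is not the code at the link above. The rule struct, the function names, the fixed buffer size and the literal-replacement behaviour are all assumptions made for illustration, and the real FSR file syntax is not modelled.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

enum { BUFSZ = 1024 };  /* illustrative fixed buffer size */

/* Hypothetical rule: a POSIX extended regex plus a literal replacement. */
struct rule {
    const char *pattern;
    const char *replacement;
};

/* Replace every match of r->pattern in `in` with r->replacement.
   Writes a NUL-terminated result to `out`; returns 0 on success and
   -1 on a bad pattern or if the result does not fit in outsz bytes. */
static int apply_rule(const struct rule *r, const char *in,
                      char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m;
    size_t used = 0, rest;
    size_t rlen = strlen(r->replacement);
    int flags = 0;

    assert(outsz > 0);
    /* A real tool would compile each rule once, not once per call. */
    if (regcomp(&re, r->pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, in, 1, &m, flags) == 0) {
        size_t pre = (size_t)m.rm_so;               /* text before match */
        size_t len = (size_t)(m.rm_eo - m.rm_so);   /* match length */
        if (used + pre + rlen + 2 > outsz)
            goto fail;
        memcpy(out + used, in, pre);
        used += pre;
        memcpy(out + used, r->replacement, rlen);
        used += rlen;
        in += pre + len;
        if (len == 0) {
            /* Empty match: copy one byte so the loop always advances. */
            if (*in == '\0')
                break;
            out[used++] = *in++;
        }
        flags = REG_NOTBOL;  /* the remainder is not at line start */
    }

    rest = strlen(in);
    if (used + rest + 1 > outsz)
        goto fail;
    memcpy(out + used, in, rest + 1);  /* copy the tail and its NUL */
    regfree(&re);
    return 0;

fail:
    regfree(&re);
    return -1;
}

/* Apply the rules in order, each to the output of the previous one. */
static int apply_rules(const struct rule *rules, size_t n,
                       const char *in, char out[BUFSZ])
{
    char tmp[BUFSZ];

    if (strlen(in) >= BUFSZ)
        return -1;
    strcpy(out, in);
    for (size_t i = 0; i < n; i++) {
        if (apply_rule(&rules[i], out, tmp, BUFSZ) != 0)
            return -1;
        strcpy(out, tmp);
    }
    return 0;
}
```

With two made-up rules, {"[[:space:]]+", " "} followed by {",", " ,"}, calling apply_rules on "Hello,   world" first collapses the run of spaces and then splits off the comma. Recompiling each pattern on every call keeps the sketch short; anything run at scale would cache the compiled regex_t objects.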

Good luck,
Woodley Packard

R. Bergmair wrote:
> Hi!
>
> I was wondering if there are any implementations of
> preprocessors out there running the ERG FSR rules,
> other than the standard Lisp implementation of FSPP
> that comes with the DELPH-IN distribution.
>
> Or actually, I'd be happy to get any tokenizer that
> produces ERG-compatible tokenizations.
>
> For example, I have a vague recollection that Rebecca
> reimplemented FSPP in Perl.
>
> The reason I'm asking is that I'm preparing to run
> some ERG-compatible tokenization on a really large
> scale, and I'm wondering if there are implementations
> out there that are more efficient than the Lisp
> implementation.
>
> Is there a C implementation? (I understand the PET
> FSPP is just an ECL binding for the Lisp code? Or
> is there an independent implementation as well?)
>
> Has anyone ever tried doing this with lex in C, or
> tried compiling the rules into finite state automata,
> and ultimately machine executable code?
>
> regards,
>
> Richard



