[developers] preprocessor implementations
Woodley Packard
sweaglesw at sweaglesw.org
Thu Nov 27 19:40:29 CET 2008
Richard,
I worked on a project a few years ago that included a C implementation
of FSPP as a component. It doesn't support the (new?) '+' and '-' rule
types, and I haven't tested it against the most recent ERG preprocessor
files, but it works quite nicely on the latest version I have handy.
I'm not sure exactly how it compares against the Lisp or Perl
implementations in terms of speed. In a quick test I benchmarked it at
around 35,000 words per second on a modern machine. It doesn't use lex
or compile the rules into machine code -- it just uses the standard C
regex library. Here's a link if you are interested:
http://sweaglesw.org/fspp/
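
For illustration only (a sketch, not the code at the link above): applying a
single substitution rule with the POSIX regex library might look roughly like
the following. The names rewrite_all and append, the \1..\9 replacement
syntax, and the sample pattern in the note below are all hypothetical and are
not taken from the ERG preprocessor files.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Append n bytes of src to out (capacity cap, current length *len).
   Returns 0 on success, -1 if the result would not fit. */
static int append(char *out, size_t cap, size_t *len,
                  const char *src, size_t n)
{
    if (*len + n + 1 > cap)
        return -1;
    memcpy(out + *len, src, n);
    *len += n;
    out[*len] = '\0';
    return 0;
}

/* Hypothetical helper: replace every match of the POSIX extended regex
   `pattern` in `in` with `repl`, where \1..\9 in `repl` stand for capture
   groups. Returns 0 on success, -1 on a bad pattern or a full buffer. */
int rewrite_all(const char *pattern, const char *repl,
                const char *in, char *out, size_t cap)
{
    regex_t re;
    regmatch_t m[10];
    size_t len = 0;
    int rc = 0;
    int flags = 0;
    const char *p = in;

    assert(out != NULL && cap > 0);
    out[0] = '\0';
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (*p != '\0' && regexec(&re, p, 10, m, flags) == 0) {
        /* Copy the unmatched text that precedes this match. */
        rc = append(out, cap, &len, p, (size_t)m[0].rm_so);

        /* Expand the replacement, substituting capture groups. */
        for (const char *r = repl; rc == 0 && *r != '\0'; r++) {
            if (r[0] == '\\' && r[1] >= '1' && r[1] <= '9') {
                int g = r[1] - '0';
                if (m[g].rm_so >= 0)
                    rc = append(out, cap, &len, p + m[g].rm_so,
                                (size_t)(m[g].rm_eo - m[g].rm_so));
                r++; /* skip the digit */
            } else {
                rc = append(out, cap, &len, r, 1);
            }
        }
        if (rc != 0)
            break;

        if (m[0].rm_eo == m[0].rm_so) {
            /* Empty match: copy one character so the loop advances. */
            if (p[m[0].rm_eo] == '\0') {
                p += m[0].rm_eo;
                break;
            }
            rc = append(out, cap, &len, p + m[0].rm_eo, 1);
            if (rc != 0)
                break;
            p += m[0].rm_eo + 1;
        } else {
            p += m[0].rm_eo;
        }
        flags = REG_NOTBOL; /* later searches do not start at line start */
    }

    if (rc == 0)
        rc = append(out, cap, &len, p, strlen(p));
    regfree(&re);
    return rc;
}
```

With the made-up pattern "([[:alnum:]])([,.!?])" and replacement "\1 \2",
calling rewrite_all on "Hello, world." splits the punctuation off into
separate tokens. A real preprocessor would compile each rule once up front
and reuse the compiled regex_t for every input line.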
Good luck,
Woodley Packard
R. Bergmair wrote:
> Hi!
>
> I was wondering if there are any implementations of
> preprocessors out there running the ERG FSR rules,
> other than the standard Lisp implementation of FSPP
> that comes with the DELPH-IN distribution.
>
> Or actually, I'd be happy to get any tokenizer that
> produces ERG-compatible tokenizations.
>
> For example, I have a vague recollection that Rebecca
> reimplemented FSPP in Perl.
>
> The reason I'm asking is that I'm preparing to run
> some ERG-compatible tokenization on a really large
> scale, and I'm wondering if there are implementations
> out there that are more efficient than the Lisp
> implementation.
>
> Is there a C implementation? (I understand the PET
> fspp is just an ECL binding for the Lisp code? Or
> is there an independent implementation as well?)
>
> Has anyone ever tried doing this with lex in C, or
> tried compiling the rules into finite state automata,
> and ultimately machine executable code?
>
> regards,
>
> Richard