[developers] sentence splitting

Ann Copestake Ann.Copestake at cl.cam.ac.uk
Wed May 9 14:25:00 CEST 2007

Is anyone using an XML-aware sentence splitter? By this I mean something that 
can be given XML marked up text and parameterized to put sentence boundaries 
in `sensible' places given the markup.

e.g., with a file containing:

This sentence is partly in <IT>italics.</IT>

we would like the sentence boundary to be inserted after the </IT>, not before it.
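
To make the desired behaviour concrete, here is a rough sketch in Python. It is not an existing tool: the function name split_sentences and the inline_tags parameter are made up for illustration, and it assumes a sentence ends at ., ! or ? followed by whitespace or the end of the text. Closing tags named in inline_tags that directly follow the punctuation are kept with the sentence they close.

```python
import re


def split_sentences(text, inline_tags=("IT",)):
    """Split text into sentences, keeping trailing closing inline tags
    (e.g. </IT>) attached to the sentence they end.

    Hypothetical helper: inline_tags is the list of element names that
    are allowed to sit between the final punctuation and the boundary.
    """
    names = "|".join(re.escape(tag) for tag in inline_tags)
    # Sentence-final punctuation, then zero or more closing inline tags,
    # and only then whitespace or the end of the text.
    boundary = re.compile(r"[.!?]+(?:</(?:%s)>)*(?=\s|$)" % names)

    sentences = []
    start = 0
    for match in boundary.finditer(text):
        sentence = text[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()

    # Keep any trailing text that has no final punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


example = "This sentence is partly in <IT>italics.</IT> Here is another one."
for sentence in split_sentences(example):
    print(sentence)
```

This ignores most of the hard cases (abbreviations, tags with attributes, markup that spans sentence boundaries, block-level elements), so it only shows the parameterization we have in mind, not a solution.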

I am aware there are lots of issues in doing this properly, but leads would be 
appreciated.  As usual, we're interested in solutions that would be generally 
available, preferably Open Source, not proprietary software.


