[developers] New ERG with improved tokenization/preprocessing for PET

Stephan Oepen oe at ifi.uio.no
Mon Apr 13 09:49:46 CEST 2009

> I think there may be an issue with TnT in that as far as I know TnT is
> not open, so some DELPH-IN members will not be able to use the
> unknown-word handling.  For example, I am fairly sure I don't have a
> local license for tnt, and then you also (probably) need a WSJ license
> for the model (which I do have).  

true, TnT source is not freely available, but binaries are provided
for public download from:


these binaries are also included with the LOGON tree (see LogonTop on
the wiki).  so, as far as i understand it, no need to fax an agreement
to thorsten any longer.

it might be worth looking into truly open-source taggers at some point,
though the current TnT licensing conditions seem suitable for typical

> Does everything apart from the unknown-words work without TnT?

yes, i should think so.

> Let me be the first to say Wheeeeeeeee!

for more background on this (emerging) new treebank, see:


                                               god påske :-)  -  oe

+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at ifi.uio.no; oe at csli.stanford.edu; stephan at oepen.net ---
