[developers] pyDelphin / [incr tsdb()] question
Alexandre Rademaker
arademaker at gmail.com
Thu Apr 11 19:30:53 CEST 2019
Hi Stephan,
> On 11 Apr 2019, at 13:52, Stephan Oepen <oe at ifi.uio.no> wrote:
>
> hi emily and mike,
>
> the [incr tsdb()] import facilities support some mixed-content document formats, notably the un*x mbox format (you can guess when that was useful functionality).
Sorry, I don’t! Do you mean that mbox files can be imported into profiles directly? Until now, I had always thought that a profile's items are all sentences or phrases to be analysed by a grammar.
> to represent all data in the profile while not pretending that there is linguistic content (worth sending to the parser) in email headers, the corresponding items are marked as i-length = -1 (or maybe 0, not quite sure). this is the reason for the ‘Process | ...’ commands to require a non-zero, positive length ... in other words a reassurance that there actually is linguistic content in the item. in this regard, i-length (like i-id, i-input, and possibly i-wf as well) is a mandatory field in the item relation.
So i-length has two meanings: it is both the length of the input (in tokens) and a flag. The value -1 has a special meaning, right?
In other words, you are saying that a profile can also hold non-linguistic (noise) data, and that we can explicitly use i-length to mark which items are relevant for processing. Is that right?
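If I understand correctly, the selection rule amounts to something like the sketch below. It does not use any pyDelphin API. The items are a hypothetical in-memory list whose field names follow the item relation (i-id, i-input, i-length), and the values are made up. It assumes, as you describe, that non-linguistic items carry i-length -1 (or possibly 0), so only items with a positive length are kept.

```python
from typing import Iterable, Iterator

# Hypothetical items: the field names follow the [incr tsdb()] item relation,
# but the values are invented for illustration only.
ITEMS = [
    {"i-id": 1, "i-input": "From: someone@example.org", "i-length": -1},  # mbox header
    {"i-id": 2, "i-input": "Dogs bark", "i-length": 2},                   # linguistic content
    {"i-id": 3, "i-input": "", "i-length": 0},                            # empty line
]


def processable(items: Iterable[dict]) -> Iterator[dict]:
    """Yield only items whose i-length is a positive, non-zero integer.

    Assumption: items with i-length <= 0 are non-linguistic and are
    skipped, mirroring the check done before 'Process | ...' commands.
    """
    for item in items:
        if int(item["i-length"]) > 0:
            yield item


for item in processable(ITEMS):
    print(item["i-id"], item["i-input"])
```

Here only item 2 would be sent to the parser; the header and the empty item stay in the profile but are never processed.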
Best,
Alexandre