[developers] pyDelphin / [incr tsdb()] question

Alexandre Rademaker arademaker at gmail.com
Thu Apr 11 19:30:53 CEST 2019


Hi Stephan,

> On 11 Apr 2019, at 13:52, Stephan Oepen <oe at ifi.uio.no> wrote:
> 
> hi emily and mike,
> 
> the [incr tsdb()] import facilities support some mixed-content document formats, notably the un*x mbox format (you can guess when that was useful functionality).

Sorry, I don’t! Do you mean that mbox files can be imported into profiles directly? So far, I always thought that the items of a profile are all sentences or phrases meant to be analysed by a grammar.

>  to represent all data in the profile while not pretending that there is linguistic content (worth sending to the parser) in email headers, the corresponding items are marked as i-length = -1 (or maybe 0, not quite sure).  this is the reason for the ‘Process | ...’ commands to require a non-zero, positive length ... in other words a reassurance that there actually is linguistic content in the item.  in this regard, i-length (like i-id, i-input, and possibly i-wf as well) is a mandatory field in the item relation.

So i-length has two meanings: it is at the same time the length of the input (in tokens) and a flag. The value -1 has a special meaning, right?

That is, what you are saying is that a profile can also accommodate noise data, and we can explicitly use i-length to mark which items are relevant for processing. Is that right?
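To make that reading concrete, here is a minimal Python sketch. It is not the [incr tsdb()] importer or the pyDelphin API; `make_items` and `processable` are hypothetical helpers, and the item records carry only the fields mentioned in this thread (i-id, i-input, i-length). It assumes header lines end at the first blank line, flags them with i-length = -1, and uses a naive whitespace token count for body lines. Since it is unclear whether the flag is -1 or 0, the filter simply keeps items with a strictly positive i-length, which excludes both.

```python
from typing import Dict, Iterable, Iterator, List, Union

# Hypothetical item record: only the fields discussed in this thread.
Item = Dict[str, Union[int, str]]


def make_items(lines: Iterable[str], start_id: int = 1) -> List[Item]:
    """Turn the lines of a mixed-content message into item records.

    Assumption: header lines (up to the first blank line) are kept in the
    profile but flagged with i-length = -1; body lines get a naive
    whitespace token count as their i-length.
    """
    items: List[Item] = []
    next_id = start_id
    in_header = True
    for line in lines:
        text = line.rstrip("\n")
        if not text.strip():
            in_header = False  # a blank line ends the header block
            continue  # empty lines are not stored as items
        length = -1 if in_header else len(text.split())
        items.append({"i-id": next_id, "i-input": text, "i-length": length})
        next_id += 1
    return items


def processable(items: Iterable[Item]) -> Iterator[Item]:
    """Yield only items with a positive i-length (i.e. linguistic content)."""
    for item in items:
        if int(item["i-length"]) > 0:
            yield item
```

For example, with a message whose lines are `From: a@example.org`, `Subject: hi`, an empty line, `The dog barks.` and `Cats sleep.`, all four non-empty lines end up in the profile, but only the last two would be handed to the parser.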

Best,
Alexandre




