<div><div dir="auto">well, upon further reflection: those values also matter when you move on to analyzing your profile, or to comparing it with another one: the default aggregation will be by i-length intervals, and in my experience at least, breaking results down that way can help identify relevant trends.</div></div><div dir="auto"><br></div><div dir="auto">cheers, oe</div><div dir="auto"><br></div><div dir="auto"><br></div><div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 11 Apr 2019 at 19:29 Emily M. Bender <<a href="mailto:ebender@uw.edu">ebender@uw.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Thanks, Stephan! That explains why I have a vague memory of it not mattering whether the number in that field is actually accurate...</div><div dir="ltr"><div><br></div><div>Emily</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 11, 2019 at 9:53 AM Stephan Oepen <<a href="mailto:oe@ifi.uio.no" target="_blank">oe@ifi.uio.no</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div dir="auto">hi emily and mike,</div><div dir="auto"><br></div><div dir="auto">the [incr tsdb()] import facilities support some mixed-content document formats, notably the un*x mbox format (you can guess when that was useful functionality). to represent all data in the profile while not pretending that there is linguistic content (worth sending to the parser) in email headers, the corresponding items are marked as i-length = -1 (or maybe 0, not quite sure). this is why the ‘Process | ...’ commands require a positive length, in other words a reassurance that there actually is linguistic content in the item. 
in this regard, i-length (like i-id, i-input, and possibly i-wf as well) is a mandatory field in the item relation.</div></div><div dir="auto"><br></div><div dir="auto">best wishes, oe</div><div dir="auto"><br></div><div dir="auto"><br></div><div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 11 Apr 2019 at 05:39 <a href="mailto:goodman.m.w@gmail.com" target="_blank">goodman.m.w@gmail.com</a> <<a href="mailto:goodman.m.w@gmail.com" target="_blank">goodman.m.w@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Hi Emily,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">By changing the i-length field from -1 to 6 in the two items, I get [incr tsdb()] to say the following:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> retrieve(): found 2 items (0 output specifications).<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">So I suspect that is the problem. Since i-length is just the number of tokens, I find it odd that [incr tsdb()] would rely on that instead of just counting tokens. Processing with art or PyDelphin and ACE works fine regardless of the value of i-length.<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">I would modify Xigt's exporter to insert the number of tokens into the i-length field. 
E.g., at the end of export_igt(), put something like this before the return statement:<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> row['i-length'] = len(row['i-input'].split())<br><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 11, 2019 at 11:06 AM Emily M. Bender <<a href="mailto:ebender@uw.edu" target="_blank">ebender@uw.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Dear colleagues,<div><br></div><div>I've run into a mysterious issue with the item files I've generated from Xigt corpora, using an export script that in turn relies on PyDelphin: When I use them to create [incr tsdb()] skeletons, I end up with profiles that are partially but not fully functional: Browse | Items works, for example, but when I try Process | All items or try clicking on an individual item to process, I get only this message in the Emacs buffer:</div><div><br></div><div>retrieve(): found 0 items (0 output specifications)</div><div><br></div><div>I suspect that this is because there's something missing or odd about the item files, but comparing by hand to one that works (generated by Import | Test items from within the [incr tsdb()] podium), I can't spot the difference.</div><div><br></div><div>I've attached a small, two-line example that should reproduce the problem, in case anyone has a moment to take a look. 
(I'm not sending a grammar for this language, since the behavior can be observed with any grammar loaded into the LKB; the point of failure is before any parsing happens.)</div><div><br></div><div>Thanks!</div><div>Emily</div><div><br>-- <br><div dir="ltr" class="m_-4948004291793558285gmail-m_-3233431324326956236m_-6075918297974946971gmail-m_327746291098416481m_-7662908352448771192gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr">Emily M. Bender<br>Professor, <span style="font-size:12.8px">Department of Linguistics</span></div><div><span style="font-size:12.8px">University of Washington</span></div><div>Twitter: @emilymbender</div></div></div></div></div></div></div></div></div></div></div></div> </blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="m_-4948004291793558285gmail-m_-3233431324326956236m_-6075918297974946971gmail_signature">-Michael Wayne Goodman</div> </blockquote></div></div> </blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="m_-4948004291793558285gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr">Emily M. Bender<br>Professor, <span style="font-size:12.8px">Department of Linguistics</span></div><div><span style="font-size:12.8px">University of Washington</span></div><div>Twitter: @emilymbender</div></div></div></div></div></div></div></div></div></div> </blockquote></div></div>
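<div dir="auto"><br></div><div dir="auto">A minimal Python sketch of the fix discussed in this thread, assuming each exported item is a plain dict keyed by field name (as in Mike's <code>row</code> snippet) and that whitespace-separated tokens are what should be counted. The helpers <code>set_i_length</code>, <code>processable</code>, and <code>length_intervals</code> are hypothetical illustrations, not part of Xigt, PyDelphin, or [incr tsdb()]. The positive-length filter only mimics the ‘Process | ...’ check Stephan describes, and the interval width of 5 is an arbitrary choice, not the [incr tsdb()] default.</div>

```python
"""Sketch: fill in i-length for exported items and preview which items a
positive-length check would keep.

Assumptions (not taken from [incr tsdb()], Xigt, or PyDelphin): items are
plain dicts keyed by field name, tokens are whitespace-separated, and the
interval width used for grouping is arbitrary.
"""
from collections import Counter


def set_i_length(row):
    """Set i-length to the number of whitespace-separated tokens in i-input."""
    row['i-length'] = len(row['i-input'].split())
    return row


def processable(rows):
    """Keep only items with a positive i-length (mimics the Process | ... check)."""
    return [row for row in rows if row['i-length'] > 0]


def length_intervals(rows, width=5):
    """Count items per i-length interval [lo, lo + width)."""
    counts = Counter()
    for row in rows:
        lo = (row['i-length'] // width) * width
        counts[(lo, lo + width)] += 1
    return dict(sorted(counts.items()))


if __name__ == '__main__':
    items = [
        {'i-id': 1, 'i-input': 'the dog barks', 'i-length': -1},
        {'i-id': 2, 'i-input': 'dogs bark', 'i-length': -1},
    ]
    print(len(processable(items)))  # 0: both items still have i-length -1
    items = [set_i_length(row) for row in items]
    print(len(processable(items)))  # 2
    print(length_intervals(items))  # {(0, 5): 2}
```

<div dir="auto">With this approach an empty i-input gets i-length 0 and is filtered out, which fits the intent of only sending items with actual linguistic content to the parser. Whitespace splitting is Mike's suggestion; it may not match every tokenization convention a grammar or profile uses.</div>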