<div><div dir="auto">hi paul,</div><div dir="auto"> </div><div dir="auto">from memory, you can request increasing levels of chart mapping logging by passing ‘-cm=n’, using a bit-coded value for ‘n’. i would expect one of 2 or 4 to cause PET to output the final token lattice (prior to morphology and lexical processing), and 4 or 8 to make it dump the lexical lattice (before or after lexical filtering).</div><div dir="auto"> </div><div dir="auto">these lattices should help you debug further. if you end up feeling stuck, i can try and reproduce the issue. from your message so far, it is not yet evident to me that PET is inappropriately strict in testing chart connectivity.</div><div dir="auto"> </div><div dir="auto">best wishes, oe</div><div dir="auto"> </div> <div class="gmail_quote"><div>On Tue, 2 Jan 2018 at 15:47 <<a href="mailto:paul@haleyai.com">paul@haleyai.com</a>> wrote: </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="#0563C1" vlink="#954F72"><div class="m_-1206039265761553095WordSection1">Happy New Year! In the subject phrase, “Cal. App.2d” refers to the California Court of Appeal for the 2nd district. We augment the ERG with the following lexical entry: <ul style="margin-top:0in" type="disc"><li class="m_-1206039265761553095MsoListParagraph" style="margin-left:0in">California_Court_of_Appeal_2nd_District_NNP_abb := n_-_pn_le & [ ORTH < "Cal.","App.","2d" >, SYNSEM [ LKEYS.KEYREL.CARG "CA Court of Appeal, 2nd District", PHON.ONSET con ] ].</li></ul> We do not use the ERG’s assumption (made in its token mapping rules) that uppercase or capitalized tokens are proper_ne, but the problem described here would arise even if the text were all lowercase. 
We send in the following tokens (via FSC, as attached):<ol style="margin-top:0in" start="1" type="1"><li class="m_-1206039265761553095MsoListParagraph" style="margin-left:0in">Cal</li><li class="m_-1206039265761553095MsoListParagraph" style="margin-left:0in">.</li><li class="m_-1206039265761553095MsoListParagraph" style="margin-left:0in">App</li><li class="m_-1206039265761553095MsoListParagraph" style="margin-left:0in">.</li><li class="m_-1206039265761553095MsoListParagraph" style="margin-left:0in">2d</li></ol> Note that this tokenization splits the period from the preceding letters. The same problem would arise, however, for any punctuation that is split off during tokenization and rejoined by token mapping rules (e.g., hyphens, commas, ...). We see suffix_punctuation_tmr derive the following tokens:<ul style="margin-top:0in" type="disc"><li class="m_-1206039265761553095MsoListParagraph" style="margin-left:0in">Cal.</li><li class="m_-1206039265761553095MsoListParagraph" style="margin-left:0in">App.</li></ul> PET aborts the parse and informs us that there are no lexicon entries for “Cal”. PET looks for “unexpanded_items” when the chart is not “connected”, but there seems to be an error in that computation, as shown below. So we modified PET to report this case rather than abort, and the parse then completes, albeit using only the following lexical entry: <ul style="margin-top:0in" type="disc"><li class="m_-1206039265761553095MsoListParagraph" style="margin-left:0in">Califonia_Court_of_Appeal_NNP_abb := n_-_pn_le & [ ORTH < "Cal.", "App." >, SYNSEM [ LKEYS.KEYREL.CARG "California Court of Appeal", PHON.ONSET con ] ].</li></ul> That is, we obtain spanning parses of the phrase (despite PET’s warning), but not the desired one. We see alphanumeric_identifier_ne_4_tmr derive a proper_ne for ‘2d’ (harmless for the purposes of this discussion), but the first lexical entry above is not triggered. I have no idea why the three-part lexical entry is not matched. Any ideas? 
Thank you,<br>Paul </div></div></blockquote></div></div>
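<div dir="auto">For readers less familiar with the terms in this thread, the following Python sketch illustrates, in a deliberately simplified and hypothetical model, what it means for a multi-token ORTH list to match a token lattice and for the resulting chart to be “connected” (every vertex from the first to the last bridged by lexical edges). The edge representation, the helper names <code>match_multiword</code> and <code>is_connected</code>, and the exact, case-sensitive comparison of forms are all assumptions made for illustration; this is not how PET implements lexical lookup or its connectivity test.</div>

```python
from collections import defaultdict

# Hypothetical, simplified token lattice: each edge is a tuple
# (start_vertex, end_vertex, form). This is NOT PET's internal data
# structure; it only illustrates the idea.


def match_multiword(lattice, orth):
    """Return the (start, end) spans where consecutive edges spell out `orth`.

    Assumption: each ORTH element must equal exactly one token form
    (case-sensitive); PET's real lookup may normalize differently.
    """
    by_start = defaultdict(list)
    for start, end, form in lattice:
        by_start[start].append((end, form))

    spans = set()

    def extend(vertex, i, origin):
        if i == len(orth):  # every ORTH element consumed: record the span
            spans.add((origin, vertex))
            return
        for end, form in by_start.get(vertex, []):
            if form == orth[i]:
                extend(end, i + 1, origin)

    for start in list(by_start):
        extend(start, 0, start)
    return sorted(spans)


def is_connected(edges, first, last):
    """True if vertex `last` is reachable from `first` via left-to-right edges."""
    successors = defaultdict(set)
    for start, end, _ in edges:
        successors[start].add(end)
    seen, stack = {first}, [first]
    while stack:
        vertex = stack.pop()
        for nxt in successors.get(vertex, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return last in seen


# Hypothetical lattice after token mapping has joined "Cal" + "." and
# "App" + "." and removed the input tokens (vertices 0..5).
mapped = [(0, 2, "Cal."), (2, 4, "App."), (4, 5, "2d")]

# The two ORTH lists from the message above.
lexicon = {
    "California_Court_of_Appeal_2nd_District_NNP_abb": ("Cal.", "App.", "2d"),
    "Califonia_Court_of_Appeal_NNP_abb": ("Cal.", "App."),
}

# One lexical edge per span matched by each entry's ORTH list.
lexical_edges = [
    (start, end, name)
    for name, orth in lexicon.items()
    for start, end in match_multiword(mapped, orth)
]
print(lexical_edges)
print(is_connected(lexical_edges, 0, 5))      # True: the 3-token entry spans 0-5
print(is_connected(lexical_edges[1:], 0, 5))  # False: with only the 2-token entry, nothing covers 4-5
```

<div dir="auto">In this toy model the chart is connected through the three-token entry even though no lexical entry exists for the bare token “Cal”, which is the outcome Paul expected; the token and lexical lattices that ‘-cm=n’ dumps would show where PET’s actual lattice differs from this idealized one.</div>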