[developers] TDL identifiers

Francis Bond bond at ieee.org
Tue Oct 23 12:29:19 CEST 2018


G'day,

Currently Zhong has several identifiers that Mike's TDL code
considers invalid, but that the LKB and ACE are fine with:

*-marker := symbol &
 ,_c_1 := conj_-_e_le &
 _n_1 := n_-_pn_le &
和_c_⚠ := conj_-_e_le &
格里姆斯比•罗伊洛特_n_1 := n_-_h_pn_le &

The problem characters, in the order of the lines above, are:
full-width asterisk
full-width comma
non-breaking space [our bad, I will remove]
warning sign (which I like to use for mal-rules)
dot (often used in foreign names)

And in Jacy:
ザ・ベスト_n_1-tc := ordinary-nohon-n-lex &
full width dot (often used in foreign names)

PyDelphin defines identifiers to be: ([\w_+*?-]+),
and coreference to be  \#([^\s!"#$&'(),./:;<=>[\]^]+)
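
To make the mismatch concrete, here is a minimal sketch that applies the
identifier pattern quoted above to the identifiers from this message using
Python's re module. It assumes the pattern is applied as a full match to the
whole identifier, which may not be exactly how PyDelphin's tokenizer uses it,
and the code points for the problem characters are reconstructed from the
descriptions in this message. The names IDENTIFIER, EXAMPLES and is_valid are
hypothetical.

```python
import re

# Identifier pattern as quoted in the message. Applying it with fullmatch
# is an assumption about how the real tokenizer behaves.
IDENTIFIER = re.compile(r'[\w_+*?-]+')

# Identifiers from the message. Problem characters are written as escapes;
# the code points are assumed from the descriptions in the message.
EXAMPLES = {
    "\uFF0A-marker": "full-width asterisk (U+FF0A)",
    "\uFF0C_c_1": "full-width comma (U+FF0C)",
    "\u00A0_n_1": "non-breaking space (U+00A0)",
    "和_c_\u26A0": "warning sign (U+26A0)",
    "格里姆斯比\u2022罗伊洛特_n_1": "bullet dot (U+2022)",
    "ザ\u30FBベスト_n_1-tc": "katakana middle dot (U+30FB)",
}


def is_valid(identifier):
    """Return True if the whole string matches the identifier pattern."""
    return IDENTIFIER.fullmatch(identifier) is not None


for ident, note in EXAMPLES.items():
    print(f"{ident!r}: valid={is_valid(ident)} ({note})")
```

In Python, \w covers Unicode letters and digits, so the Chinese and Japanese
letters themselves are accepted. Each identifier fails only because of the
single punctuation, symbol or space character noted next to it.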

It would be nice to at least allow ・, • and ⚠ in identifiers, but it
may be better to define identifiers by a list of disallowed characters
(as is done for coreferences, now I guess with | added):

([^\s!"#$&'(),./:;<=>[\]^|]+)
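
As a rough check of the deny-list pattern, the sketch below tries it on the
identifiers from this message with Python's re module. The only change to the
pattern is that [ is escaped inside the character class, which does not alter
what it matches. The name is_allowed is hypothetical, and the code points are
assumed from the descriptions in this message.

```python
import re

# Deny-list pattern from the message: an identifier is any run of
# characters that are not whitespace and not one of the listed symbols.
DENY_LIST_IDENTIFIER = re.compile(r'''[^\s!"#$&'(),./:;<=>\[\]^|]+''')


def is_allowed(identifier):
    """Return True if the whole string matches the deny-list pattern."""
    return DENY_LIST_IDENTIFIER.fullmatch(identifier) is not None


# Problem characters are written as escapes (assumed code points).
candidates = [
    "\uFF0A-marker",                  # full-width asterisk
    "\uFF0C_c_1",                     # full-width comma
    "\u00A0_n_1",                     # non-breaking space
    "和_c_\u26A0",                    # warning sign
    "格里姆斯比\u2022罗伊洛特_n_1",   # bullet dot
    "ザ\u30FBベスト_n_1-tc",          # katakana middle dot
]

for ident in candidates:
    print(f"{ident!r}: allowed={is_allowed(ident)}")
```

With Python's Unicode-aware \s, the identifier starting with a non-breaking
space is still rejected, while the other five are accepted. Whether \s covers
U+00A0 in the regex engines behind the LKB, PET, ACE and AGREE is not
something this sketch checks, so a shared definition would probably need to
spell out which whitespace characters are excluded.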

and even better if the LKB, PET, ACE, AGREE and PyDelphin are consistent.

What do people think?

-- 
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University


