[developers] TDL identifiers
Francis Bond
bond at ieee.org
Tue Oct 23 12:29:19 CEST 2018
G'day,
Currently, Zhong has several identifiers that Mike's TDL code
considers invalid, but that the LKB and ACE accept:
＊-marker := symbol &
，_c_1 := conj_-_e_le &
_n_1 := n_-_pn_le &
和_c_⚠ := conj_-_e_le &
格里姆斯比•罗伊洛特_n_1 := n_-_h_pn_le &
The problem characters are, in order:
full-width *
full-width ,
non-breaking space [our bad, I will remove it]
warning sign (which I like to use for mal-rules)
dot (often used in foreign names)
And in Jacy:
ザ・ベスト_n_1-tc := ordinary-nohon-n-lex &
full-width dot (also often used in foreign names)
PyDelphin defines identifiers as ([\w_+*?-]+)
and coreferences as \#([^\s!"#$&'(),./:;<=>[\]^]+)
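To see exactly which characters trip up the identifier pattern quoted above, here is a minimal Python sketch that applies it with re.fullmatch. It is only the regex as quoted in this message, copied by hand, not PyDelphin's actual TDL parser. The problem characters are written as escapes; assuming, from the "full width" notes, that the first two examples use U+FF0A and U+FF0C. The helper rejected_chars is hypothetical.

```python
import re
import unicodedata

# Identifier pattern as quoted in the message for PyDelphin
# (copied here by hand, not imported from PyDelphin).
CURRENT_IDENTIFIER = re.compile(r"[\w_+*?-]+")

# The Zhong and Jacy identifiers, with the problem characters written as
# escapes. Assumption: U+FF0A and U+FF0C, per the "full width" notes.
EXAMPLES = [
    "\uff0a-marker",                  # full-width asterisk
    "\uff0c_c_1",                     # full-width comma
    "和_c_\u26a0",                    # warning sign
    "格里姆斯比\u2022罗伊洛特_n_1",   # bullet ("dot")
    "ザ\u30fbベスト_n_1-tc",          # katakana middle dot
]


def rejected_chars(identifier: str) -> list[str]:
    """Unicode names of the characters the current pattern does not allow."""
    return [
        unicodedata.name(ch)
        for ch in identifier
        if CURRENT_IDENTIFIER.fullmatch(ch) is None
    ]


if __name__ == "__main__":
    for ident in EXAMPLES:
        accepted = CURRENT_IDENTIFIER.fullmatch(ident) is not None
        print(ident, accepted, rejected_chars(ident))
```

Each example fails because of a single punctuation or symbol character. The Chinese and katakana letters themselves pass, since \w in Python's re is Unicode-aware for str patterns.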
It would be nice to at least allow ・•⚠, in identifiers, but it may
be better to have a list of disallowed characters instead (as for
coreferences, now I guess with | added):
([^\s!"#$&'(),./:;<=>[\]^|]+)
It would be even better if the LKB, PET, ACE, AGREE and PyDelphin were consistent.
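A minimal Python sketch of the exclusion-based identifier pattern suggested above, checked against the same examples. It assumes Python's re semantics. The helper is_proposed_identifier and the "\u00a0_n_1" test string (standing in for the identifier with a non-breaking space) are hypothetical, and "[" is escaped inside the class only for readability.

```python
import re

# Sketch of the proposed rule: an identifier is a run of characters that are
# neither whitespace nor one of the reserved ASCII characters listed in the
# message ("[" is escaped only for clarity; it is literal inside a class).
PROPOSED_IDENTIFIER = re.compile(r"""[^\s!"#$&'(),./:;<=>\[\]^|]+""")


def is_proposed_identifier(s: str) -> bool:
    """True if the whole string is one identifier under the proposed rule."""
    return PROPOSED_IDENTIFIER.fullmatch(s) is not None


if __name__ == "__main__":
    candidates = [
        "n_-_pn_le",                      # ordinary ASCII type name
        "\uff0a-marker",
        "\uff0c_c_1",
        "和_c_\u26a0",
        "格里姆斯比\u2022罗伊洛特_n_1",
        "ザ\u30fbベスト_n_1-tc",
        "\u00a0_n_1",                     # non-breaking space: \s matches it
        "a|b",                            # "|" is reserved
        "a.b",                            # "." is reserved
    ]
    for s in candidates:
        print(repr(s), is_proposed_identifier(s))
```

All of the Zhong and Jacy identifiers pass, while the reserved ASCII characters are still rejected. In Python, \s also matches U+00A0, so the non-breaking space stays excluded. The LKB, PET, ACE and AGREE may not classify whitespace the same way, so that is one concrete point to check for consistency.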
What do people think?
--
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University