[developers] Generation with unknown words

Stephan Oepen oe at ifi.uio.no
Sat Feb 6 12:05:56 CET 2016


dear alex,

> [...] (parsing gives "_porcelain/NN_u_unknown_rel"). Is there an option for ACE so
> that these cases can be handled?

if in part for historic interest, have you tried these inputs on the
LKB generator?  from what i recall, generation from unknown words with
predicates like the above used to work.  looking at ‘lkb/globals.lsp’
(see *generic-lexical-entries*) and ‘lkb/mrsglobals.lsp’ in the ERG,
it appears that the necessary predicate normalization is only applied
as part of the paraphrasing transfer grammar:

;;;
;;; a function currently only used in the paraphraser transfer grammar, to do
;;; post-parsing normalization of predicates associated to unknown words.  see
;;; the discussion on the `developers' list in May 2009 for background.  this
;;; should in principle be incorporated into MRS read-out already, i.e. there
;;; should be a way of registering MRS post-processing hooks.   (2-jun-09; oe)
;;;
;;; normalize-mrs() is keyed off a table of <tag, rule, pattern> triples, each
;;; pairing a PTB PoS tag with an orthographemic rule of the grammar (to strip
;;; off inflectional suffixes, if any), and a format() template used to create
;;; the normalized PRED value.
;;;
(defparameter *mrs-normalization-heuristics*
  '(("JJ[RS]?" nil "_~a_a_unknown_rel")
    ("(?:FW|NN)" nil "_~a_n_unknown_rel")
    ("NNS" nil "_~a_n_unknown_rel")
    ("RB" nil "_~a_a_unknown_rel")
    ("VBP?" :v_3s-fin_olr "_~a_v_unknown_rel")
    ("VBD" :v_pst_olr "_~a_v_unknown_rel")
    ("VBG" :v_prp_olr "_~a_v_unknown_rel")
    ("VBN" :v_psp_olr "_~a_v_unknown_rel")
    ("VBZ" :v_3s-fin_olr "_~a_v_unknown_rel")))

just now (seven years later), i see no immediate reason why this
normalization could not be incorporated into MRS construction after
parsing.  for the LKB and PET (when using the Lisp-based MRS library),
this would be straightforward.  for ACE (and PET, when using native
MRS construction), it would presumably require adding code to
implement the above heuristics.
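
as a rough illustration of what such added code might look like, here is
a minimal python sketch of the table-driven normalization.  it is not
taken from the LKB, PET, or ACE.  the names normalize_pred(), identity(),
and UNKNOWN_PRED are hypothetical, the assumed input shape is the
"_form/TAG_u_unknown_rel" pattern from the example above, and anchoring
the tag patterns (full match) is an assumption too.  applying the
orthographemic rule is left to a caller-supplied function, because that
step depends on the grammar.

```python
import re

# Python transcription of the *mrs-normalization-heuristics* table:
# (PTB tag pattern, orthographemic rule name or None, PRED template).
HEURISTICS = [
    (r"JJ[RS]?", None, "_{}_a_unknown_rel"),
    (r"(?:FW|NN)", None, "_{}_n_unknown_rel"),
    (r"NNS", None, "_{}_n_unknown_rel"),
    (r"RB", None, "_{}_a_unknown_rel"),
    (r"VBP?", "v_3s-fin_olr", "_{}_v_unknown_rel"),
    (r"VBD", "v_pst_olr", "_{}_v_unknown_rel"),
    (r"VBG", "v_prp_olr", "_{}_v_unknown_rel"),
    (r"VBN", "v_psp_olr", "_{}_v_unknown_rel"),
    (r"VBZ", "v_3s-fin_olr", "_{}_v_unknown_rel"),
]

# Assumed shape of an unknown-word predicate: "_<form>/<TAG>_u_unknown_rel".
UNKNOWN_PRED = re.compile(r"^_(?P<form>.+)/(?P<tag>[A-Z$]+)_u_unknown_rel$")


def identity(form, rule):
    """Placeholder: a real implementation would apply the grammar's
    orthographemic rule `rule` in reverse to strip the suffix."""
    return form


def normalize_pred(pred, lemmatize=identity):
    """Rewrite an unknown-word PRED; return other predicates unchanged."""
    match = UNKNOWN_PRED.match(pred)
    if match is None:
        return pred
    form, tag = match.group("form"), match.group("tag")
    for pattern, rule, template in HEURISTICS:
        # Assumption: the tag pattern must match the whole tag.
        if re.fullmatch(pattern, tag):
            stem = lemmatize(form, rule) if rule else form
            return template.format(stem)
    return pred  # no heuristic for this tag


print(normalize_pred("_porcelain/NN_u_unknown_rel"))
# prints: _porcelain_n_unknown_rel
```

with the placeholder lemmatizer, inflected forms pass through unchanged,
so "_barked/VBD_u_unknown_rel" would come out as "_barked_v_unknown_rel"
until a real rule application is plugged in via the lemmatize argument.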

incidentally, in the above, the entry for NNS looks potentially
deficient to me.  i would have expected it to specify the plural
inflection rule (‘n_pl_olr’) to strip off a plural suffix.  in a
nutshell, what the above mechanism is supposed to accomplish is
lemmatization, driving the grammar-internal orthographemic rules from
the PTB PoS tags.

best wishes, oe


