[developers] ICONS and generation
Ann Copestake
aac10 at cl.cam.ac.uk
Thu Feb 18 19:23:05 CET 2016
Thanks - would anyone else like to comment? I'm currently not sure
whether simply enriching ICONS (and possibly HCONS) with a `...' notation
would be enough to give us what's needed in terms of behaviour, but I
suspect it needs to be part of the solution.
Ann
On 18/02/16 00:26, Emily M. Bender wrote:
> Just a quick and belated reply to say that from where I sit your
> analysis of the situation makes a lot of sense.
>
> Emily
>
> On Sun, Feb 7, 2016 at 2:12 AM, Ann Copestake <aac10 at cam.ac.uk> wrote:
>
> Thanks! and thanks all! I've come to a view on this which I think
> is consistent with what everyone has been saying.
>
> First of all, note that in the MRS syntax, we do not distinguish
> between terminated and non-terminated lists/bags. If we think
> about it from the perspective of typed feature structures, it is
> clear that there is a distinction - for instance a type `list' is
> the most general type of list, and the type `e-list' (empty list)
> is usually a maximally specific type. Coming back to the
> notation I used in an earlier message, there is a distinction
> between { ... } (analogous to list in a TFS) and {} (cf. e-list).
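>
> To make that distinction concrete, here is a minimal sketch (in
> Python, purely illustrative - the names are mine, not part of any
> DELPH-IN tool) of a bag with an explicit `closed' flag, so that {}
> and { ... } behave differently under subsumption:
>
>     from dataclasses import dataclass
>
>     @dataclass(frozen=True)
>     class IconsEl:
>         relation: str   # e.g. 'info-str', 'focus', 'topic'
>         target: str     # the individual, e.g. 'x5'
>         clause: str     # the clause's ARG0, e.g. 'e2'
>
>     @dataclass
>     class IconsBag:
>         elements: frozenset    # of IconsEl
>         closed: bool = False   # True: terminated, cf. e-list
>
>     def subsumes(general, specific):
>         # ignores type specialisation (info-str vs focus) for brevity
>         if general.closed:
>             return general.elements == specific.elements  # {}: no extras
>         return general.elements <= specific.elements      # { ... }: extras allowed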
>
> Now, there are two possible interpretations of ICONS as it arises
> from a DELPH-IN grammar (i.e., as it is output after parsing):
> 1. information structure
> 2. information structure as it arises from morphosyntax
>
> In the `normal' sentences of the `Kim chased the dog' type, no
> information structure elements arise from morphosyntax. We can,
> however, expect that various contexts (e.g., discourse) give rise
> to information structure in association with such a sentence.
> Hence, with respect to interpretation 1, ICONS is not strictly
> empty but underspecified (and similarly a one-element ICONS may be
> underspecified with respect to a two-element ICONS and so on). I
> think this is consistent with what Sanghoun and Emily are saying.
> Under this interpretation, an MRS with no specification of ICONS
> should indeed generate all the variant sentences we've been
> discussing. And so on.
>
> However, with respect to interpretation 2, the ICONS emerging from
> the parse of such a sentence is terminated. Once we've finished
> parsing, we're guaranteeing no more ICONS elements will arise from
> morphosyntax, whatever someone does in discourse. Under this
> interpretation, if I say I want to generate a sentence with an
> empty ICONS, I mean I want to generate a sentence with no ICONS
> contribution from morphosyntax. This is also a legitimate use of
> the realiser, considered as a stand-alone module.
>
> Since ICONS is something which I have always thought of as on the
> boundary of morphosyntax and discourse, I want to be able to
> enrich ICONS emerging from parsing with discourse processing, so
> interpretation 1 makes complete sense. However, I believe it is
> also perfectly legitimate to be able to divide the world into what
> the grammar can be expected to do and what it can't, and that is
> consistent with interpretation 2.
>
> As a hypothetical move, consider an additional classification of
> ICONS elements according to whether or not they arise from
> morphosyntax. Then we can see that a single ICONS value could
> encompass both interpretations: i.e., what would arise from a
> parse would be a terminated list of morphosyntactic-ICONS elements,
> but the ICONS as a whole could be non-terminated.
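>
> Continuing the sketch above (again illustrative only, my own
> invention), tagging each element with its origin lets a parse
> terminate the morphosyntactic sub-bag while leaving the bag as a
> whole open:
>
>     MORPH, DISCOURSE = 'morphosyntax', 'discourse'
>
>     def parse_compatible(parse_bag, other_bag):
>         """Bags are sets of (element, origin) pairs: interpretation 2
>         for the morphosyntactic elements (no more may appear),
>         interpretation 1 for everything else."""
>         morph = {e for e, src in parse_bag if src == MORPH}
>         other_morph = {e for e, src in other_bag if src == MORPH}
>         all_parse = {e for e, _ in parse_bag}
>         all_other = {e for e, _ in other_bag}
>         return morph == other_morph and all_parse <= all_other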
>
> I think there may be reasons to be able to distinguish ICONS
> elements according to whether they are intended as grammar-derived
> or not, though I do see this might look messy. But anyway, I want
> to first check that everyone agrees with this analysis of the
> situation before trying to work out what we might do about it in
> terms of notation.
>
> Incidentally - re Dan's message - my overly brief comment about
> Sanghoun's use of DMRS earlier was intended to point out that if
> DMRS had the necessary links for demonstrating ICONS, then in
> principle this was something we know how to extract. But right
> now, I'm not clear whether or not we do need all the
> underspecified elements, and that's something I would like Dan to
> comment on before we go further.
>
> All best,
>
> Ann
>
>
>
> On 06/02/2016 17:59, Sanghoun Song wrote:
>> My apologies for my really late reply!
>>
>> I am not sure whether I fully understand your discussion, but I
>> would like to offer several of my ideas on using ICONS for generation.
>>
>> First, in my analysis (final version), only expressions that
>> contribute to information structure introduce an ICONS element
>> into the list. For example, the following unmarked sentence (a
>> below) has no ICONS element (i.e. empty ICONS).
>>
>> a. plain: Kim chases the dog.
>> b. passivization: The dog is chased by Kim.
>> c. fronting: The dog Kim chases.
>> d. clefting: It is the dog that Kim chases.
>>
>> Using the type hierarchy for information structure in my thesis,
>> I can say the following:
>>
>> (i) The subject Kim and the object the dog in a plain active
>> sentence (a) are in situ. They may or may not be focused
>> depending on which constituent bears a specific accent, but in
>> sentence-based processing their information structure values are
>> best left underspecified, for flexibility of representation.
>>
>> (ii) The promoted argument the dog in the passive sentence (b) is
>> evaluated as conveying focus-or-topic, while the demoted argument
>> Kim is associated with non-topic.
>>
>> (iii) In (c), the fronted object the dog is assumed to be
>> assigned focus-or-topic, in that the sentence conveys the meaning
>> either of "As for the dog, Kim chases it." or of (d), while the subject
>> in situ is evaluated as containing neither topic nor focus (i.e.
>> background). (Background may not be implemented in the ERG, I think.)
>>
>> (iv) The focused NP in (d) carries focus, and the subject in the
>> cleft clause Kim is also associated with bg (background).
>>
>> Thus, we can create a focus specification hierarchy amongst (a-d)
>> as [clefting > fronting > passivization > plain].
>>
>> What I want to say is that a set of sentences which share some
>> properties may have subtle shades of meaning depending on how
>> focus is assigned to the sentences. Paraphrasing works only in
>> the direction from right to left of [clefting > fronting
>> > passivization > plain], because paraphrasing in the opposite
>> direction necessarily causes loss of information. For example, a
>> plain sentence such as (a) can be paraphrased into a cleft
>> construction such as (d), but not vice versa.
>>
>> In a nutshell, a more specific sentence had better not be
>> paraphrased into a less specific sentence in terms of information
>> structure.
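>>
>> Just to illustrate the ordering and the permitted direction of
>> paraphrase (this is toy code of mine, not from the thesis):
>>
>>     # [clefting > fronting > passivization > plain], most to least
>>     # specific; paraphrase only from less to more specific.
>>     SPECIFICITY = ['plain', 'passivization', 'fronting', 'clefting']
>>
>>     def may_paraphrase(source, target):
>>         return SPECIFICITY.index(source) <= SPECIFICITY.index(target)
>>
>>     assert may_paraphrase('plain', 'clefting')      # (a) -> (d): fine
>>     assert not may_paraphrase('clefting', 'plain')  # (d) -> (a): loses information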
>>
>> Second, I provided many dependency graphs in my thesis. The main
>> reason was that nobody outside of DELPH-IN can fully
>> understand the complex co-indexation in ICONS/MRS. At that time,
>> I didn't work on DMRS with respect to ICONS. If there is a way to
>> represent ICONS in DMRS (direct from TFS or via MRS), I am
>> interested in the formalism.
>>
>>
>> Sanghoun
>>
>>
>> On Sat, Feb 6, 2016 at 1:26 AM, Ann Copestake <aac10 at cam.ac.uk> wrote:
>>
>> Briefly (more this evening maybe) - I don't see a particular
>> problem with filling in the ICONS since what you describe are
>> relationships that are overt in the *MRS anyway, aren't
>> they? I thought, in fact, that these are pretty clear from
>> the DMRS graph - which is why Sanghoun uses it to describe
>> what's going on.
>>
>> I believe we can build the DMRS graph direct from the TFS,
>> incidentally - don't need to go via MRS ...
>>
>> Cheers,
>>
>> Ann
>>
>>
>> On 05/02/2016 23:40, Dan Flickinger wrote:
>>>
>>> As I understand the Song and Bender account, an MRS for a
>>> sentence should include in the ICONS list at least one
>>> element for each individual (eventuality or instance) that
>>> is introduced. In the ERG this would mean that the value of
>>> each ARG0 should appear in at least one ICONS entry, where
>>> most of these would be of the maximally underspecified type
>>> `info-str', but possibly specialized because of syntactic
>>> structure or stress/accent or maybe even discourse structure.
>>>
>>>
>>> I see the virtue of having these overt ICONS elements even
>>> when of type `info-str', to enable the fine-grained control
>>> that Stephan notes that we want for generation, and also to
>>> minimize the differences between the ERG and grammars being
>>> built from the Matrix which embody Sanghoun's careful work.
>>>
>>>
>>> If the grammarian is to get away with not explicitly
>>> introducing each of these ICONS elements in the lexical
>>> entries, as Sanghoun does in the Matrix, then it would have
>>> to be possible to predict and perhaps mechanically add the
>>> missing ones after composition was completed. I used to
>>> hope that this would be possible, but now I'm doubtful,
>>> leading me to think that there is no good alternative to the
>>> complication (maybe I should more kindly use the term
>>> `enrichment') of the grammar with the overt introduction of
>>> these guys everywhere. Here's my reasoning:
>>>
>>>
>>> I assume that what we'll want in an MRS for an ordinary
>>> sentence is an ICONS list that has exactly one entry for
>>> each pair of an individual `i' and the eventuality which is
>>> the ARG0 of each predication in which `i' appears as an
>>> argument. Thus for `the cat persuaded the dog to bark' the
>>> ICONS list should have four elements: one for cat/persuade,
>>> one for dog/persuade, one for bark/persuade, and one for
>>> dog/bark. Now if I wanted to have the grammar continue to
>>> only insert ICONS elements during composition for the
>>> non-vanilla info-str phenomena, and fill in the rest
>>> afterward, I would have to know not only the arity of each
>>> eventuality-predication, but which of its arguments was
>>> realized in the sentence, and even worse, which of the
>>> realized syntactic arguments corresponded to semantic
>>> arguments (so for example not the direct object of
>>> `believe'). Maybe I give up too soon here, but this does not
>>> seem doable just operating on the MRS resulting from
>>> composition, even with access to the SEM-I.
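>>>
>>> For concreteness, here is a rough sketch (mine, not ERG code, with
>>> invented variable names) of that naive inventory; it deliberately
>>> ignores the realized-argument and syntactic-vs-semantic
>>> complications just mentioned:
>>>
>>>     def expected_icons(eps):
>>>         """eps: (predicate, arg0, {role: value}) triples."""
>>>         pairs = set()
>>>         for _pred, arg0, roles in eps:
>>>             if not arg0.startswith('e'):   # eventualities only
>>>                 continue
>>>             for value in roles.values():
>>>                 if value.startswith(('x', 'e')):
>>>                     pairs.add((value, arg0))
>>>         return pairs
>>>
>>>     # `the cat persuaded the dog to bark'
>>>     eps = [('_cat_n', 'x1', {}),
>>>            ('_persuade_v', 'e2',
>>>             {'ARG1': 'x1', 'ARG2': 'x3', 'ARG3': 'e4'}),
>>>            ('_dog_n', 'x3', {}),
>>>            ('_bark_v', 'e4', {'ARG1': 'x3'})]
>>>     assert expected_icons(eps) == {('x1', 'e2'), ('x3', 'e2'),
>>>                                    ('e4', 'e2'), ('x3', 'e4')}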
>>>
>>>
>>> So if the necessary ICONS elements have to be introduced
>>> overtly by the lexicon/grammar during composition, then I
>>> would still like to explore a middle ground that does not
>>> result in the full set of ICONS elements Song and Bender
>>> propose for a sentence. That is, I wondered whether we
>>> could make do with adding to the ERG the necessary
>>> introduction of just those ICONS elements that would enable
>>> us to draw the distinctions between `unmarked', 'topic', and
>>> 'focus' that we were used to exploiting in the days of
>>> messages. But since pretty much any preposition's or
>>> adjective's or verb's complement can be extracted, and any
>>> verb's subject can be extracted, and most verbs' direct and
>>> indirect objects can be passivized, I think we'll still end
>>> up with an ICONS entry for each eventuality/argument pair
>>> for every predication-introducing verb, adjective, and
>>> preposition in a sentence, and maybe also for some nouns as
>>> in "who is that picture of?". This still lets us exclude
>>> ICONS elements involving adverbs and maybe also the
>>> arguments of conjunctions, subordinators, modals. If we
>>> went this route, I think it would be possible to make modest
>>> additions to certain of the constructions, and not have to
>>> meddle with lexical types, to get these ICONS elements into
>>> the MRS during composition.
>>>
>>>
>>> Such a partial approach does not have the purity of Song and
>>> Bender's account, but might be more practical, at least as a
>>> first step, for the ERG. It would at least enable what I
>>> think is a more consistent interpretation of the ICONS
>>> elements for generation, and should give us the fine-grained
>>> control I agree that we want. Thus to get the generator to
>>> produce all variants from an MRS produced by parsing a
>>> simple declarative, one would have to remove the info-str
>>> ICONS element whose presence excludes the specialization to
>>> focus or topic because of our friend Skolem.
>>>
>>>
>>> Counsel?
>>>
>>>
>>> Dan
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* developers-bounces at emmtee.net on behalf of Ann
>>> Copestake <aac10 at cam.ac.uk>
>>> *Sent:* Friday, February 5, 2016 1:43 PM
>>> *To:* Emily M. Bender; Stephan Oepen
>>> *Cc:* developers; Ann Copestake
>>> *Subject:* Re: [developers] ICONS and generation
>>> Thanks!
>>>
>>> On 05/02/2016 21:30, Emily M. Bender wrote:
>>>> Not sure if this answers the question, but a couple of
>>>> comments:
>>>>
>>>> (a) I do think that written English is largely
>>>> underspecified for information structure.
>>>> It's part of what makes good writing good that the
>>>> information structure is made apparent
>>>> somehow.
>>>>
>>>
>>> OK. Should I understand you as saying that composition (as
>>> in, what we do in the grammars) leaves it mostly
>>> underspecified, but that discourse-level factors make it
>>> apparent? Or that it really is underspecified?
>>>
>>>> (b) I think the "I want only the unmarked form back" case
>>>> might be handled by either
>>>> a setting which says "no ICONS beyond what is in the input"
>>>> (i.e. your ICONS { }) or
>>>> a pre-processing/generation fix-up rule that takes ICONS {
>>>> ... } and outputs something
>>>> that would be incompatible with anything but the unmarked
>>>> form. Or maybe the
>>>> subsumption check goes the wrong way for this one?
>>>>
>>> Yes, I think the ICONS {} might be a possible way of
>>> thinking about it. I should make it clear - I don't think
>>> there's a problem with constructing an implementation that
>>> produces the `right' behaviour but I would much prefer that
>>> the behaviour is specifiable cleanly in the formalism rather
>>> than as another parameter to the generator or whatever.
>>>
>>>> I hope Sanghoun has something to add here!
>>>>
>>>> Emily
>>>>
>>>> On Fri, Feb 5, 2016 at 1:01 PM, Stephan Oepen
>>>> <oe at ifi.uio.no> wrote:
>>>>
>>>> colleagues,
>>>>
>>>> my ideal would be a set-up where the provider of
>>>> generator inputs has three options: (a) request
>>>> topicalization (or similar), (b) disallow it, or (c)
>>>> underspecify and get both variants.
>>>>
>>>> we used to have that level of control (and flexibility)
>>>> in the LOGON days where there were still messages: in
>>>> the message EPs, there were two optional ‘pseudo’ roles
>>>> (TPC and PSV) to control topicalization or
>>>> passivization of a specific instance variable.
>>>> effectively, when present, these established a binary
>>>> relation between the clause and one of its
>>>> nominal constituents. if i recall correctly, blocking
>>>> topicalization was accomplished by putting an otherwise
>>>> unbound ‘anti’-variable into the TPC or PSV roles.
>>>>
>>>> could one imagine something similar in the ICONS realm,
>>>> and if so, which form would it have to take?
>>>>
>>>> best wishes, oe
>>>>
>>>>
>>>> On Friday, February 5, 2016, Woodley Packard
>>>> <sweaglesw at sweaglesw.org> wrote:
>>>>
>>>> I can confirm that under ACE, behavior is what you
>>>> indicate, i.e. generating from parsing the
>>>> topicalized feline-canine-playtime I get just the
>>>> topicalized variant out, but when generating from
>>>> parsing the ordinary word order I get all 5
>>>> variants out.
>>>>
>>>> I believe this was designed to imitate the
>>>> long-standing condition that the MRS of generation
>>>> results must be subsumed by the input MRS. The
>>>> observed behavior seems to me to be the correct
>>>> interpretation of the subsumption relation with
>>>> ICONS involved. Note that an MRS with an extra
>>>> intersective modifier would also be subsumed, for
>>>> example, but such MRS are never actually generated
>>>> since those modifier lexical entries never make it
>>>> into the chart.
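>>>>
>>>> For what it's worth, here is the ICONS side of that test as I
>>>> read it, as a toy sketch (not ACE's actual code):
>>>>
>>>>     from collections import Counter
>>>>
>>>>     def icons_subsumed(input_icons, candidate_icons):
>>>>         # every input element must be matched in the candidate;
>>>>         # the candidate may add elements.  A real check would
>>>>         # also allow type specialisation (info-str -> focus).
>>>>         need = Counter(input_icons)
>>>>         have = Counter(candidate_icons)
>>>>         return all(have[e] >= n for e, n in need.items())
>>>>
>>>> So an empty input ICONS passes every candidate (hence all 5
>>>> variants), while the topicalized input's ICONS element is matched
>>>> only by the topicalized realization.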
>>>>
>>>> It’s certainly reasonable to ask whether (this
>>>> notion of) subsumption is really the right test.
>>>> I’ve met lots of folks who prefer to turn that
>>>> subsumption test off entirely. I guess it’s also
>>>> possible that the subsumption test is right for the
>>>> RELS portion of the MRS but not for the ICONS,
>>>> though that seems a bit odd to consider. However,
>>>> given that we don’t have many ideas about
>>>> truth-conditional implications of ICONS, maybe not
>>>> so odd.
>>>>
>>>> I don’t really have much to offer in terms of
>>>> opinions about what the right behavior should be.
>>>> I (believe I) just implemented what others asked
>>>> for a couple years ago :-)
>>>>
>>>> -Woodley
>>>>
>>>> > On Feb 5, 2016, at 8:03 AM, Ann Copestake
>>>> <aac10 at cl.cam.ac.uk> wrote:
>>>> >
>>>> > I'm part way through getting ICONS support
>>>> working in Lisp, testing on the version of the ERG
>>>> available as trunk. I have a question about
>>>> generation. If I implemented the behaviour
>>>> described in http://moin.delph-in.net/IconsSpecs
>>>> there doesn't seem to be a way of specifying that I
>>>> want a `normal' ordering for English.
>>>> >
>>>> > e.g., if I take the MRS resulting from
>>>> >
>>>> > that dog, the cat chased.
>>>> >
>>>> > without ICONS check, there are 5 realizations,
>>>> including the `null ICONS' case `The cat chased
>>>> that dog.' With an exact ICONS check, I can select
>>>> realizations with the same ICONS (modulo order of
>>>> ICONS elements, of course, in the case where
>>>> there's more than one element). But with the
>>>> http://moin.delph-in.net/IconsSpecs
>>>> behaviour, there's no way of specifying I want a
>>>> `normal' order - if I don't give an ICONS, I will
>>>> always get the 5 realisations. In fact, as I
>>>> understand it, I can always end up with more ICONS elements
>>>> in the realisation than in the input, as long as I
>>>> can match the ones in the input.
>>>> >
>>>> > So:
>>>> > - is the IconsSpecs behaviour what is desired for
>>>> the ERG (e.g., because one can rely on the
>>>> realisation ranking to prefer the most `normal' order)?
>>>> > - or does the ERG behave differently from Emily
>>>> and Sanghoun's grammars, such that different
>>>> generator behaviour is desirable? and if so, could
>>>> we change things so we don't need different behaviours?
>>>> >
>>>> > Ann
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Emily M. Bender
>>>> Professor, Department of Linguistics
>>>> Check out CLMS on facebook! http://www.facebook.com/uwclma
>>>
>>
>>
>>
>>
>> --
>> =================================
>> Sanghoun Song
>> Assistant Professor
>> Dept. of English Language and Literature
>> Incheon National University
>> http://corpus.mireene.com
>> phone: +82-32-835-8129 (office)
>> =================================
>
>
>
>
> --
> Emily M. Bender
> Professor, Department of Linguistics
> Check out CLMS on facebook! http://www.facebook.com/uwclma