[developers] ICONS and generation
Emily M. Bender
ebender at uw.edu
Thu Feb 18 01:26:38 CET 2016
Just a quick and belated reply to say that from where I sit your analysis
of the situation makes a lot of sense.
Emily
On Sun, Feb 7, 2016 at 2:12 AM, Ann Copestake <aac10 at cam.ac.uk> wrote:
> Thanks! and thanks all! I've come to a view on this which I think is
> consistent with what everyone has been saying.
>
> First of all, note that in the MRS syntax, we do not distinguish between
> terminated and non-terminated lists/bags. If we think about it from the
> perspective of typed feature structures, it is clear that there is a
> distinction - for instance a type `list' is the most general type of list,
> and the type `e-list' (empty list) is usually a maximally specific type.
> Coming back to the notation I used in an earlier message, there is a
> distinction between { ... } (analogous to list in a TFS) and {} (cf
> e-list).
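>
> To make that distinction concrete, here is a minimal Python sketch - the
> class and its fields are invented purely for illustration, and are not
> part of any existing DELPH-IN implementation:
>
>     from collections import Counter
>
>     # Hypothetical model of an ICONS bag: terminated=True corresponds to
>     # {} (cf e-list); terminated=False corresponds to { ... } (cf list).
>     class IconsBag:
>         def __init__(self, elements=None, terminated=False):
>             self.elements = list(elements or [])
>             self.terminated = terminated
>
>         def subsumes(self, other):
>             """An open bag subsumes any bag extending its elements; a
>             terminated bag subsumes only bags with exactly its elements."""
>             mine, theirs = Counter(self.elements), Counter(other.elements)
>             if any(mine[e] > theirs[e] for e in mine):
>                 return False
>             return not self.terminated or mine == theirs
>
>     IconsBag().subsumes(IconsBag(["focus"]))                  # True
>     IconsBag(terminated=True).subsumes(IconsBag(["focus"]))   # False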
>
> Now, there are two possible interpretations of ICONS as it arises from a
> DELPH-IN grammar (i.e., as it is output after parsing):
> 1. information structure
> 2. information structure as it arises from morphosyntax
>
> In the `normal' sentences of the `Kim chased the dog' type, no information
> structure elements arise from morphosyntax. We can, however, expect that
> various contexts (e.g., discourse) give rise to information structure in
> association with such a sentence. Hence, with respect to interpretation 1,
> ICONS is not strictly empty but underspecified (and similarly a one element
> ICONS may be underspecified with respect to a two-element ICONS and so
> on). I think this is consistent with what Sanghoun and Emily are saying.
> Under this interpretation, an MRS with no specification of ICONS should
> indeed generate all the variant sentences we've been discussing. And so
> on.
>
> However, with respect to interpretation 2, the ICONS emerging from the
> parse of such a sentence is terminated. Once we've finished parsing, we're
> guaranteeing no more ICONS elements will arise from morphosyntax, whatever
> someone does in discourse. Under this interpretation, if I say I want to
> generate a sentence with an empty ICONS, I mean I want to generate a
> sentence with no ICONS contribution from morphosyntax. This is also a
> legitimate use of the realiser, considered as a stand-alone module.
>
> Since ICONS is something which I have always thought of as on the boundary
> of morphosyntax and discourse, I want to be able to enrich ICONS emerging
> from parsing with discourse processing, so interpretation 1 makes complete
> sense. However, I believe it is also perfectly legitimate to be able to
> divide the world into what the grammar can be expected to do and what it
> can't, and that is consistent with interpretation 2.
>
> As a hypothetical move, consider an additional classification of ICONS
> elements according to whether or not they arise from morphosyntax. Then we
> can see that a single ICONS value could encompass both interpretations:
> what would arise from a parse would be a terminated list of
> morphosyntactic-ICONS elements, but the ICONS as a whole could be
> non-terminated.
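>
> As a sketch of that hypothetical move, in the same invented Python terms
> as above:
>
>     # Hypothetical split of an ICONS value by provenance: the
>     # morphosyntactic part is terminated once parsing is finished, while
>     # the value as a whole stays open to discourse-derived additions.
>     class PartitionedIcons:
>         def __init__(self, morphosyntactic=()):
>             self.morphosyntactic = tuple(morphosyntactic)  # terminated
>             self.discourse = []                            # open-ended
>
>         def add_discourse_element(self, element):
>             self.discourse.append(element)
>
>         def all_elements(self):
>             return list(self.morphosyntactic) + self.discourse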
>
> I think there may be reasons to be able to distinguish ICONS elements
> according to whether they are intended as grammar-derived or not, though I
> do see this might look messy. But anyway, I want to first check that
> everyone agrees with this analysis of the situation before trying to work
> out what we might do about it in terms of notation.
>
> Incidentally - re Dan's message - my overly brief comment about Sanghoun's
> use of DMRS earlier was intended to point out that if DMRS had the
> necessary links for demonstrating ICONS, then in principle this was
> something we know how to extract. But right now, I'm not clear whether or
> not we do need all the underspecified elements, and that's something I
> would like Dan to comment on before we go further.
>
> All best,
>
> Ann
>
>
>
> On 06/02/2016 17:59, Sanghoun Song wrote:
>
> My apologies for my really late reply!
>
> I am not sure whether I fully understand your discussion, but I would like
> to share several of my ideas on using ICONS for generation.
>
> First, in my analysis (final version), only expressions that contribute to
> information structure introduce an ICONS element into the list. For
> example, the following unmarked sentence (a below) has no ICONS element
> (i.e. empty ICONS).
>
> a. plain: Kim chases the dog.
> b. passivization: The dog is chased by Kim.
> c. fronting: The dog Kim chases.
> d. clefting: It is the dog that Kim chases.
>
> Using the type hierarchy for information structure in my thesis, I can
> say the following:
>
> (i) The subject Kim and the object the dog in a plain active sentence (a)
> are in situ. They may or may not be focused depending on which constituent
> bears a specific accent, but in sentence-based processing their
> information structure values are best left underspecified, for a flexible
> representation.
>
> (ii) The promoted argument the dog in the passive sentence (b) is
> evaluated as conveying focus-or-topic, while the demoted argument Kim is
> associated with non-topic.
>
> (iii) In (c), the fronted object the dog is assumed to be assigned
> focus-or-topic, in that the sentence conveys the meaning of either "As for
> the dog, Kim chases it" or of (d), while the subject in situ is evaluated
> as conveying neither topic nor focus (i.e. background). (Background may
> not be implemented in the ERG, I think.)
>
> (iv) The focused NP in (d) carries focus, and the subject in the cleft
> clause Kim is also associated with bg.
>
> Thus, we can create a focus specification hierarchy amongst (a-d) as
> [clefting > fronting > passivization > plain].
>
> What I want to say is that a set of sentences which share some properties
> may have subtle shades of meaning depending on how focus is assigned to the
> sentences. Paraphrasing is possible only in the direction from the right to the
> left of [clefting > fronting > passivization > plain], because paraphrasing
> in the opposite direction necessarily causes loss of information. For
> example, a plain sentence such as (a) can be paraphrased into a cleft
> construction such as (d), but not vice versa.
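>
> To illustrate, here is a small Python sketch of that constraint (the
> encoding is invented for illustration only):
>
>     # Focus specification hierarchy, most specific first; paraphrase may
>     # only move from right to left, i.e. toward a more specific form.
>     HIERARCHY = ["clefting", "fronting", "passivization", "plain"]
>
>     def paraphrase_allowed(source, target):
>         """True if the target is at least as specific as the source, so
>         that no information-structure information is lost."""
>         return HIERARCHY.index(target) <= HIERARCHY.index(source)
>
>     paraphrase_allowed("plain", "clefting")   # True: (a) -> (d)
>     paraphrase_allowed("clefting", "plain")   # False: loses focus info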
>
> In a nutshell, a more specific sentence is better not paraphrased into a
> less specific sentence, in terms of information structure.
>
> Second, I provided many dependency graphs in my thesis. The main reason
> was that nobody outside of DELPH-IN can fully understand the complex
> co-indexation in ICONS/MRS. At that time, I didn't work on DMRS with
> respect to ICONS. If there is a way to represent ICONS in DMRS (direct from
> TFS or via MRS), I am interested in the formalism.
>
>
> Sanghoun
>
>
> On Sat, Feb 6, 2016 at 1:26 AM, Ann Copestake <aac10 at cam.ac.uk> wrote:
>
>> Briefly (more this evening maybe) - I don't see a particular problem with
>> filling in the ICONS since what you describe are relationships that are
>> overt in the *MRS anyway, aren't they? I thought, in fact, that these are
>> pretty clear from the DMRS graph - which is why Sanghoun uses it to
>> describe what's going on.
>>
>> I believe we can build the DMRS graph direct from the TFS, incidentally -
>> don't need to go via MRS ...
>>
>> Cheers,
>>
>> Ann
>>
>>
>> On 05/02/2016 23:40, Dan Flickinger wrote:
>>
>> As I understand the Song and Bender account, an MRS for a sentence should
>> include in the ICONS list at least one element for each individual
>> (eventuality or instance) that is introduced. In the ERG this would mean
>> that the value of each ARG0 should appear in at least one ICONS entry,
>> where most of these would be of the maximally underspecified type
>> `info-str', but possibly specialized because of syntactic structure or
>> stress/accent or maybe even discourse structure.
>>
>>
>> I see the virtue of having these overt ICONS elements even when of type
>> `info-str', to enable the fine-grained control that Stephan notes that we
>> want for generation, and also to minimize the differences between the ERG
>> and grammars being built from the Matrix which embody Sanghoun's careful
>> work.
>>
>>
>> If the grammarian is to get away with not explicitly introducing each of
>> these ICONS elements in the lexical entries, as Sanghoun does in the
>> Matrix, then it would have to be possible to predict and perhaps
>> mechanically add the missing ones after composition was completed. I used
>> to hope that this would be possible, but now I'm doubtful, leading me to
>> think that there is no good alternative to the complication (maybe I should
>> more kindly use the term `enrichment') of the grammar with the overt
>> introduction of these guys everywhere. Here's my reasoning:
>>
>>
>> I assume that what we'll want in an MRS for an ordinary sentence is an
>> ICONS list that has exactly one entry for each pair of an individual `i'
>> and the eventuality which is the ARG0 of each predication in which `i'
>> appears as an argument. Thus for `the cat persuaded the dog to bark' the
>> ICONS list should have four elements: one for cat/persuade, one for
>> dog/persuade, one for bark/persuade, and one for dog/bark. Now if I wanted
>> to have the grammar continue to only insert ICONS elements during
>> composition for the non-vanilla info-str phenomena, and fill in the rest
>> afterward, I would have to know not only the arity of each
>> eventuality-predication, but which of its arguments was realized in the
>> sentence, and even worse, which of the realized syntactic arguments
>> corresponded to semantic arguments (so for example not the direct object of
>> `believe'). Maybe I give up too soon here, but this does not seem doable
>> just operating on the MRS resulting from composition, even with access to
>> the SEM-I.
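>>
>> For concreteness, here is a rough Python sketch of that target inventory
>> over a toy predication list (the data structures and names are invented
>> for illustration; real MRS objects are of course richer):
>>
>>     # Toy predications for `the cat persuaded the dog to bark':
>>     # (predicate, ARG0, {role: value}), with e*/x* variables as strings.
>>     predications = [
>>         ("_persuade_v_of", "e2",  {"ARG1": "x4", "ARG2": "x9", "ARG3": "e10"}),
>>         ("_bark_v_1",      "e10", {"ARG1": "x9"}),
>>         ("_cat_n_1",       "x4",  {}),
>>         ("_dog_n_1",       "x9",  {}),
>>     ]
>>
>>     def expected_icons_pairs(preds):
>>         """One (argument, eventuality) pair per argument of each
>>         eventuality-denoting predication."""
>>         return [(value, arg0)
>>                 for _, arg0, roles in preds if arg0.startswith("e")
>>                 for value in roles.values()]
>>
>>     # -> [('x4', 'e2'), ('x9', 'e2'), ('e10', 'e2'), ('x9', 'e10')]
>>     print(expected_icons_pairs(predications))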
>>
>>
>> So if the necessary ICONS elements have to be introduced overtly by the
>> lexicon/grammar during composition, then I would still like to explore a
>> middle ground that does not result in the full set of ICONS elements Song
>> and Bender propose for a sentence. That is, I wondered whether we could
>> make do with adding to the ERG the necessary introduction of just those
>> ICONS elements that would enable us to draw the distinctions between
>> `unmarked', `topic', and `focus' that we were used to exploiting in the
>> days of messages. But since pretty much any preposition's or adjective's
>> or verb's complement can be extracted, and any verb's subject can be
>> extracted, and most verbs' direct and indirect objects can be passivized, I
>> think we'll still end up with an ICONS entry for each eventuality/argument
>> pair for every predication-introducing verb, adjective, and preposition in
>> a sentence, and maybe also for some nouns as in "who is that picture of?".
>> This still lets us exclude ICONS elements involving adverbs and maybe also
>> the arguments of conjunctions, subordinators, modals. If we went this
>> route, I think it would be possible to make modest additions to certain of
>> the constructions, and not have to meddle with lexical types, to get these
>> ICONS elements into the MRS during composition.
>>
>>
>> Such a partial approach does not have the purity of Song and Bender's
>> account, but might be more practical, at least as a first step, for the
>> ERG. It would at least enable what I think is a more consistent
>> interpretation of the ICONS elements for generation, and should give us the
>> fine-grained control I agree that we want. Thus to get the generator to
>> produce all variants from an MRS produced by parsing a simple declarative,
>> one would have to remove the info-str ICONS element whose presence excludes
>> the specialization to focus or topic because of our friend Skolem.
>>
>>
>> Counsel?
>>
>>
>> Dan
>>
>> ------------------------------
>> *From:* developers-bounces at emmtee.net on behalf of Ann Copestake
>> <aac10 at cam.ac.uk>
>> *Sent:* Friday, February 5, 2016 1:43 PM
>> *To:* Emily M. Bender; Stephan Oepen
>> *Cc:* developers; Ann Copestake
>> *Subject:* Re: [developers] ICONS and generation
>>
>> Thanks!
>>
>> On 05/02/2016 21:30, Emily M. Bender wrote:
>>
>> Not sure if this answers the question, but a couple of comments:
>>
>> (a) I do think that written English is largely underspecified for
>> information structure. It's part of what makes good writing good that the
>> information structure is made apparent somehow.
>>
>>
>> OK. Should I understand you as saying that composition (as in, what we do
>> in the grammars) leaves it mostly underspecified, but that discourse-level
>> factors make it apparent? Or that it really is underspecified?
>>
>> (b) I think the "I want only the unmarked form back" case might be
>> handled by either a setting which says "no ICONS beyond what is in the
>> input" (i.e. your ICONS { }), or a pre-processing/generation fix-up rule
>> that takes ICONS { ... } and outputs something that would be incompatible
>> with anything but the unmarked form. Or maybe the subsumption check goes
>> the wrong way for this one?
>>
>> Yes, I think the ICONS {} might be a possible way of thinking about it.
>> I should make it clear - I don't think there's a problem with constructing
>> an implementation that produces the `right' behaviour but I would much
>> prefer that the behaviour is specifiable cleanly in the formalism rather
>> than as another parameter to the generator or whatever.
>>
>> I hope Sanghoun has something to add here!
>>
>> Emily
>>
>> On Fri, Feb 5, 2016 at 1:01 PM, Stephan Oepen <oe at ifi.uio.no> wrote:
>>
>>> colleagues,
>>>
>>> my ideal would be a set-up where the provider of generator inputs has
>>> three options: (a) request topicalization (or similar), (b) disallow it, or
>>> (c) underspecify and get both variants.
>>>
>>> we used to have that level of control (and flexibility) in the LOGON
>>> days where there were still messages: in the message EPs, there were two
>>> optional ‘pseudo’ roles (TPC and PSV) to control topicalization or
>>> passivization of a specific instance variable. effectively, when
>>> present, these established a binary relation between the clause and one of
>>> its nominal constituents. if i recall correctly, blocking topicalization
>>> was accomplished by putting an otherwise unbound ‘anti’-variable into the
>>> TPC or PSV roles.
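>>>
>>> purely as an illustrative sketch of that mechanism in Python (the names
>>> and objects here are invented, not LOGON's actual representation):
>>>
>>>     # a unique, otherwise unbound `anti' value in the TPC role can never
>>>     # be identified with any instance variable in the MRS, so no
>>>     # topicalized realization can satisfy the input.
>>>     ANTI = object()
>>>
>>>     message_ep = {"PRED": "prpstn_m_rel", "TPC": ANTI}
>>>
>>>     def topicalization_blocked(ep):
>>>         return ep.get("TPC") is ANTI
>>>
>>>     print(topicalization_blocked(message_ep))   # True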
>>>
>>> could one imagine something similar in the ICONS realm, and if so, which
>>> form would it have to take?
>>>
>>> best wishes, oe
>>>
>>>
>>> On Friday, February 5, 2016, Woodley Packard <sweaglesw at sweaglesw.org>
>>> wrote:
>>>
>>>> I can confirm that under ACE, the behavior is as you indicate, i.e.
>>>> generating from parsing the topicalized feline-canine-playtime I get just
>>>> the topicalized variant out, but when generating from parsing the ordinary
>>>> word order I get all 5 variants out.
>>>>
>>>> I believe this was designed to imitate the long-standing condition that
>>>> the MRS of generation results must be subsumed by the input MRS. The
>>>> observed behavior seems to me to be the correct interpretation of the
>>>> subsumption relation with ICONS involved. Note that an MRS with an extra
>>>> intersective modifier would also be subsumed, for example, but such MRSs are
>>>> never actually generated since those modifier lexical entries never make it
>>>> into the chart.
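>>>>
>>>> As a rough sketch of what that check amounts to, treating ICONS as a
>>>> multiset and the input as a lower bound (the function name is invented,
>>>> and this ignores type specialization such as info-str subsuming focus):
>>>>
>>>>     from collections import Counter
>>>>
>>>>     def icons_subsumption_ok(input_icons, result_icons):
>>>>         """Every ICONS element of the input must be matched in the
>>>>         result; the result may carry extras, just as an MRS with an
>>>>         extra intersective modifier would also be subsumed."""
>>>>         need, have = Counter(input_icons), Counter(result_icons)
>>>>         return all(have[e] >= n for e, n in need.items())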
>>>>
>>>> It’s certainly reasonable to ask whether (this notion of) subsumption
>>>> is really the right test. I’ve met lots of folks who prefer to turn that
>>>> subsumption test off entirely. I guess it’s also possible that the
>>>> subsumption test is right for the RELS portion of the MRS but not for the
>>>> ICONS, though that seems a bit odd to consider. However, given that we
>>>> don’t have many ideas about truth-conditional implications of ICONS, maybe
>>>> not so odd.
>>>>
>>>> I don’t really have much to offer in terms of opinions about what the
>>>> right behavior should be. I (believe I) just implemented what others asked
>>>> for a couple years ago :-)
>>>>
>>>> -Woodley
>>>>
>>>> > On Feb 5, 2016, at 8:03 AM, Ann Copestake <aac10 at cl.cam.ac.uk> wrote:
>>>> >
>>>> > I'm part way through getting ICONS support working in Lisp, testing
>>>> on the version of the ERG available as trunk. I have a question about
>>>> generation. If I implemented the behaviour described in
>>>> http://moin.delph-in.net/IconsSpecs, there doesn't seem to be a way of
>>>> specifying that I want a `normal' ordering for English.
>>>> >
>>>> > e.g., if I take the MRS resulting from
>>>> >
>>>> > that dog, the cat chased.
>>>> >
>>>> > without ICONS check, there are 5 realizations, including the `null
>>>> ICONS' case `The cat chased that dog.' With an exact ICONS check, I can
>>>> select realizations with the same ICONS (modulo order of ICONS elements, of
>>>> course, in the case where there's more than one element). But with the
>>>> http://moin.delph-in.net/IconsSpecs behaviour, there's no way of
>>>> specifying I want a `normal' order - if I don't give an ICONS, I will
>>>> always get the 5 realisations. In fact, as I understand it, I can always
>>>> end up with more icons in the realisation than in the input, as long as I
>>>> can match the ones in the input.
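>>>> >
>>>> > By contrast, the exact check is just multiset equality - a minimal
>>>> > Python sketch, with ICONS elements taken as hashable tokens:
>>>> >
>>>> >     from collections import Counter
>>>> >
>>>> >     def exact_icons_match(input_icons, result_icons):
>>>> >         """Exact ICONS check, modulo the order of the elements."""
>>>> >         return Counter(input_icons) == Counter(result_icons)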
>>>> >
>>>> > So:
>>>> > - is the IconsSpecs behaviour what is desired for the ERG (e.g.,
>>>> because one can rely on the realisation ranking to prefer the most `normal'
>>>> order)?
>>>> > - or does the ERG behave differently from Emily and Sanghoun's
>>>> grammars, such that different generator behaviour is desirable? and if so,
>>>> could we change things so we don't need different behaviours?
>>>> >
>>>> > Ann
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>
>>
>> --
>> Emily M. Bender
>> Professor, Department of Linguistics
>> Check out CLMS on facebook! http://www.facebook.com/uwclma
>>
>>
>>
>>
>
>
> --
> =================================
> Sanghoun Song
> Assistant Professor
> Dept. of English Language and Literature
> Incheon National University
> http://corpus.mireene.com
> phone: +82-32-835-8129 (office)
> =================================
>
>
>
--
Emily M. Bender
Professor, Department of Linguistics
Check out CLMS on facebook! http://www.facebook.com/uwclma