
Linking

In document Vector Semantics (pages 73-81)

58 2 From morphology to syntax

2.4 Linking 59

the underlying conceptual schema “there are two things, one > the other” is fairly transparent, and analyzing the morpheme in terms of the schema makes sense, even if the schema itself must be treated as a primitive. The same can be said for for/824 ‘price’, which is easily identified through the exchange_ schema of 1.2. Here we actually succeed in doing away with primitivity, in that exchange lends itself to analysis as before(=pat at person), after(=pat at other(person)), and the ‘commercial exchange’ frame involves the conjunction of two such exchanges, that of the object and that of the money (see 3.3 for further discussion).

In general, the fact that we must resort to abstract conceptualizations requiring the use of Schankian scripts or Fillmorean frames is a strong indication that the word or morpheme in question is a (near) primitive. Consider another sense of for, for_/2782 ‘dative of purpose’, as in born for achievement. We must recognize this in the grammatical sense, because the conceptual schema is somehow too diffuse to articulate even by a script or frame. Within some extremely broad pragmatic limits, anything can be for anything. JPEG uses the discrete cosine transform for compression. The men of Chou used the chestnut (li) for making the common people tremble (li). As long as we cannot define the dative of purpose by a conceptual schema we must list it as a primitive.

Here we illustrate near-grammaticality on the prepositions at (other prepositions with a clear spatial meaning are deferred to 3.1, where the place_ schema is discussed in detail) and for_/2782. In LDOCE, at is defined as ‘used to say exactly where something or someone is, or where something happens’, in CED as ‘used to show the position of a person or thing in space or time’. Since our analysis of where would be ‘at wh’, this would result in an analysis that has at on the rhs as well, making it a primitive.

But even if it is a primitive, we can’t afford to stay silent on what it means – to the extent we do we are left with trivial equations of the $x = x$ sort.

Clearly, at is a binary relation (we write these in SVO order) =agt at =pat, where =pat is strongly subtyped for location, be it spatial or temporal, so strongly that otherwise unspecified entities like Jim’s have to be typecast to location if we are to make sense of expressions like We meet at Jim’s. This selectional restriction on the second argument will be expressed by the clause =pat[place]. Using the same mechanism as in Eq. 2.6 we coerce the prepositional object to be a place by

$$P_R(t+1) = P_R(t) + s\,|\text{=pat}\rangle\langle\text{place}| \tag{2.10}$$

In contrast, the subject =agt is left untyped: it could be a physical object, a person, or even an event, what LDOCE describes as ‘something happens’. What at means is that it is happening at the origin of the abstract place_ where =pat is located:

$$P_R(t+1) = P_R(t) + s\,|\text{=agt}\rangle\langle\text{origin}| \tag{2.11}$$

In other words, while the ground (object) gets construed as a place, the figure (subject) gets coerced into the origin of the place coordinate system. Since at dictates to perform both of these operations, we have


$$P_R(t+1) = P_R(t) + s\,\bigl(|\text{=agt}\rangle\langle\text{origin}| + |\text{=pat}\rangle\langle\text{place}|\bigr) \tag{2.12}$$

and as usual, we leave it to the unification mechanism to guarantee that it is the origin of the ground (prepositional object) that the subject is coerced to, not the origin of some other coordinate system. Notice that the method of Eq. 2.12, adding two rank 1 matrices to produce a rank 2 matrix, could in principle be extended to the modeling of ditransitives by rank 3 matrices and so forth. But this would require the addition of further theta roles (variable-binding term operators) besides =agt and =pat, a step we will not take here.
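The rank claim attached to Eq. 2.12 can be checked with a minimal sketch. Everything concrete here is an assumption made for illustration only: the four-dimensional one-hot basis standing in for =agt, =pat, origin, and place, the zero starting matrix, the scalar s = 1, and the helper names `outer`, `add`, and `rank`. None of this is the representation used in 4lang itself.

```python
from fractions import Fraction

def outer(u, v):
    """Rank-1 matrix |u><v| as a list of rows."""
    return [[a * b for b in v] for a in u]

def add(A, B, s=1):
    """Return A + s*B elementwise."""
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def rank(M):
    """Matrix rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in r] for r in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

# Hypothetical one-hot vectors, one coordinate per concept (illustration only).
agt, pat, origin, place = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]

P = [[0] * 4 for _ in range(4)]              # assumed starting state P_R(t)
figure = outer(agt, origin)                  # the term of Eq. 2.11
ground = outer(pat, place)                   # the term of Eq. 2.10
P_next = add(P, add(figure, ground), s=1)    # the update of Eq. 2.12

print(rank(figure), rank(ground), rank(P_next))
```

The sum reaches rank 2 only because the two outer products are linearly independent here; if the two kets (or the two bras) were parallel, the update would collapse to rank 1.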

What happens during the analysis of Bill at office? We must select the eigenspace for Bill. (This is not trivial: there may be several Bills around, and we need to do considerable work to choose the right one, see 3.3.) We also must select the right eigenspace for office, and most important, we must typecast Bill as figure and the office as ground, for this is what at means. For the ground, we have a complete abstract coordinate system, and as we shall see in 3.1, offices (and buildings in general) are trivially mapped to this. To conceptualize Bill as being at the office requires no more than applying to him the predicate inside that comes with this coordinate system.

At first blush, this may look as if we are just postponing the problem by reducing at to inside. But as we shall see, inside comes for free, as a prebuilt component of the coordinate system. The real work is in the typecasting, which creates a new instance of the standard coordinate system with the office at its origin, and maps many of the features of this system appropriately, in the kind of process described by Fauconnier, 1985. We call this process coercion, not because it is that different from what Fauconnier calls ‘projection mapping’, but rather because we wish to emphasize its forcible, Procrustean aspect. By understanding, mental reality is created. at forces Bill to be inside the office premises. We may entertain different notions, perhaps he is out shopping, but to understand the sentence is tantamount to having a concept of him in the office. We will return to the geometric interpretation of the coercion mechanism in 3.3.

Returning to the problem posed by at, we can reformulate this as computing a sequence of three thought vectors, the first one describing the state of the linguistic concept space after having heard (and recognized) Bill. This is simply $\Psi(1) = v(\text{Bill})$. The second one, $\Psi(2)$, is after having heard and processed (is) at, and the third one, $\Psi(3)$, characterizes the state of the mental space after having heard the entire expression Bill (is) at (the) office. We assume that at makes available the entire system of conceptual coordinates that we will describe in 3.1. The function of at described in Eq. 2.12 is twofold: it typecasts Bill as ‘figure’ and it also typecasts office as ‘ground’.

Traditional constituency tests make it clear that we process the material in A(BC) rather than (AB)C order, and it is also evident from self-inspection that Bill (is) at is not a coherent thought, whereas at (the) office is, suggesting that $\Psi(2)$ will be hard to pin down beyond the obvious fact that it already contains $v(\text{Bill})$. The effect of combining at and office is to coerce the $v(\text{office})$ eigenspace (which would be an eigenvector if we assumed office to be unanalyzed) into a few dimensions of the ground construct, effectively equating the office location with the origin of the coordinate system, and its walls with the ‘body’ that we will describe in greater detail in 3.1.


We make no effort to describe the momentary disconnect in the sequence of thought vectors $\Psi(1)$, $\Psi(2)$, $\Psi(3)$ that gets resolved only after subject and object are both substituted, though it would be fairly easy to bring the usual techniques of dynamic semantics to bear, and we leave it as an exercise to the reader to convince themselves that micro-parsing of Bill is at the office, with the addition of is and the, is still feasible in five steps. (Hint: assume Eq. 2.6 for is and assume that the contributes nothing. The only hard part is to make sure that the prepositional object office is combined with at, and it is this entire PP that is the object of is.) In 3.1 we will extend this treatment from at to a whole slew of locative prepositions (or postpositions, or case endings, depending on the language).

Let us turn to an explanation for more abstract, non-spatial binaries such as for_, has, ins_, lack, mark_ and others, using purposive for_ as our example.

company va1llalat negotiatio firma 2549 N
    organization, for_ business

cutlery evo3eszko2z ferramentum sztuc1ce 3354 N
    knife is_a, fork is_a, spoon is_a, for_ eat

hand ke1z manus re1ka 1264 N
    organ, part_of arm, human has arm, for_ [move gen], wrist part_of, palm part_of, five(finger) part_of, thumb part_of

handle fogo1 manubrium ra1czka 834 N
    part_of object, for_ hold(object in hand)

knife ke1s culter no1z1 1256 N
    instrument, for_ cut, has blade<metal>, has handle

lens lencse lenticula soczewica 3344 N
    shape, part_of camera, light/739 through, for_ clear(image), <glass>[curve], image has different(size), <look ins_>

money pe1nz pecunia pienia1dze 1952 N
    artefact, for_ exchange, has value, official

norm szaba1ly regula norma 3361 N
    good for_ society

useful hasznos utilis przydatny 3134 A
    for_ gen
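To make the shape of these entries concrete, here is a minimal sketch of how a single-line rendering of an entry (four headwords, numeric id, part of speech, then the definition) could be split and queried for its purpose clauses. The function names `parse_entry` and `purposes` are hypothetical, and splitting the definition on commas is a simplifying assumption that works for the entries shown here but is not a full parser for the definition language.

```python
def parse_entry(line):
    """Split 'en hu la pl id POS definition' into a small record."""
    en, hu, la, pl, ident, pos, definition = line.split(maxsplit=6)
    # Simplifying assumption: top-level commas separate the clauses.
    clauses = [c.strip() for c in definition.split(",")]
    return {"en": en, "hu": hu, "la": la, "pl": pl,
            "id": int(ident), "pos": pos, "clauses": clauses}

def purposes(entry):
    """Return the objects of clause-initial for_ in the definition."""
    prefix = "for_ "
    return [c[len(prefix):] for c in entry["clauses"] if c.startswith(prefix)]

cutlery = parse_entry(
    "cutlery evo3eszko2z ferramentum sztuc1ce 3354 N "
    "knife is_a, fork is_a, spoon is_a, for_ eat"
)
handle = parse_entry(
    "handle fogo1 manubrium ra1czka 834 N "
    "part_of object, for_ hold(object in hand)"
)
print(cutlery["en"], purposes(cutlery))
print(handle["en"], purposes(handle))
```

Clauses where for_ is not clause-initial, such as good for_ society in the entry for norm, are deliberately left out by this sketch.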

Just as we require a whole naive theory of space to make sense of locatives, we must invoke a whole naive theory of purpose to make sense of purpose clauses. The cardinal element of this is the premiss that artifacts are created for their utility. The naive defense of theism often relies on some form of this theory: since artifacts are created for a reason, there must be a creator. This is not to be confused with the Aristotelian notion that everything happens for a reason, which we interpret as a pure epistemological stance urging us to find the causes, a matter we return to under cause_ shortly. For now we state the premiss as our “Rule of facilitation”:


gen use =agt, after(=pat[easy]) (2.13)

This is another schema, one that we may consider the definition of for_ (if we wish to go beyond the idea that the dative of purpose is an unanalyzed primitive) or even the instrumental ins_, which we define as =pat make =agt[easy]. The operation of the facilitation schema can be illustrated on John used a spoon for cutting the pizza. Any parser will return something like John use spoon, spoon for_ {cut pizza}, so 2.13 is invoked, and we conclude that cutting the pizza was easier than doing it with his bare hands. Clearly, this is not as good as cutting it with a pizza cutter or a knife, for which it is true that they make the pizza-cutting task easy, and we use the dative of purpose precisely because we want to avoid the implication that spoons are tools in general use for cutting pizza. It is a means, but not the most effective means.
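The way rule 2.13 fires on parser output can be sketched as follows. This is only an illustration under stated assumptions: parser output is taken to be a list of (subject, relation, object) triples, and the function name `facilitate` and the tuple encoding of the conclusions are hypothetical.

```python
def facilitate(triples):
    """For each 'x for_ y', conclude 'gen use x' and 'after(y[easy])'."""
    inferred = []
    for subj, rel, obj in triples:
        if rel == "for_":
            inferred.append(("gen", "use", subj))
            inferred.append(("after", f"{obj}[easy]"))
    return inferred

# Assumed parser output for: John used a spoon for cutting the pizza.
parse = [("John", "use", "spoon"), ("spoon", "for_", "cut pizza")]
for fact in facilitate(parse):
    print(fact)
```

Note that the rule only asserts that the task became easy after the use of the subject of for_; it says nothing about whether a better instrument exists, which matches the spoon example.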

In S19:3.6 we discussed rules as being ‘entirely outside the sphere of human (individual or social) ability to change, exceptionless, and strict’. This is not to say that naive rules like 2.13 are the final say in our understanding of the world. As we shall see in Chapter 5, we can, and do, have a better theory of probability than the naive theory thanks to Pascal, Laplace, and Kolmogorov; we have a better theory of space and time than the one articulated in Chapter 3 thanks to Euclid, Descartes, and Einstein; and so on. These theories are never hard-wired: they typically build on centuries of work by giants of intellect, they require considerable formal schooling to understand, and they rely on fields of knowledge, mathematics in particular, that have no support in natural language semantics. But the naive rules of how we perceive probabilities, space, time, cause, effect, and the like are built in, and it takes as much effort to unlearn them as to teach oneself to fly by means of controlling an airplane. Naive rules are exceptionless in the same way: once the conditions are met, we have no means to suspend their application. Once you learn that fish are animals that live in water, whales are fish, and it takes special effort to unlearn this implication.

What do we mean when we say that companies are organizations for_ (doing) business, or that cutlery is for_ eating? We mean that use of these devices makes the activity easier. This extends to ‘activities’ like society, which could easily be construed as nominals: norms make it easier to have, to govern, or just to live in, some kind of society; companies make it easier to do business, etc. What is common to all these definitions is that the object of for_ refers to the matter made easier, and the subject of for_ refers to the matter acting instrumentally. The resulting after state will be discussed in 3.2, but we note here that we treat this as a substantive part of the knowledge representation, one that may require different time-indexed copies of vectors already present.

Closely related to for_ is ins_, which we use in x has instrument y rather than x is instrument of y order in all definitions (about 0.65% of the total) where it appears. For the most part, ins_ is the inverse of for_:


bite harap mordeo gryz1c1 1001 V
    cut, ins_ <tooth>

tooth fog dens za1b 827 N
    organ, animal has, hard, in jaw, bite/1001 ins_, chew ins_, attack ins_, defend ins_

where we could have just as well said

bite harap mordeo gryz1c1 1001 V
    cut, <tooth> for_

tooth fog dens za1b 827 N
    organ, animal has, hard, in jaw, for_ bite/1001, for_ chew, for_ attack, for_ defend
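The inverse relationship between the two notations can be sketched as a swap of arguments. The triple encoding and the function name `invert` are assumptions made for illustration; the text only claims that ins_ is the inverse of for_ "for the most part", so this should be read as the default case rather than a general law.

```python
def invert(triple):
    """Rewrite (x, ins_, y) as (y, for_, x) and vice versa; leave others alone."""
    subj, rel, obj = triple
    swap = {"ins_": "for_", "for_": "ins_"}
    if rel not in swap:
        return triple
    return (obj, swap[rel], subj)

# 'bite has instrument tooth' in the x has instrument y order used above.
bite_ins = ("bite", "ins_", "tooth")
print(invert(bite_ins))
print(invert(invert(bite_ins)) == bite_ins)
```

Applying the swap twice returns the original triple, which is what makes the two sets of definitions above interchangeable.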

Having clarified that primitive status is not an external given but rather a lack of ability to find a suitable definition, and that grammaticalization is neither necessary nor sufficient for primitivity, we can now turn to the most recalcitrant of our primitives, the linkers.

Whether we keep ins_ (karaṇa) as a primitive or accept the analysis given above, 4lang covers the system of Pāṇinian kārakas reasonably well. Verbal 1-links point to subjects, which are for the most part Pāṇinian agents (kartṛ), and we will even capture some of the definition as ‘the independent one’ (1.4.54). Note, however, that we also speak of subjects for prepositions, pure statives, and experiencer verbs, etc. that many grammatical theories prefer to handle by a variety of other means. for_ is goal (karman), locatives (adhikaraṇa) and ablative (source, apādāna) will be discussed further in 3.1, but their treatment is largely similar to that of at.

There is one notable sense in which our treatment is clearly inferior to the Ashṭādhyāyī, the preferential attachment of the kārakas. Pāṇini (1.4.42) uses the superlative sādhakatamam to define the instrument not just as the means, but as the most effective means to the goal, and similarly īpsitatamam as what is primarily desired by the agent (1.4.49). Needless to say, 4lang will have the means to express superlatives – these will be derived using the comparative er_ ‘>’ as er_ all. What it lacks is the kind of powerful metalanguage that the Ashṭādhyāyī deploys in full. Our theory of naive grammar (see 2.5) simply doesn’t have the means for comparing alternative derivations, even though such a facility would also be useful in phonology for implementing Optimality Theory. Regretfully, we must leave this for future work.

The one kāraka missing from our system is the recipient (saṃpradāna). As discussed above, there is considerable computational pressure to avoid 3-tensors and higher multilinear elements, and we will model ditransitives by decomposition:

give ad do dac1 113 V
    =agt cause_ {person has =pat}, dative_ mark_ person

buy vesz emo kupowac1 2609 V
    =agt receive =pat, =agt pay seller, "from _" mark_ seller

sell elad vendo sprzedac1 595 V
    =agt cause_ {buyer has =pat}, buyer cause_ {=agt has money_}, dative_ mark_ buyer


Recall that in 2.2 we already derived the slot-fillers buyer and seller by the agentive suffix -er/3627. We treat to as a dative case marker, but we could just as well treat it as a genuine locative case: after all, the recipient will have the object in physical possession in the default case (see Hovav and Levin, 2008 for further discussion). For cause_ we adapt a post hoc ergo propter hoc analysis: we define x cause_ y by x before y, after(y). This falls quite short of a proper analysis of single and multiple causes, and it encourages precisely the kind of errors that are rampant in the identification of cause-effect relations. But there is no reason to assume that sophisticated data analysis of the kind urged in (Pearl, 2009) can be replicated in natural language semantics, especially as the kind of statistics and probability theory that undergird the modern scientific understanding are not supported by natural language (see Chapter 5). We compare the commonsensical definition of causation to the counterfactual sine qua non definition in Chapter 6.
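The decomposition of a ditransitive into binary clauses can be sketched by expanding cause_ according to the definition x before y, after(y) and then filling the slots. The string encoding, the helper names `expand_cause` and `bind`, and the example sentence John gave Mary a book are assumptions introduced for illustration; slot filling by plain string replacement is a shortcut, not the unification mechanism referred to in the text.

```python
def expand_cause(x, y):
    """Rewrite 'x cause_ y' as the naive pair 'x before y', 'after(y)'."""
    return [f"{x} before {y}", f"after({y})"]

def bind(clauses, bindings):
    """Substitute fillers for slots by plain string replacement (a shortcut)."""
    out = []
    for clause in clauses:
        for slot, filler in bindings.items():
            clause = clause.replace(slot, filler)
        out.append(clause)
    return out

# First clause of the definition of give: =agt cause_ {person has =pat}
GIVE = expand_cause("=agt", "{person has =pat}")

# Hypothetical sentence: John gave Mary a book.
example = bind(GIVE, {"=agt": "John", "=pat": "book", "person": "Mary"})
print(example)
```

Only the binary relations before, after, and has appear in the result, so no three-place element is needed to express the recipient.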

In the case of part_of the situation is different: there is no great conceptual gap between the naive theory of containment and set theory. Axiomatic set theory can of course approach a lot of problems that do not even arise in naive mereology, but we see no reason not to apply set theory here as well. Since we already have a containment primitive in, all we need is that =agt and =pat are connected. In spite of its reducibility, we keep part_of in the definitions, where it is used predominantly with body parts nose part_of face and parts of natural objects fruit part_of plant. This leaves one more relational to consider, has, which we use primarily in the notional sense of possession, as we handle inalienable possession by part_of already.

The Appendix reveals some ways we could further reduce our already small list of primitives. By defining has as =agt control =pat, =agt has =pat, we have identified only one defining aspect of ownership, control, but left has as primitive, since it occurs on the right-hand side of the definition as well. Almost a third of our definitions contain has, but these could often be traded off for part_of as in knife instrument, for_ cut, has blade<metal>, has handle. As of Release 2.0 it is not yet clear how more abstract relationships, where control alone seems insufficient to explain what is going on, should be handled. Consider way u1t via droga 2484 u N artefact, gen move at, has direction or black fekete niger czarny 761 e A colour, dark, night has colour, coal has colour. Perhaps we will want to say that colors are part of the object, or that the road controls its direction, but this is not evident, and for now has must be assigned a matrix to be computed on the entire set of definitions, see 9.5.

So far we connected, to the extent feasible, 4lang to the Ashṭādhyāyī. Sadly, we don’t have a large body of machine-readable Sanskrit fully parsed for kārakas, and even if we did, the subtle interplay between tense, voice, and deep cases would fast overburden the skeletal grammatical mechanism provided here. We also explained how the mainstays of case/valency systems, such as datives, locatives, and instruments, can be reconstructed without assuming link types beyond ‘1’ and ‘2’, by taking these as relationals that typecast their arguments, the expressions that appear at the two ends of the named links.


In terms of the amount of fully analyzed text available, Universal Dependencies (UD) is the single most influential cross-linguistic framework of grammatical description (Nivre, Abrams, Agić, et al., 2018). While many other schools offer a broader variety of analyses, these, with the possible exception of tagmemics (Pike, 1982) and Relational Grammar (Perlmutter, 1980), rarely extend to a broad selection of languages. Also, the dominant style of linguistic analysis is the in-depth study of a restricted range of syntactic phenomena, ideally across many typologically diverse languages, rather than the in-breadth analysis of an entire language, which again makes it hard to link contemporary computational linguistics with linguistic theory. Here we assume the reader is familiar with UD, and compare 4lang to UD, pointing at other frameworks only in a few places. Generally, 4lang is on the sparse or ‘lumping’ side of the comparison, not just in relation to UD, but also in relation to other well-developed theories like LFG, HPSG, or MP.

Since UD distinguishes dependency links by the category of the head and the dependent, it naturally keeps notions like nsubj and csubj (nominal and clausal subjects) separate, and similarly for obj and ccomp. 4lang, with its roots in the theory of Knowledge Representation, where the proliferation of link types has emerged as a significant problem early on (Woods, 1975), admits only one other link type, ‘0’ (is_a), which subsumes most of the other link types used in UD, such as amod, appos, nummod and advmod. In a strictly link-based system such as UD it is a practical necessity to have a separate link type for coordination: in 4lang we just use comma-separated concatenation.
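The ‘lumping’ described here can be pictured as a many-to-one relabeling of UD edges. The sketch below is an illustration only: sending nsubj and csubj to ‘1’ and obj and ccomp to ‘2’ is our reading of the passage rather than a published conversion table, the table covers just the UD labels named in the text, and the names `UD_TO_4LANG` and `lump` are hypothetical.

```python
# Assumed many-to-one table, limited to the UD labels mentioned in the text.
UD_TO_4LANG = {
    "nsubj": 1, "csubj": 1,      # subjects, nominal or clausal
    "obj": 2, "ccomp": 2,        # objects, nominal or clausal
    "amod": 0, "appos": 0, "nummod": 0, "advmod": 0,  # subsumed by '0' (is_a)
}

def lump(edges):
    """Relabel (head, ud_label, dependent) edges; skip labels not in the table."""
    return [(head, UD_TO_4LANG[label], dep)
            for head, label, dep in edges
            if label in UD_TO_4LANG]

# Hypothetical UD-style edges for: the black dog sleeps
ud_edges = [("sleeps", "nsubj", "dog"), ("dog", "amod", "black"),
            ("dog", "det", "the")]
print(lump(ud_edges))
```

Edges with labels outside the table, such as det here, are simply dropped by this sketch; how such links are actually treated is not stated in this section.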

Both UD and other theories of valency (for a summary, see Somers, 1987) offer a broad variety of links, and our method of treating these as having their own subject and object remains applicable. A more radical step, one that is commonly taken in the study of thematic relations, is to assume that link types are acting as variable-binding term operators (VBTOs) so that we would have not just =agt and =pat, but also =goal, =source, =theme, =pos and perhaps several others. In Release 1.0 of 4lang, Makrai, 2014 used several thematic role-like constructs, but this really stretched the ontological commitment (Quine, 1947) of the model beyond what is absolutely necessary, and by now only objects and subjects remain.

This is of course not to deny that there are such things as datives or locatives, only that we can handle the information content without recourse to additional VBTOs. In particular, we make do without the ‘3’ or indirect object linker heavily used in Relational Grammar, which would call for a reanalysis for the broad variety of cases where this would come in handy. On the theoretical side, we accept the arguments of Dowty, 1989 that =agt and =pat are sufficient – as a practical matter, these appeared in 178 (resp. 174) of the 1200 definitions whose headwords were listed in S19:4.8, while all others together appeared only 111 times. Consider the classic ‘commercial exchange’ schema we used in 1.4 to illustrate our use of voronoids as hypergraphs with nodes labeled by word vectors. This involves at least four participants: the seller, the buyer, the goods, and the money. Before the exchange, which can be conceptualized both in the buy and in the sell frame, the seller has the goods and the buyer has the money; afterwards the buyer has the goods and the seller has the money. This information can easily be captured using the formal language of 1.3: before(seller has goods, buyer has money), after(buyer has goods, seller has money), and we will see in 3.2 how before and after can be treated geometrically.
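The commercial exchange schema can be assembled from two simple exchanges, one for the goods and one for the money, as in the following minimal sketch. The tuple encoding of has-facts and the function names `simple_exchange`, `commercial_exchange`, and `render` are assumptions made for illustration and are not part of 4lang.

```python
def simple_exchange(pat, person, other):
    """One exchange: pat is with person before and with other after."""
    return {"before": (person, "has", pat), "after": (other, "has", pat)}

def commercial_exchange(seller="seller", buyer="buyer",
                        goods="goods", money="money"):
    """Conjunction of two simple exchanges running in opposite directions."""
    legs = [simple_exchange(goods, seller, buyer),
            simple_exchange(money, buyer, seller)]
    return {phase: [leg[phase] for leg in legs] for phase in ("before", "after")}

def render(schema):
    """Print the schema in the before(...), after(...) notation of the text."""
    parts = []
    for phase in ("before", "after"):
        facts = ", ".join(" ".join(fact) for fact in schema[phase])
        parts.append(f"{phase}({facts})")
    return ", ".join(parts)

print(render(commercial_exchange()))
```

All four participants are reached through the binary has relation and the two time-indexed states, with no thematic roles beyond the two argument slots of has.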

As we already have agentive -er at our disposal, linking the verbs to this schema is effortless. Linking the nouns is more tricky, and it is not even obvious whether goods ‘things that are produced in order to be sold’ (LDOCE), product (4lang), or perhaps a synthetic description what seller sells is the best way. In a spreading activation model, the LDOCE definition is reachable from sell in a single step (assuming, as we are, that sold is recognized as a form of sell), and similarly for 4lang, where the definition of product is artefact, for_ sell. To synthesize a definition may also make sense, especially when the object of selling is not something that we would normally consider a product, as in Mahema sold Sayuri’s virginity to the Baron for 15,000 yen.

Calling this nominal THEME offers no such advantage in reaching it, in fact it would negate the advantage of calling it =pat, which obviously facilitates link tracing. Note that the generally agreed definitions of themes, ‘a participant which is characterized as changing its position or condition, or as being in a state or position’ or ‘an object in motion or in a steady state as the speakers perceives the state, or it is the topic of discussion’ are so broad as to fit nearly all conceivable nominals including not just the money, but also the agent and action nominalizations. It is precisely because of the limited reachability from goods or product that we name this quadrant of the voronoid goods_ or product_.

Today, the standard commercial exchange involves even more participants: the buyer has a credit card, or better yet, a cellphone that acts as one, the seller has a credit card terminal, the buyer and the seller both have bank accounts linked to these, and the exchange of money is effected by some protocol neither buyer nor seller are fully in control of. It would require an absurdly large array of thematic roles to reach all these participants from the actual keywords, yet the fact that they are available is evident from the fact that definite descriptions can be used without prior mention: I wanted to buy a new pair of shoes. The card was rejected (Kálmán, 1990).
