Theoretical Linguistics Programme, Budapest University (ELTE)
COMPOSITIONAL INTERPRETATION
OF COMPUTER COMMAND LANGUAGES
Gábor Rádai and László Kálmán
Research Institute for Linguistics, Hungarian Academy of Sciences Working Papers in the Theory of Grammar, Vol. 2, No. 2
Received: June 1995
Gábor Rádai* and László Kálmán**
* Department of Symbolic Logic, Budapest University (ELTE)
** Research Institute for Linguistics, HAS, Room 119
** Theoretical Linguistics Programme, Budapest University (ELTE)
E-mail: radai@nytud.hu, kalman@nytud.hu
Supported by the Hungarian National Research Fund (OTKA)
Budapest I., P.O. Box 19. H-1250 Hungary
Telephone: (36-1) 175 8285; Fax: (36-1) 212 2050
0. Introduction
The aim of this paper is to examine the traditional concept of compositionality.
We will be dealing with a language, namely the language of commands used in the Unix operating system, whose interpretation is intuitively far from compositional, although it fits the traditional definition of compositionality. We will outline the reason for this discrepancy, and then show how to modify the language so that it receives an intuitively compositional interpretation. We show that this brings us closer to a more reasonable definition of the principle of compositionality and of its significance for the semantics of natural languages.
The paper is organized as follows. In section 1 we present the Principle of Compositionality and argue that it needs to be strengthened, because it is too loose in its original formulation. In particular, we introduce the Principle of Independence, and propose to include it in the Principle of Compositionality. The rest of the paper discusses a language, namely the language of commands used in the Unix operating system, whose interpretation is far from compositional in the intuitive sense of the word. However, the traditional Principle of Compositionality does not preclude such an interpretation. First, in section 2, we explain the concept of shells (command interpreters), and show how the Unix command language is non-compositional. Then we present an alternative command language which has a more natural interpretation, based on our version of the concept of compositionality. Section 3 informally presents the way in which such a ‘compositional Unix shell’ should work. Then we develop a language to talk about the semantic domains relevant to our interpretation, i.e., various components of a simplified concept of machine states (section 4). Then we explain the concept of denotational semantics (section 5), a non-procedural view of the interpretation of computer programs, which underlies the particular structure that we attribute to our semantic domains (section 6). The actual syntax and semantics of the language in which we can talk about those objects is given in section 7, and the semantics of command lines (commands followed by parameters) is described in section 8. The way in which we produce those meanings from those of the command names and the parameters in a compositional way is explained in section 9. Finally, we offer some conclusions (section 10).
1. Compositionality
Let us first define the concept which will be at the centre of our attention throughout this paper. The interpretation of a language can be said to be compositional if and only if it obeys the Principle of Compositionality, which runs as follows:
1.1. The Principle of Compositionality
The meaning of a complex expression is a function of the meanings of its constituents and their mode of combination.
This definition leaves it open whether ‘the meanings of the constituents’ may depend on each other or on the function that we use to calculate the meaning of the complex expression. However, it seems that the Principle of Compositionality would be rather vacuous if we were to allow for such dependencies. That is, we understand that the intended content of the Principle of Compositionality implies a Principle of Independence:
1.2. The Principle of Independence
The meanings of the constituents of a complex expression are assigned independently of each other and of the function that yields the meaning of the complex expression.
The reason why we propose to add this principle is that, as we will see shortly, languages that obey the Principle of Compositionality may still be rather ‘non-compositional’ if they fail to satisfy the Principle of Independence. In such languages, the meaning of an expression may vary depending on what it is a constituent of. As a result, very similar constructions (e.g., containing the same expression in the same syntactic role) may be interpreted in heterogeneous (or even unrelated) ways. We submit that this contradicts the intuition behind the concept of compositionality.
Note that the interpretation of compositionality proposed here implies that the meaning contributions of the constituents of an expression are constant, i.e., they do not vary from one construction to another. This means a certain context-independence as well, which many would deny. We conceive of this as a price to pay for a reasonable concept of compositionality. In our approach, the context of utterance (and the utterance-internal context of any sub-expression) can only play a role inasmuch as both the meanings and the functions that combine them are underspecified. That is, by virtue of their underspecification, contextual factors (including the internal context, i.e., the presence of the others) may enrich these meanings. This kind of mechanism does not contradict the Principle of Independence, because it is not the meanings assigned that depend on each other, but what they become later on.
It is easy to see that the Principle of Independence is not vacuous at all. The interaction of meanings is by definition contentful, i.e., the Principle of Independence prevents meaning assignments from depending on formal properties of the context (e.g., the shape of a co-occurring constituent). Only genuine homonyms (homophonous expressions with independent meanings) challenge this principle; those have to be considered different expressions which accidentally are of the same shape. So whether an ambiguity is due to an accidental surface coincidence or a systematic semantic phenomenon must be determined independently.
2. Unix shells
A shell is a program that establishes contact between the operating system of a computer and its user. Its task is to forward the user’s commands to the operating system (after a check of correctness). (A command is also called a command line; we will refer to it as a cml.) Many shells offer additional features to the user (such as abbreviatory mechanisms and ways of referring to commands issued earlier), as well as built-in commands. The shells used with the Unix operating system (especially the C-shell) offer many such features. The commands that do not exploit the extra possibilities offered by the shell may contain a command name (cmn) and various types of parameters that follow it. The command name is simply the name of a computer program; the program processes the parameters, so their interpretation is its ‘internal affair’. (Built-in shell commands do not correspond to programs; the parameters of such commands are processed by the shell itself.) The language also has certain operators (opr), which can be prefixed to any command line. They correspond to programs that run the remaining command line, and perform some uniform computation in the meantime.1
The informal syntactic and semantic description of command lines is available in the form of manual pages provided with the operating system. A manual page contains the summary of the syntax associated with a command name followed by the description of what the command lines do. Let us take a look at the syntactic description of the command called grep:
2.1. Example
grep [-bchilnsvy] [-f expfile] [[-e] expression] [files]
First comes the specification of the command name, followed by the list of flags (fl). In the case of grep, these are one-character strings that can be concatenated in any order and their concatenation must be preceded by a minus sign. In general, we can think of a flag as any string containing no blank space and preceded by a minus sign. (Flags are in principle optional; in manual pages, [·] means optionality.) Then come two options, each consisting of an option letter and its argument. (An option letter is like a flag, but it has an argument.) The option letter in the second option is itself optional. Finally, the last item is an optional argument (opt), i.e., a parameter that has a fixed position in the command line which is not preceded by an option letter. In fact, the above syntactic summary is the abbreviation of two different syntactic possibilities:
2.1'. Example
a. grep [-bchilnsvy] [-f expfile] [-e expression] [files]
b. grep [-bchilnsvy] [-f expfile] [expression] [files]
1 For example, the operator time will return the time the process given as its argument has taken to run.
In 2.1'a, we have nine flags, two options and an optional argument; in 2.1'b, there are nine flags, one option and two optional arguments.2
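The three kinds of parameters distinguished above can be illustrated with a small sketch. The Python below is not part of grep or of any shell: `classify` and its `option_letters` parameter are hypothetical names, and the sketch assumes that the caller already knows which letters take an argument.

```python
def classify(tokens, option_letters):
    """Split the parameters of a command line into flags, options and
    optional arguments, in the terminology used above.

    tokens: the parameters following the command name.
    option_letters: letters that take an argument (supplied by the caller;
    an assumption of this sketch).
    """
    flags, options, opts = [], {}, []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("-") and len(tok) > 1:
            letters = tok[1:]
            if letters in option_letters:
                # An option: an option letter followed by its argument.
                if i + 1 >= len(tokens):
                    raise ValueError(f"option -{letters} needs an argument")
                options[letters] = tokens[i + 1]
                i += 2
                continue
            # Concatenated one-character flags: -iv is -i plus -v.
            flags.extend(letters)
        else:
            # A parameter in fixed position, not preceded by an option letter.
            opts.append(tok)
        i += 1
    return flags, options, opts


# Pattern 2.1'a: the expression is introduced by the option letter -e.
a = classify(["-iv", "-e", "foo", "notes.txt"], {"e", "f"})
# Pattern 2.1'b: the expression is the first optional argument.
b = classify(["-iv", "foo", "notes.txt"], {"e", "f"})
```

In the first call the expression ends up among the options, in the second among the optional arguments, which is exactly the difference between the two readings of the syntactic summary.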
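In general, the syntax of the relevant fragment of the language of Unix command lines ($L_{\mathrm{cml}}$) in BNF is as follows:
2.2. Definition
1. $\mathit{cml} \overset{\mathrm{def}}{=} \mathit{opr}\ \mathit{cml} \mid \mathit{cmn} \mid \mathit{cml}\ \mathit{fl} \mid \mathit{cml}\ \mathit{opt}$;
2. $\mathit{cmn} \overset{\mathrm{def}}{=} c^0 \mid \ldots \mid \mathit{cmn}^{n+1}\ \mathit{expr} \mid \mathit{cmn}^{n-1}\ \mathit{opl}$;
3. $\mathit{opt} \overset{\mathrm{def}}{=} \mathit{expr}$;
4. $\mathit{expr} \overset{\mathrm{def}}{=} n \mid c^0 \mid \ldots \mid \mathit{var}^0 \mid \ldots$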
$c^n$ stands for $n$-argument command name constants, $n$ stands for natural numbers, and $c_n$ stands for a name constant denoting elements of the universe (files, directories, etc.), as we will see. As one can see from the definition, we assume that flags and options come at the end of command lines rather than between the command name and its arguments. This modification does not make any difference except for the fact that the description of the semantics of the relevant constructions will be far simpler. In what follows, we will not discuss the semantics of most of the constructs specific to the shell language; we will concentrate on the semantics of commands.
The language presented above is an idealisation of the currently available languages, as the construction rules in the given form are context free, whereas in the actual command language as specified in the manual pages construction rules are given separately for every command, as can be seen from the syntax of the command grep above. It is obvious that, for example, the syntactic rule that combines command names with flags is context sensitive in the sense that the program will report a syntax error if a flag is not explicitly listed in the program description. On the one hand, it would be desirable to have a context free language as $L_{\mathrm{cml}}$ and, on the other hand, it is more in line with our intuition that if a modifier comes from a closed syntactic class, but is not applicable in a certain context, then this is a semantic, rather than a syntactic phenomenon. It should be explained in terms of semantic incompatibility or vacuous semantic operations rather than in syntactic terms. In what follows, we will assume the above language and let our semantic apparatus be such that it accounts for the problems connected with the relevant constructions.
2 The above description is not quite correct, since exactly one of the expfile and expression arguments is in fact obligatory.
There are also more important problems, related to the compositionality of the interpretation of commands. Besides the fact that command names come with some predefined sets of possible parameters (flags and option letters), the interpretation of these also depends on the command name at hand. For example, the flag -l means roughly ‘long, verbose listing’ in connection with the command name ls,3 whereas as an argument to wc it means something like ‘count lines only’.4 Similarly, while the option letter -f (standing for ‘file’) introduces the name of an auxiliary file (containing expressions or commands) with grep and similar commands (make, awk, sed etc.), it is a flag that stands for ‘force’ with the command rm (remove), and has a totally different effect.5
A second problem is the issue of multiple flags. In general, the order of flags does not make any difference and multiple occurrences of the same flag in one command cause the same change in behaviour as single occurrences, as one would expect. Yet we have to face the problem of dependent flags, i.e., the problem that certain flags can only appear in the presence of some other flag. For example, the flag -u depends on the presence of -t in this sense with the command name ls.6 Though even the informal semantics makes this perfectly understandable, currently this is treated as a syntactic constraint, which again clearly does not agree with one’s intuition.
As a matter of course, the idiosyncratic behaviour of flags can be explained away by assuming that flags are functors over command names as arguments.
3 ls -l lists the files specified by its argument in long format, giving mode, number of links, owner, group, size in bytes, and time of last modification for each file. If the file is a symbolic link, the filename is printed followed by ‘->’ and the pathname of the referenced file. If the file is a special file, the size field will contain the major and minor device numbers, rather than a size. A total count of blocks in the directory, including indirect blocks, is printed at the top of long format listings.
4 wc counts lines, words and characters in the named files, or in the standard input if no names appear. It also keeps a total count for all named files. A word is a maximal string of characters delimited by spaces, tabs, or newlines. The flags -l, -w and -c may be used in any combination to specify that a subset of lines, words, and characters are to be reported.
5 rm removes each given file. By default, it does not remove directories. If the -f (‘force’) flag is used, it ignores nonexistent files and does not prompt the user if the file is unwritable.
6 ls -t sorts the files listed by last modification time (latest first) rather than by name; -u uses time of last access instead of time of last modification for sorting, and can only be used with the -t flag.
Since there is only a finite number of commands, the meaning of a flag could be a partial function defined pointwise, i.e., one whose action is determined by first looking at its argument.7 A similar issue is raised by the ways in which the presence vs. absence of options and optional arguments is significant. For example, if the command set is followed by two arguments (a name and a value), it causes the variable name to be set to value, whereas if it stands without an argument, the corresponding action is to display the currently set variables with their values. This can again be dealt with using several mathematical tricks such as polymorphic functions or empty strings as arguments, defining the function again pointwise.
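To make the ‘pointwise’ move concrete, here is a minimal sketch of a flag meaning that is determined only after inspecting the command name it applies to. The table entries paraphrase the glosses given in the text; the names `FLAG_MEANING`, `flag_meaning` and `meanings_of`, and the error string, are hypothetical and serve illustration only.

```python
# Hypothetical table: the "meaning" of a flag is looked up per command name.
FLAG_MEANING = {
    ("-l", "ls"): "long, verbose listing",
    ("-l", "wc"): "count lines only",
    ("-f", "rm"): "force",
    ("-f", "grep"): "name of an auxiliary file follows",
}


def flag_meaning(flag, command):
    """A partial function defined pointwise: look at the argument first."""
    try:
        return FLAG_MEANING[(flag, command)]
    except KeyError:
        # Totalised by mapping every undefined pair to an error action.
        return "error: flag not applicable"


def meanings_of(flag):
    """All distinct meanings that one flag receives across command names."""
    return {m for (f, _), m in FLAG_MEANING.items() if f == flag}
```

Such a table is a perfectly good function, so the traditional principle is satisfied; yet `meanings_of("-f")` contains two unrelated entries, which is the heterogeneity that the Principle of Independence is meant to exclude.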
Obviously, under the current wording of the Principle of Compositionality, a compositional interpretation of Unix commands can be given that uses only functional application,8 although we have the very strong feeling that, under a more appropriate view of compositionality, this should not be possible. In particular, the heterogeneous interpretation of flags (and other option letters) as well as the heterogeneous behaviour of absent optional arguments are incompatible with our Principle of Independence. In what follows, we will specify a semantics that we feel comes closer to the original idea behind compositionality and that will remedy some of the problems mentioned above. We will see that this type of interpretation will satisfy the Principle of Independence.
3. Compositional Unix: An Informal Outline
Anomalies like the homonymy of the -f flag mentioned earlier should not occur in a Unix shell with compositional semantics (and they occur to a very limited extent in natural languages). In a compositional Unix shell, there must be a flag --force to be used with rm (and similar commands)9, and a different flag --file to be used with grep (and similar commands). (Needless to say, what name we choose for these flags is immaterial.) The meanings of --force and --file must be assigned uniformly and independently of the context. For example, --force could be interpreted as ‘overwrite the file argument if you own the file, even if you do not have write permission for it’. (Possibly, it can also cover ‘do not check if the file argument exists at all’, although it would be cleaner to separate these two meanings, so that the latter is to be expressed by, say, --ignore.) Similarly, the option letter --file would be interpreted as ‘the name of an auxiliary file (containing commands etc.) follows’.
7 This method would give us a function that is as good as any other mathematically. Even if we assume that the number of commands is infinite and that the function is totally defined, we just have to define the result of the application of a flag to some command for which it is undefined as the action of issuing some error message, again an action that makes exactly as much sense as any other from the mathematical point of view.
8 For example, the meaning of a flagged command is the action it performs. Compositionality in the above sense is not even destroyed by the fact that the flag as a function does not necessarily preserve anything of the original action performed by its argument.
9 As is conventional, we will use -- instead of - to indicate that something is a multiletter flag rather than the concatenation of independent flags.
Assuming that the programs corresponding to rm, grep etc. operate as they usually do in Unix (i.e., that we are not to rewrite them), the shell will interpret these program names independently of their original interpretation (or relying on the original interpretation if needed). To achieve this, we will assume that the shell maintains a lexicon which contains a program specification for each possible command name. Program specifications contain variables corresponding to the possible effects of parameters. For example, the value of the variable WRITECHECK determines whether write permission is to be checked before overwriting a file; the variable EXISTCHECK determines whether the non-existence of a file will trigger a special action; and the value of AUXFILE stores the name of the auxiliary (command) file. If necessary, program specifications assign default values to such variables, which can be overridden by parameters.
The procedure described above corresponds to a certain underspecification of the actual effect of running the programs. The program specifications will ensure that the external context (the so-called environment, a set of variable bindings) and the (obligatory and optional) parameters together specify the exact action to take when invoking a program.
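The lexicon of program specifications can be pictured with a short sketch. Only the variable names WRITECHECK, EXISTCHECK and AUXFILE and the flags --force, --ignore and --file come from the text; the dictionary layout, the default values, and the helpers `set_var` and `interpret` are assumptions made for illustration. In particular, treating a parameter as vacuous when the specification lacks the relevant variable is one possible reading of ‘vacuous semantic operations’, not a mechanism taken from the paper.

```python
# Hypothetical lexicon: one program specification (variables with defaults)
# per command name.
LEXICON = {
    "rm": {"WRITECHECK": True, "EXISTCHECK": True},
    "grep": {"AUXFILE": None},
}


def set_var(name, value):
    """Context-independent meaning of a parameter: set one variable."""
    def meaning(spec):
        # Vacuous when the program specification has no such variable.
        return {**spec, name: value} if name in spec else spec
    return meaning


FORCE = set_var("WRITECHECK", False)    # --force
IGNORE = set_var("EXISTCHECK", False)   # --ignore


def FILE(name):                         # --file name
    return set_var("AUXFILE", name)


def interpret(command, parameters, environment=None):
    """Defaults, then environment bindings, then parameters fix the action."""
    spec = dict(LEXICON[command])
    for var, value in (environment or {}).items():
        spec = set_var(var, value)(spec)
    for parameter in parameters:
        spec = parameter(spec)
    return spec
```

Each parameter denotes the same update whatever command it is attached to, and because the updates touch distinct variables and are idempotent, `interpret("rm", [FORCE, IGNORE])` and `interpret("rm", [IGNORE, FORCE, FORCE])` yield the same specification, matching the observation that order and repetition of flags make no difference.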
4. Machine States
To give a semantics for the language of Unix commands, we assume that the relevant basic domain is that of machine states (MS). For the sake of simplicity, we will represent a machine state with the disjoint union of a typed directed acyclic graph (TDAG), standing for the directory structure and the files stored, and a domain $NC_\bot \overset{\mathrm{def}}{=} \mathbb{N} \oplus \mathit{Char}^*$ for the denotation of the natural numbers and character strings,10 forming the universe of interpretation, an interpretation function and a valuation corresponding to the environment. In this section we will mainly be concerned with the graphs belonging to a machine state; the other three components will be explained in detail in section 7. A typed directed acyclic graph is defined as follows:
10 The exact meaning of the above notation will be defined later, cf. definition 5.4.
4.1. Definition
1. Given strings $v$ and $u$, $v$ is a prefix of $u$ $\overset{\mathrm{def}}{\Leftrightarrow}$ $\exists w.\ u = vw$.
2. A tree domain $D$ is a non-empty subset of strings (tree addresses) in $\mathbb{N}^*$ such that:
a. for each $u \in D$, every prefix of $u$ is also in $D$;
b. for each $u \in D$, for every $i \in \mathbb{N}_+$, if $ui \in D$ then, for every $j$ such that $1 \le j \le i$, $uj$ is also in $D$.
3. Two tree addresses are independent if neither is a prefix of the other.
4. A tree address $u$ is terminal $\overset{\mathrm{def}}{\Leftrightarrow}$ there is no tree address $v$ in $D$ such that $u$ is a prefix of $v$.
5. Given a set $T$ of types and a set $\Sigma = \bigcup_{\tau \in T} \Sigma_\tau$ of labels, a typed tree is a total function $t_T : D \to \Sigma$, where $D$ is a tree domain.
6. A typed directed acyclic graph is an ordered pair $(t_T, R)$, where $t_T$ is a typed tree and $R$ is an equivalence relation on $D$ ($= \mathrm{dom}(t_T)$) such that for all $u, v \in \mathrm{dom}(t_T)$, if $(u, v) \in R$, then:
a. $ui \in \mathrm{dom}(t_T) \Leftrightarrow vi \in \mathrm{dom}(t_T)$;
b. $ui \in \mathrm{dom}(t_T) \Rightarrow (ui, vi) \in R$;
c. $t_T(u) = t_T(v)$.
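The closure conditions on tree domains are easy to check mechanically. The sketch below represents a tree address as a tuple of positive integers, with the empty tuple as the root; this encoding and the function names are assumptions of the sketch, and `terminals` reads ‘prefix’ in clause 4 as ‘proper prefix’, since every address is trivially a prefix of itself.

```python
def is_prefix(v, u):
    """v is a prefix of u iff u = vw for some (possibly empty) w."""
    return u[:len(v)] == v


def is_tree_domain(D):
    """Check clauses 2a and 2b for a finite set of tree addresses."""
    D = set(D)
    if not D:
        return False
    for u in D:
        # 2a: every (proper) prefix of u must be in D.
        if any(u[:k] not in D for k in range(len(u))):
            return False
        # 2b: if the address w·i is in D, so is w·j for every 1 <= j < i.
        if u and any(u[:-1] + (j,) not in D for j in range(1, u[-1])):
            return False
    return True


def terminals(D):
    """Addresses that are not a proper prefix of any other address in D."""
    return {u for u in D if not any(v != u and is_prefix(u, v) for v in D)}


# Root, two daughters, and one granddaughter under the first daughter.
D = {(), (1,), (2,), (1, 1)}
```

Here `is_tree_domain(D)` holds, whereas `{(), (2,)}` fails clause 2b (the first daughter is missing) and `{(1,)}` fails clause 2a (the root is missing).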
Not all TDAGs are acceptable in machine states. In our case, the TDAG associated with $NC_\bot$, an interpretation function and a valuation has some further special properties, as shown by the following definition.11 We suppose that $T = \{\mathit{dir}, \mathit{file}, \mathit{Char}^*\}$, i.e., the relevant types are directory, file and character string.
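4.2. Definition
$(\mathit{tdag} \oplus NC_\bot, p, v) \in MS$ $\overset{\mathrm{def}}{\Leftrightarrow}$ $\mathit{tdag} = (t_T, R)$ is a TDAG, and
1. $p : \mathit{Con} \mapsto \mathrm{dom}(t_T) \oplus NC_\bot$;
2. $v : \mathit{Var} \mapsto \mathrm{dom}(t_T) \oplus NC_\bot$;
3. $t_T(u) \in \Sigma_{\mathit{dir}} \Rightarrow \forall i \in \mathbb{N}.\ t_T(ui) \in \Sigma_{\mathit{dir}} \vee t_T(ui) \in \Sigma_{\mathit{file}}$;
4. $t_T(u) \in \Sigma_{\mathit{file}} \Rightarrow t_T(u1) \in \Sigma_{\mathit{Char}^*} \wedge \neg\exists i \in \mathbb{N} \setminus \{1\}.\ ui \in \mathrm{dom}(t_T)$;
5. $t_T(u) \in \Sigma_{\mathit{Char}^*} \Rightarrow \neg\exists i \in \mathbb{N}.\ ui \in \mathrm{dom}(t_T)$;
6. $t_T(0) \in \Sigma_{\mathit{dir}}$;
7. $1, 11, 111 \in \mathrm{dom}(t_T)$, $t_T(1) \in \Sigma_{\mathit{dir}}$, $t_T(11) \in \Sigma_{\mathit{file}}$, $t_T(111) \in \Sigma_{\mathit{Char}^*}$, and $\neg\exists i \in \mathbb{N}.\ 1i \in \mathrm{dom}(t_T) \vee 11i \in \mathrm{dom}(t_T) \vee 111i \in \mathrm{dom}(t_T)$.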
The above definitions formulate the following constraints on what ordered triples of universe, interpretation function and valuation we accept as machine states proper. The interpretation and the valuation associated with the universe are functions that assign either a numerical value, a character string or a tree address to a constant or a variable of the language to be given in section 7, depending on its type, as we shall see. Furthermore, in an MS labels associated with the terminal addresses of the underlying tree have to be of type ‘dir’ or ‘Char*’,12 i.e., empty directories or finite lists of characters corresponding to contents of files.13 We have to impose some further constraints guaranteeing that character strings are only immediately prefixed14 by files and the latter are immediately prefixed by directories and that files only immediately prefix one character string which immediately prefixes nothing. As the sorts form domains of their own, additionally, $t_T$ has to contain three special elements: $\bot_{\mathit{Char}^*}$, $\bot_{\mathit{file}}$ and $\bot_{\mathit{dir}}$ (their tree addresses are 1, 11 and 111, respectively), none being the prefix of any other tree address. These will serve as the so-called bottom elements of their respective domains, as required by domain theory (cf. sections 5-6), but they will also be put to special use in our semantics, as will be explained later on.
11 The identity of the labels does not play any role in what follows. $\oplus$ in clauses 1 and 2 means roughly the disjoint union of the two domains. Although the domain consists of the disjoint union of a TDAG and $NC_\bot$, we are only interested in the disjoint union taken with the domain of the TDAG, as the subsequent clauses show. For the exact definition, see definition 5.4.
We will provide the compositional Unix command language with a so-called denotational semantics. This makes it necessary to introduce some concepts before specifying what the domains of the semantic values of the various expressions in our language will be.
5. Denotational Semantics
We will use denotational semantics, as worked out and described in Scott and Strachey (1971), for the description of the relevant fragment of a Unix command language. To illustrate the basic points, let us take a look at the following two programs:
5.1. Example
$F(n) \Leftarrow$ if $n = 0$ then $n$ else $F(n - 1)$
$G(n) \Leftarrow 0$
Obviously, the two programs do quite different things. The program F, on receiving an argument $n$ of type $\mathbb{N}$, will recursively compute a value, namely the value 0. Program G, on the other hand, will immediately produce the same result.
Although we see that the two programs produce the same output on appropriate input, i.e., they are equivalent under the standard set theoretic interpretation of functions, computationally they are as different as any two programs can be.15 The idea behind denotational semantics is exactly this: for many purposes it is better if we can abstract away from accidental properties of programming languages and the realizations of specific programs, so that we can regard programs essentially as realizations of some (set theoretic) functions on domains appropriate for whatever can serve as the input and the output in the language under investigation.
12 We will use the terms file, directory and character string to refer to tree addresses labelled with objects of the appropriate type.
13 As customary, we think of empty files as containing the empty string of characters, i.e., the string of length 0.
14 Let $u, v \in \mathbb{N}^*$. $v$ is an immediate prefix of $u$ $\overset{\mathrm{def}}{\Leftrightarrow}$ $\exists i \in \mathbb{N}.\ u = vi$.
But things are more complicated than they seem at first sight. If we interpret the functions to be of type $f : \mathbb{N} \mapsto \mathbb{N}$, we have no problems. But what happens if we let their type be $f : \mathbb{Z} \mapsto \mathbb{Z}$? The program G will still produce 0 on every input. But F is in trouble: when it is given some $n < 0$ as an argument, it will go straight into an infinite loop. Why is that a problem for our semantics? Because we have to do something about the infinite loop, and the semantics that we chose forces us to give a denotation to this result, a denotation that can appear as a value of functions. Additionally, it has to be of type $\mathbb{Z}$ to meet the constraints. For this purpose we introduce a special constant in every domain, called bottom ($\bot$).
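The contrast between F and G can be tried out directly. In the sketch below, non-termination is approximated by a step budget: once the hypothetical `fuel` parameter is exhausted, F returns a sentinel `BOTTOM`. Both the sentinel and the budget are assumptions of the sketch (genuine divergence cannot be detected in general), and the recursion of 5.1 is unrolled into a loop so that the budget can be counted.

```python
BOTTOM = None  # stands in for the bottom element; an assumption of this sketch


def F(n, fuel=1000):
    """F(n) <= if n = 0 then n else F(n - 1), cut off after `fuel` steps."""
    while fuel > 0:
        if n == 0:
            return n
        n, fuel = n - 1, fuel - 1
    return BOTTOM  # budget exhausted: treated as non-termination


def G(n):
    """G(n) <= 0, without inspecting n."""
    return 0
```

On small natural numbers the two agree, so they are indistinguishable as functions on $\mathbb{N}$; on a negative integer such as -7, `F` exhausts its budget and yields `BOTTOM`, while `G` still returns 0.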
Furthermore, we will need an ordering which roughly mirrors the relations of information content of the elements of the domain. This gives us an algebraic structure called a Scott domain. The official definition of Scott domains is as follows:16
5.2. Definition
$sd = (U, \bot_{sd}, \sqsubseteq) \in SD$ $\overset{\mathrm{def}}{\Leftrightarrow}$ $U \neq \emptyset$, $\bot_{sd} \in U$, $\sqsubseteq$ is a cpo, and $\forall x \in U.\ \bot_{sd} \sqsubseteq x$.
Examples are the domains $\mathbb{N}_\bot$ and $\mathbb{T}_\bot$, i.e., the domains of natural numbers and truth values with their respective bottom elements. These domains are also examples of another important notion, the so-called flat domains, defined as follows:
15 In what follows, we will use the terms extensional equivalence vs. intensional equivalence: F and G are extensionally, but not intensionally, equivalent.
16 $U$ is the universe of the domain containing at least $\bot_{sd}$, the information content of which is minimal according to the complete partial ordering $\sqsubseteq$. A cpo is a po which has limits $\bigsqcup_n x_n$ for all (countable) increasing sequences $x_0 \sqsubseteq x_1 \sqsubseteq \cdots \sqsubseteq x_n \sqsubseteq \cdots$. Certain further conditions on domains are imposed in Gunter and Scott (1990), but these need not concern us here, as they are meant primarily to ensure that the class of domains is closed under various constructions.
5.3. Definition
$sd \in FD$ $\overset{\mathrm{def}}{\Leftrightarrow}$ $\forall x, y \in U.\ x \neq \bot_{sd} \wedge y \neq \bot_{sd} \Rightarrow x \not\sqsubseteq y$.
It is obvious that if we take the ordering to be about the information content of the elements of the respective domains, then neither $\bot \sqsubseteq \top$ nor $\top \sqsubseteq \bot$ (reading $\bot$ and $\top$ here as the two truth values), i.e., neither truth value carries more information than the other, whereas lack of information about a truth value certainly carries less information than they do and, similarly, no natural number is less informative than any other, except for the bottom element representing the ‘result’ of non-terminating computations.
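The information ordering of a flat domain fits in one line of code. The sketch below uses a fresh Python object as the bottom element and checks the partial-order laws on a finite sample; `BOTTOM`, `leq` and `is_partial_order` are hypothetical names, and a finite sample can of course only illustrate the ordering, not establish it for all of $\mathbb{N}_\bot$.

```python
BOTTOM = object()  # hypothetical stand-in for the bottom element


def leq(x, y):
    """x is below y in a flat domain: bottom is below everything, and
    otherwise an element is below itself only."""
    return x is BOTTOM or x == y


def is_partial_order(sample):
    """Check reflexivity, antisymmetry and transitivity of leq on a sample."""
    reflexive = all(leq(x, x) for x in sample)
    antisymmetric = all(
        x is y or x == y
        for x in sample for y in sample
        if leq(x, y) and leq(y, x)
    )
    transitive = all(
        leq(x, z)
        for x in sample for y in sample for z in sample
        if leq(x, y) and leq(y, z)
    )
    return reflexive and antisymmetric and transitive


naturals = [BOTTOM, 0, 1, 2, 3]
truth_values = [BOTTOM, True, False]
```

With this ordering `leq(True, False)` and `leq(False, True)` both fail, while `leq(BOTTOM, True)` holds, mirroring the remark that neither truth value is more informative than the other but both are more informative than no answer at all.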
If we take some previously given domains as basic, all other domains can be defined using certain operations on domains. These other domains include function domains, product domains and sum domains. Some of the relevant operations are defined below:17
5.4. Definition
• $d_1 \to d_2$: the domain of all functions from $d_1$ into $d_2$, where $f \sqsubseteq_{d_1 \to d_2} g \overset{\mathrm{def}}{\iff} \forall x \in d_1.\ f(x) \sqsubseteq_{d_2} g(x)$. Thus $\bot_{d_1 \to d_2}$ is the function that maps every element of $d_1$ into $\bot_{d_2}$;
• $d_1 \times d_2$: the Cartesian product domain, where $(x_1, x_2) \sqsubseteq_{d_1 \times d_2} (y_1, y_2) \overset{\mathrm{def}}{\iff} \forall i \in \{1, 2\}.\ x_i \sqsubseteq_{d_i} y_i$;
• $d_1 \oplus d_2$: the 'coalesced' sum, where elements originating from different $d_i$'s are incomparable and both $\bot_{d_i}$ are identified with $\bot_{d_1 \oplus d_2}$;
• $d_\bot$: the lifted domain obtained by adding a new bottom element under $d$;
• $d^*$: the lists of finite length (including strings of length 0) with non-$\bot$ components in $d$.
17 $d_1$, $d_2$ denote arbitrary domains. The standard function space is the space of continuous functions. Continuous functions are defined as follows: a function $f$ is continuous iff
$f(\bigsqcup_n x_n) = \bigsqcup_n f(x_n)$.
This notion is important from a technical point of view, as there are non-trivial domains (the so-called reflexive domains) which satisfy the equation $d = d \to d$ and can serve as the denotation of some special constructs, but this will not concern us further in the paper.
There are two more notions that are important in the theory of domains as well as in what will follow:
5.5. Definition
1. A function $f$ is monotone $\overset{\mathrm{def}}{\iff} x \sqsubseteq y \Rightarrow f(x) \sqsubseteq f(y)$.
2. A function $f$ is strict $\overset{\mathrm{def}}{\iff} f(\bot) = \bot$.
These properties are defined for functions on domains but there is a very intuitive analogy with computer programs. The first property is one we generally expect computer programs to satisfy, namely that they respect the richness of the input,18 i.e., an input that is richer (according to some obvious ordering) is never taken into an output that is poorer than the output for some poorer input. The second property is less obvious, but for programs it means that we cannot design a program that saves us if it is given some erroneous input, e.g., if its input is provided by the output of some program that does not terminate, as would be the case if we gave the output of program F in 5.1 on input $-7$ as the input to itself19. If we give the above output as an input to the program G in 5.1, then its behaviour depends on whether we suppose it to operate call-by-value or call-by-name. In the former case, we get the same result as above; in the latter, we get a program that is monotone but not strict, since it assigns the same value to every input (thus satisfying the condition of monotonicity), but it does not respect the bottom element. Similarly, it is easy to define a numerical program that is strict but not monotone: take one that takes every natural number except $\bot$ into some $n \in N$ but takes some $k \in N$ into $n - 1$ (and $\bot$ into $\bot$). Thus we see that the two properties are independent.
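The call-by-value versus call-by-name contrast can be illustrated with a minimal sketch over a finite flat domain. The constant functions below are hypothetical stand-ins for a program that assigns the same value to every input; they are not the programs F and G of 5.1, and `BOTTOM` models a non-terminating computation.

```python
BOTTOM = None  # assumed stand-in for the result of a non-terminating computation


def leq_flat(x, y):
    """Flat ordering: x lies below y iff x is bottom or x equals y."""
    return x is BOTTOM or x == y


def is_monotone(f, universe, leq=leq_flat):
    """x below y implies f(x) below f(y), checked over a finite universe."""
    return all(leq(f(x), f(y)) for x in universe for y in universe if leq(x, y))


def is_strict(f):
    """A strict function maps bottom to bottom."""
    return f(BOTTOM) is BOTTOM


def const_by_value(x):
    # Call-by-value reading: the argument is evaluated first, so a
    # non-terminating argument makes the whole call non-terminating.
    return BOTTOM if x is BOTTOM else 0


def const_by_name(x):
    # Call-by-name reading: the argument is never inspected.
    return 0


universe = [BOTTOM, 0, 1, 2]
```

Here `const_by_name` comes out monotone but not strict, while `const_by_value` is both.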
One more remark has to be made at this point. We said before that denotational semantics is used so that we can abstract away from certain accidental properties of programs, i.e., we can see extensionally equivalent programs as having the same denotation. This will pose the problem that certain programs of the Unix command language are extensionally equivalent, but they have different side effects that we may be interested in capturing. For example, a program that simply displays the content of a file does not affect the machine state in any obvious way. So we can either take the decision to drop denotational semantics as our tool or we can simply not take account of these features of programs. But we can also try to mirror certain intensional differences (i.e., differences due to the implementation of programs that do not show under the set theoretical representation but which we consider relevant) as extensional ones, thus sticking to denotational semantics. In what follows, we take the latter path.
18 In our case, inputs and outputs will be machine states.
19 In our case this means that we can never recover from the error state.
6. The Semantic Domains
To make $MS$ into a Scott domain, we need a bottom element $\bot_{MS}$ and a cpo. The former is the unstructured error state ($\bot_{MS}$); the latter is defined as follows:
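6.1. Definition
1. $v_1 \leq v_2 \overset{\mathrm{def}}{\iff} \forall x.\ (v_1(x) = \bot \wedge v_2(x) \neq \bot) \vee (v_1(x) = v_2(x))$;
2. $ms_1 \sqsubseteq_{MS} ms_2 \overset{\mathrm{def}}{\iff} ms_1 = \bot_{MS} \vee (ms_i = \langle tdag_i \oplus NC_\bot, \rho_i, v_i \rangle$ (for $i \in \{1, 2\}$) $\wedge\ tdag_1 = tdag_2 \wedge \rho_1 = \rho_2 \wedge v_1 \leq v_2)$.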
That is, the error state is less 'informative' than any other state, and whereas all other states with different underlying trees or interpretation functions are incomparable, in comparable states the ordering is simply inherited from the ordering on the valuation, which says that a valuation is more informative than another if and only if it is 'defined' in some sense for more values.20
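The ordering on machine states can be sketched as follows. This is only an illustration under simplifying assumptions: a machine state is modelled as a triple `(tdag, rho, v)` whose first two components are opaque values compared by equality, the valuation `v` is a dictionary, and `ERROR` and `BOTTOM` are hypothetical stand-ins for the error state and for an undefined value.

```python
BOTTOM = None     # assumed marker for an undefined value of a variable
ERROR = object()  # assumed stand-in for the unstructured error state


def leq_val(v1, v2):
    """v1 <= v2: wherever they differ, v1 is undefined and v2 is defined."""
    for x in set(v1) | set(v2):
        a, b = v1.get(x, BOTTOM), v2.get(x, BOTTOM)
        if a != b and not (a is BOTTOM and b is not BOTTOM):
            return False
    return True


def leq_ms(ms1, ms2):
    """The error state is below everything; other states are compared
    only when they share the same tree and interpretation function."""
    if ms1 is ERROR:
        return True
    if ms2 is ERROR:
        return False
    tdag1, rho1, v1 = ms1
    tdag2, rho2, v2 = ms2
    return tdag1 == tdag2 and rho1 == rho2 and leq_val(v1, v2)
```

Two states over different trees are incomparable in both directions, and a state whose valuation leaves a variable undefined lies below the state that defines it.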
Now we are ready to define the semantic domains for the language of our Unix shell:21
6.2. Definition
1. $[\![n]\!] \in N_\bot$;
2. $[\![c]\!] \in \mathrm{dom}(tr)$;
3. $[\![var]\!] \in \mathrm{dom}(tr) \oplus N_\bot$;
4. $[\![cm_n]\!] \in U_n \to \ldots \to U_1 \to MS \to MS$;
5. $[\![opt]\!] \in (MS \to MS) \to MS \to MS$;
6. $[\![fl]\!] \in (MS \to MS) \to MS \to MS$;
7. $[\![opl]\!] \in (U_n \to \ldots \to U_1 \to MS \to MS) \to U_{n+1} \to \ldots \to U_1 \to MS \to MS$;
8. $[\![cml]\!] \in MS \to MS$;
9. $[\![opr]\!] \in (MS \to MS) \to MS \to MS$.
There is little to say about the domain of integers; constants will evaluate to distinguished nodes of the tree, variables to nodes or natural numbers in accordance with their types. Command lines (commands) will be interpreted as functions from machine states to machine states, whereas $n$-argument command names yield commands when supplied with the appropriate number of arguments. Options and flags, like operators, are functions from commands to commands; nevertheless, we shall see that there is a difference between operators and options/flags. Option letters create new argument places. By the definition of the domains resulting from coalesced sum, Cartesian product and function formation,22 and the flatness of $NC_\bot$, $T_\bot$ and $\mathrm{dom}(tr)$, the ordering relations and the bottom elements are given. For example, the least 'informative' program ($\bot_{MS \to MS}$) is the one that takes every machine state into the error state.
20 This is justified by the fact that the relevant information is basically stored in the valuation function, whereas the underlying tree and the interpretation function carry little information.
21 Cf. definition 2.2. Furthermore, we use the convention that bracketing is right associative. For example, $X \to Y \to Z = (X \to (Y \to Z))$.
The interpretation of the expressions of the language $L^{(cml)}$ will proceed via a translation function into the language of specifications, the topic of the following section. That is, command lines will be translated into the specification language first, then that language will be interpreted using the semantic domains defined here.
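The shapes of these semantic domains can be mirrored by curried functions. The sketch below is only an illustration: machine states are reduced to dictionaries, and `set_cwd`, `twice` and `bottom_program` are hypothetical names that do not correspond to actual shell commands or to definitions in the paper.

```python
ERROR = None  # assumed stand-in for the error state


def set_cwd(directory):
    """Hypothetical one-argument command name, of shape U_1 -> MS -> MS."""
    def command(ms):
        if ms is ERROR:
            return ERROR  # no recovery from the error state
        return {**ms, "CWD": directory}
    return command


def twice(command):
    """Hypothetical operator, of shape (MS -> MS) -> MS -> MS."""
    def new_command(ms):
        return command(command(ms))
    return new_command


def bottom_program(ms):
    """The least informative program: every state goes to the error state."""
    return ERROR
```

Supplying `set_cwd` with its single argument yields a command, i.e., a function from machine states to machine states, and `twice` maps commands to commands in the way operators, options and flags do.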
7. A Language for Program Specifications
As we have said above, complex expressions will receive a denotation in two steps. First we define a translation function $\tau : L^{(cml)} \to L^{(spec)}$, i.e., we translate expressions of the shell language into expressions of the language of specifications. These expressions will be given a denotation via an interpretation function and a valuation. As we shall see, these will be the desired denotations of the shell expressions. We will proceed in two steps. We first specify an auxiliary language $L^{(ps)}$ and a function $\eta : L^{(cml)} \to L^{(ps)}$ which will serve as the basis for specifying the language $L^{(spec)}$ and the function $\tau$.
Commands (cml) will be translated into program specifications (PS), which can be interpreted directly in the semantics. The translations of all other expressions (such as flags and option letters) will be given relative to PS. First of all we need a typed dynamic first-order language with equality (TDFOLE)23 that will be sufficient to specify, i.e., to describe, functions from machine states to machine states. The set of types is defined as follows:
7.1. Definition
1. $t, dir, file, natnum, char^* \in T$;
2. $\alpha, \beta \in T \Rightarrow (\alpha\ \beta) \in T$.
The types $dir$ and $file$ are self-explanatory, $t$ is the type truth value (i.e., the type of formulae), $natnum$ is the type of natural numbers and $char^*$ stands for character strings; $(\alpha\ \beta)$ is the type of functions from objects of type $\beta$ to objects of type $\alpha$.
22 Cf. definition 5.4.
23 The language and its semantics will be very similar to the one given in Groenendijk and Stokhof (1991) with some modifications required by the typing.
The typed first order language based on the above set $T$ is defined as follows:
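7.2. Definition
1. $L^{(ps)} \overset{\mathrm{def}}{=} \langle LC_{ps}, Con, Var, Expr \rangle$;
2. $LC_{ps} \overset{\mathrm{def}}{=} \{(, ), =, \neg, \wedge, \exists\}$;
3. $Con \overset{\mathrm{def}}{=} \bigcup_{\tau \in T} Con_\tau$;
   a. $Con_t \overset{\mathrm{def}}{=} \{\top\}$;
   b. $Con_{dir} \overset{\mathrm{def}}{=} \{root, \bot_{dir}\}$;
   c. $Con_{file} \overset{\mathrm{def}}{=} \{tty, mail, \bot_{file}\}$;
   d. $Con_{natnum} \overset{\mathrm{def}}{=} N$;
   e. $Con_{char^*} \overset{\mathrm{def}}{=} Char$;
   f. $Con_{(natnum\ file)} \overset{\mathrm{def}}{=} \{write\_permission\}$;
   g. $Con_{(char^*\ file)} \overset{\mathrm{def}}{=} \{content\}$;
   h. $Con_{((char^*\ char^*)\ char^*)} \overset{\mathrm{def}}{=} \{\frown\}$;
4. $Var \overset{\mathrm{def}}{=} \bigcup_{\tau \in T} Var^s_\tau \cup \bigcup_{\tau \in T} Var^c_\tau$;
   a. $Var^s_{dir} \overset{\mathrm{def}}{=} \{HOME, CWD, dir_1, \ldots\}$;
   b. $Var^s_{file} \overset{\mathrm{def}}{=} \{KBD, SCREEN, file_1, \ldots\}$;
   c. $Var^s_{natnum} \overset{\mathrm{def}}{=} \{WRITECHECK, EXISTCHECK, \ldots\}$;
   d. $Var^c_\alpha \overset{\mathrm{def}}{=} \{x.c : x \in Var_\beta \wedge c \in Con_{(\alpha\ \beta)}\}$;
5. $Expr \overset{\mathrm{def}}{=} \bigcup_{\tau \in T} Expr_\tau$;
6. $Con_\tau \cup Var_\tau \subseteq Expr_\tau$;
7. $\Phi \in Expr_{(\alpha\ \beta)}, \eta \in Expr_\beta \Rightarrow \Phi(\eta) \in Expr_\alpha$;
8. $\eta, \zeta \in Expr_\alpha \Rightarrow \eta = \zeta \in Expr_t$;
9. $\Phi, \Psi \in Expr_t \Rightarrow \neg(\Phi), (\Phi \wedge \Psi) \in Expr_t$;
10. $\Phi \in Expr_t, \xi \in Var^s_\tau \Rightarrow \exists\xi.\Phi \in Expr_t$.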
The constants and simple variables of the language serve to name the elements of the machine states (i.e., files, directories, natural numbers and character strings) in accordance with our requirements. Our examples of special variables are 'HOME' for the user's home directory; 'CWD' for the current working directory; 'KBD' for the current keyboard input file; 'SCREEN' for the current screen output file. 'root', 'mail' and 'tty' are special files and directories. The use of the remaining constants and simple variables should be obvious from their semantics that we specify later on. The functional constants are again self-explanatory, except for '$\frown$', which is the symbol of concatenation. The denotation of $x \frown y$ is the concatenation of (the strings) $x$ and $y$. We usually omit it, and indicate concatenation by mere juxtaposition. $Var^c$ is the set of complex variables. The value of a complex variable depends on its components. The operator '.' is similar to those operators of programming languages which select a particular member of a structure. We can think of unary name functions as selectors of members of such structures. We stipulate that
$x.c \in Var^c \Rightarrow x.c = c(x)$.
That is, the values of name functions applied to variables can be automatically referred to by complex variables. For example, the content of the file $file$ can be referred to either as 'content(file)' or 'file.content'. The operator '.' associates to the left (i.e., $x.c.d = (x.c).d$). Apart from '$\frown$' and '.', the language itself is given by the standard construction rules for expressions of type $\tau$ in a TDFOLE. In what follows we will be especially interested in expressions of type $t$24.
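The stipulation $x.c = c(x)$ together with left associativity amounts to a left-to-right fold over the selectors. The following sketch illustrates this reading; `resolve`, the dictionary-based valuation, and the extra name function `length` are assumptions made for the example only.

```python
def resolve(path, valuation, name_functions):
    """Evaluate a complex variable such as 'file1.content'.

    The operator '.' associates to the left, so x.c.d is read as (x.c).d,
    and each step applies the name function to the value so far: x.c = c(x).
    """
    head, *selectors = path.split(".")
    value = valuation[head]
    for name in selectors:
        value = name_functions[name](value)
    return value


# Hypothetical machine-state fragment: node "n1" holds the string "hello".
contents = {"n1": "hello"}
valuation = {"file1": "n1"}
name_functions = {"content": contents.get, "length": len}
```

With these assumptions, `resolve("file1.content", valuation, name_functions)` gives the same value as applying `content` to the value of `file1` directly.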
We need certain further operators defined in terms of the above:
7.3. Definition
1. $(\Phi \vee \Psi) \overset{\mathrm{def}}{=} \neg(\neg\Phi \wedge \neg\Psi)$;
2. $(\Phi \to \Psi) \overset{\mathrm{def}}{=} \neg(\Phi \wedge \neg\Psi)$;
3. $!(\Phi) \overset{\mathrm{def}}{=} \neg(\neg(\Phi))$.
The definition of $\vee$ and $\to$ is standard, whereas '!' is a unary logical sentential operator, i.e., it takes formulae into formulae.25
The semantic value of the well-formed expressions of the language in a machine state $ms$ is produced via the function $[\![\cdot]\!]^{ms}$. First we define a function $D$ that assigns semantic domains to types, i.e., it specifies which kinds of objects serve as the denotation of expressions given the set of machine states26:
7.4. Definition
1. $D(t) \overset{\mathrm{def}}{=} \mathcal{P}(MS)$;
2. $D(file) \overset{\mathrm{def}}{=} \{u : tr(u) \in \Sigma_{file}\}$;
3. $D(dir) \overset{\mathrm{def}}{=} \{u : tr(u) \in \Sigma_{dir}\}$;
4. $D(natnum) \overset{\mathrm{def}}{=} N_\bot$;
5. $D(char^*) \overset{\mathrm{def}}{=} Char^*$;
6. $D((\alpha\ \beta)) \overset{\mathrm{def}}{=} D(\beta) \to D(\alpha)$.
That is, the denotation of a formula is a set of machine states, whereas names of files, directories, natural numbers and character strings evaluate to elements of the appropriate type of the universe (e.g., a file name evaluates to a node of type $file$ of the underlying tree of the tdag), whereas functional expressions evaluate to functions of the appropriate type.
24 In what follows, we will refer to expressions of type $t$ as formulae.
25 This is Groenendijk and Stokhof's closure operator $\Diamond$.
26 Cf. definitions 4.1 and 4.2.
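Now we are ready to define the semantics of the well-formed expressions of the language $L^{(ps)}$. First we give the definition of expressions other than formulae:27
7.5. Definition
1. $c \in Con \Rightarrow [\![c]\!] \overset{\mathrm{def}}{=} \rho(c)$;
2. $x \in Var^s \Rightarrow [\![x]\!] \overset{\mathrm{def}}{=} v(x)$;
3. $x.c \in Var^c \Rightarrow [\![x.c]\!] \overset{\mathrm{def}}{=} [\![c]\!]([\![x]\!])$;
4. $[\![EXISTCHECK]\!] \overset{\mathrm{def}}{=} n \in 2$;
5. $[\![root]\!] \overset{\mathrm{def}}{=} 0 \in \mathrm{dom}(tr)$;
6. $[\![\bot_{dir}]\!] \overset{\mathrm{def}}{=} 1 \in \mathrm{dom}(tr)$;
7. $[\![\bot_{file}]\!] \overset{\mathrm{def}}{=} 11 \in \mathrm{dom}(tr)$;
8. $[\![\bot_{char^*}]\!] \overset{\mathrm{def}}{=} 111 \in \mathrm{dom}(tr)$;
9. $[\![write\_permission]\!] \in F \to 2$, where $F \subseteq \mathrm{dom}(tr)$ such that $tr[F] = \Sigma_{file}$.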
Thus the semantic values of constants and simple variables are produced by the interpretation and valuation functions, respectively. The values of complex variables are determined as was seen before. The remaining clauses can be regarded as constraints on $v$ and $\rho$. 'EXISTCHECK' is a variable that can only be set to 0 or 1 (the same holds for 'WRITECHECK'); 'root' has to denote the root of the TDAG. The name constants will represent 'immutable' objects in the machine. Some of them (especially 'mail' and 'tty') will help us avoid complications in connection with programs that do not change a machine state under the standard interpretation (since normally we are only interested in their side effects): we conceive of them as files that can grow indefinitely as strings are concatenated to their content (when mail is sent or character strings are displayed, respectively). $\bot_\tau$ denotes the bottom element of type $\tau$; these are 'degenerate' objects such as nonexistent files; their use will be explained later on. 'write_permission' is a function from tree addresses to 0 or 1, thus relating a tree address of type $file$ to its write permission.28
27 We assume that
$\forall x_\alpha.\ [\![x]\!]^{\bot_{MS}} = \bot_\alpha$, where $\alpha \in T \setminus \{t\}$,
i.e., the denotation of all well-formed expressions except for formulae in the error state is the bottom element of the appropriate type, as this will not influence what follows in any way. Definition 7.5 applies to all other cases. We will drop the superscript 'ms' and the type subscripts when this gives rise to no misunderstanding. $v[X]$ stands for the range of the function $v$ when constrained to the set $X$.
The semantic value of formulae in a machine state will be the set of machine states that can result after the formula has been processed. Thus we specify the meanings as sets of ordered pairs of machine states. The definition runs as follows:
7.6. Definition
1. $\forall \Phi \in Expr_t.\ (\bot_{MS}, \bot_{MS}) \in [\![\Phi]\!]$;
2. $(ms_1, ms_2) \in [\![\top]\!] \overset{\mathrm{def}}{\iff} ms_1 = ms_2$;
3. $(ms_1, ms_2) \in [\![t_1 = t_2]\!] \overset{\mathrm{def}}{\iff} ms_1 = ms_2 \wedge [\![t_1]\!]^{ms_1} = [\![t_2]\!]^{ms_1}$;
4. $(ms_1, ms_2) \in [\![\neg\Phi]\!] \overset{\mathrm{def}}{\iff} ms_1 = ms_2 \wedge \neg\exists ms_3.\ (ms_1, ms_3) \in [\![\Phi]\!]$;
5. $(ms_1, ms_2) \in [\![\Phi \wedge \Psi]\!] \overset{\mathrm{def}}{\iff} \exists ms_3.\ (ms_1, ms_3) \in [\![\Phi]\!] \wedge (ms_3, ms_2) \in [\![\Psi]\!]$;
6. $(ms_1, ms_2) \in [\![\exists x.\Phi]\!] \overset{\mathrm{def}}{\iff} tdag_1 = tdag_2 \wedge \rho_1 = \rho_2 \wedge \exists ms_3.\ (tdag_3 = tdag_1 \wedge \rho_3 = \rho_1 \wedge v_3[x]v_1 \wedge (ms_3, ms_2) \in [\![\Phi]\!])$.
Clause 1 states that the error state verifies every formula and no formula can recover from it. The formula $\top$ denotes the diagonal relation on the set $MS$, i.e., it is always true without any dynamic effects. The remaining clauses are the standard ones for DPL. Clause 6 looks a bit more complicated, but it is the only clause introducing dynamic effects, and it simply says that we are only interested in changes of the valuation function29 if this leads to a valuation that can serve as an input to the embedded formula. This justifies what we said above, namely that the denotation of a formula in a machine state is a set of valuations.
Now it is easy to compute the semantic clauses for the defined operators:
7.7. Facts
1. $(ms_1, ms_2) \in [\![\Phi \vee \Psi]\!] \iff ms_1 = ms_2 \wedge \exists ms_3.\ ((ms_1, ms_3) \in [\![\Phi]\!] \vee (ms_1, ms_3) \in [\![\Psi]\!])$;
2. $(ms_1, ms_2) \in [\![\Phi \to \Psi]\!] \iff ms_1 = ms_2 \wedge \forall ms_3.\ ((ms_1, ms_3) \in [\![\Phi]\!] \Rightarrow \exists ms_4.\ (ms_3, ms_4) \in [\![\Psi]\!])$;
3. $(ms_1, ms_2) \in [\![!\Phi]\!] \iff ms_1 = ms_2 \wedge \exists ms_3.\ (ms_1, ms_3) \in [\![\Phi]\!]$.
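The relational clauses and the facts derived from them can be checked mechanically on a toy model. The sketch below makes several simplifying assumptions: machine states are reduced to valuations of two variables over two values (the tree and the interpretation function are held fixed), the error state of clause 1 is left out, equality is restricted to simple variables, and formulae are encoded as nested tuples. All names are illustrative.

```python
from itertools import product

VARS = ("x", "y")
VALUES = (0, 1)
# Assumption: a machine state is just a valuation of VARS.
STATES = [dict(zip(VARS, vals)) for vals in product(VALUES, repeat=len(VARS))]


def key(ms):
    """Hashable representation of a state."""
    return tuple(sorted(ms.items()))


def inputs(relation):
    """States from which the relation has at least one output."""
    return {a for a, _ in relation}


def denote(phi):
    """Set of (input, output) pairs of a formula, following the clauses for
    top, equality, negation, conjunction and the existential quantifier."""
    op = phi[0]
    if op == "top":
        return {(key(s), key(s)) for s in STATES}
    if op == "eq":  # ("eq", var1, var2)
        return {(key(s), key(s)) for s in STATES if s[phi[1]] == s[phi[2]]}
    if op == "not":
        blocked = inputs(denote(phi[1]))
        return {(key(s), key(s)) for s in STATES if key(s) not in blocked}
    if op == "and":  # relational composition
        left, right = denote(phi[1]), denote(phi[2])
        return {(a, c) for a, b in left for b2, c in right if b == b2}
    if op == "exists":  # ("exists", var, body): reset var, then run body
        var, body = phi[1], denote(phi[2])
        return {
            (key(s), c)
            for s in STATES
            for value in VALUES
            for b, c in body
            if b == key({**s, var: value})
        }
    raise ValueError(f"unknown operator: {op}")


def disj(p, q):
    return ("not", ("and", ("not", p), ("not", q)))


def impl(p, q):
    return ("not", ("and", p, ("not", q)))


def closure(p):
    return ("not", ("not", p))
```

In this model `("exists", "x", ("top",))` relates each state to every state that differs from it at most in `x`, while its closure denotes the plain diagonal: the dynamic effect is closed off.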
28 We are making unforgivable simplifications here. Among others, we simply ignore the difference between character files and special files (such as character devices); also, we ignore other types of permissions altogether (normally the permissions of a file are encoded in four octal digits in the file system).
29 $v_1[x]v_2$ means that the two valuations are the same except perhaps for the value they assign to $x$.
As for the first two definitions, there is little to say. In the case of clause 3, it should now be obvious why Groenendijk and Stokhof call it the closure operator: it closes off any dynamic effects a formula may have had. Now we have a DFOLE that has enough expressive power to describe relations between machine states. We will use this language to specify the semantics of programs. But we have to face two further problems. The denotation of a formula is a partial relation, i.e., it is neither functional nor complete. But we think of programs as total functions from machine states to machine states, i.e., programs are defined everywhere, and they are deterministic. This means that not every formula of the above language is appropriate as a translation of a program. To single out the class that we need, we will introduce a representation for the formulae and impose the relevant constraints on this representation, which is basically a shorthand for the formulae of $L^{(ps)}$.
8. Program Specifications
We will take the formulae that represent the translations of our programs apart and give them a representation in terms of their parts. The sentences of this representation will be the ones of $L^{(ps)}$, but we will not use all the power of this language. But now we will think about this language as an ordinary typed first order language with equality with its standard semantics. Two sentences of this new representation will play a key role in specifying programs. The first one, which we will call the precondition (PC) of the program, will contain the input conditions for the execution of a program; the other, called the maximal change (MC), specifies its output conditions. The intended interpretation is as follows: a formula $\phi$ is applicable to a machine state $ms$ (i.e., $ms \in \mathrm{dom}([\![\phi]\!])$) if and only if the machine state satisfies all sentences in the program's PC,30 and if a program is not applicable to a machine state, we will take it to have no effect.31 This is basically the same behaviour as that of standard shells, where an error message is issued in such a situation, but the machine state is not affected. The only way a program can lead to the error state is by leading out of the set of machine states, e.g., by removing one of the objects required by definition 4.1. The maximal change brought about by the program is that sentences in the MC of the program are satisfied by the new machine state, and all other sentences not affected by MC retain their truth value.32
30 We take this to mean that all formulae in this component are satisfied by the machine state under some appropriate first-order definition, i.e., $ms \models \Gamma \iff \forall \gamma \in \Gamma.\ ms \models \gamma$.
31 We do that in order to get complete functions in accordance with the requirements of definition 6.2. The general idea is that we explicitly list the presuppositions imposed by a program on the input machine states.
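The convention that an inapplicable program has no effect turns a partial operation into a total function. A minimal sketch, assuming that PC sentences are modelled as Python predicates on machine states and that machine states are dictionaries; `run`, `pc` and `remove_file1` are hypothetical names used only for illustration:

```python
def run(preconditions, change, ms):
    """Apply `change` only if every precondition holds; otherwise the
    machine state is returned unchanged, so the result is always defined."""
    if all(holds(ms) for holds in preconditions):
        return change(ms)
    return ms


# Hypothetical program: remove file1, presupposing that it exists.
pc = [lambda ms: "file1" in ms["files"]]


def remove_file1(ms):
    return {**ms, "files": ms["files"] - {"file1"}}
```

A state that lacks `file1` is passed through untouched, which mirrors the shell behaviour described above, where an error message is issued but the machine state is not affected.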
In actual fact, program specifications will be more complex. First, the PC will not be checked against the initial machine state directly, but against a modified machine state, in which some variables are assigned local values for the execution of the program. So each program specification will contain a component describing a modification of the valuation of the initial machine state. We will call this component the local environment (LENV) of the program. The role of LENV is that we do not expect the input machine state to verify it, nor do we want it to live on in the output machine state, unless as a consequence of some property of the MC in the program specification. Second, since MC is just a sentence in a FOLE, we have to keep a separate component describing the dynamic aspect of the change of state effected by the program, i.e., the list of those variables the semantic value of which may change from the input state to the output state (through changes in the valuation). We will call this component the environment change (ENVC) that the program can effect.
So program specifications will be quadruples of the form
$\langle LENV; PC; MC; ENVC \rangle$,
where $LENV \in Var \to (Var \cup Con \cup \{*\})$ (where '$*$' represents the undefined function value). We will use the notation $ms + LENV$ to refer to the modified machine state which differs from '$ms$' in its valuation only, and
$* \neq \xi = LENV(x) \Rightarrow [\![x]\!]^{ms + LENV} = [\![\xi]\!]^{ms}$.
On the other hand, $ENVC \subseteq Var$. As a matter of course, if a variable is in ENVC then, even if LENV assigns it a local value, its old value is not restored after the computation.
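The construction of $ms + LENV$ can be sketched as follows. The sketch assumes that only the valuation is represented (as a dictionary), that constants are looked up in a separate dictionary, and that the undefined value is written as the string `"*"`; `with_lenv` and `evaluate` are hypothetical helper names.

```python
UNDEFINED = "*"  # assumed encoding of the undefined function value of LENV


def evaluate(term, valuation, constants):
    """Value of a simple variable or a constant in a machine state."""
    if term in valuation:
        return valuation[term]
    return constants[term]


def with_lenv(valuation, lenv, constants):
    """Return the valuation of ms + LENV; nothing but the valuation changes."""
    local = dict(valuation)
    for x, xi in lenv.items():
        if xi != UNDEFINED:
            # The local value of x is the value of xi in the ORIGINAL state.
            local[x] = evaluate(xi, valuation, constants)
    return local
```

Because every right-hand side is evaluated in the original state, an LENV that maps `CWD` to `HOME` and `HOME` to `CWD` swaps the two values rather than overwriting one with the other.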
The component called MC does not use the full force of our language $L^{(ps)}$. This is due to the fact that the operation of a program is to be deterministic. Therefore, a sentence in MC does not contain negation: there may be several ways of falsifying a formula. (In this way, we also exclude conditionals and disjunctions, which also lead to non-determinism, because they are defined in terms of negation.) Another problematic type of sentence in our FOLE is equality: there are two ways of verifying the equality of two variables, namely, the valuation of either one (or both) can be modified in order to make their values identical. Accordingly, we will stipulate that at most one variable on either side of an equality is in ENVC, and all variables of ENVC appear in some equality; otherwise we could change the machine state arbitrarily with respect to the variables in ENVC but not in MC.
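The two stipulations on MC and ENVC can be phrased as a simple check. The sketch rests on assumptions that go beyond the text: MC is reduced to a list of equalities between simple variable or constant names, and the first stipulation is read as saying that the two sides of an equality are never both in ENVC. The function name is hypothetical.

```python
def mc_is_admissible(equalities, envc):
    """Check the two stipulations on MC relative to ENVC.

    `equalities` is a list of (left, right) pairs of variable or constant names.
    """
    # Reading assumed here: an equality never has both of its sides in ENVC.
    one_side_only = all(
        not (left in envc and right in envc) for left, right in equalities
    )
    # Every variable that may change must be pinned down by some equality.
    mentioned = {term for pair in equalities for term in pair}
    all_covered = set(envc) <= mentioned
    return one_side_only and all_covered
```

For instance, the single equality `("CWD", "HOME")` with ENVC `{"CWD"}` passes, whereas putting both variables into ENVC, or adding an unconstrained variable to ENVC, fails the check.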
32 Except for those changes that MC entails, of course.