Attribute manipulation instructions - A MODEL OF COMPUTATIONAL MORPHOLOGY AND ITS APPLICATION T

Examples 1. ;;Cfin; #add the property Cfin to rp

2. phon:/$C2$/;;DIG; #mark it digraph-final if it is

3. Vfin&&!ifin&&!rr:vST;;=jA =i;

4. else rr:(SVS|vST|VZA);;!=jA !=jAi;

5. humor:/MNa/||seg:/.?bb$/;;non grad&&no s´Ag;

6. rp:s/VHVB/VHFU/;Cini;;

Description The instruction consists of the following fields (delimited by semicolons):

<conditionals>; <requirement setting>; <property setting>;

In examples 1–5., there is no requirement setting (note the two semicolons ;; in the middle). In example 1., the conditional part is also missing (note the two semicolons ;; at the beginning of the line). In example 6., the property setting is missing (note the two semicolons ;; at the end of the line).
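The three-field layout can be illustrated with a minimal Python sketch. This is not the implementation described in this document: the names Instruction and parse_instruction are hypothetical, and the sketch assumes that no semicolon occurs inside a field and that anything after the third semicolon is a comment.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    """Hypothetical container for one attribute manipulation instruction."""
    is_else: bool
    conditional: str
    requirements: str
    properties: str
    comment: str


def parse_instruction(line: str) -> Instruction:
    # Assumption: fields never contain a semicolon, so the first three
    # semicolons delimit the fields and the remainder is a comment.
    parts = line.split(";", 3)
    if len(parts) != 4:
        raise ValueError("expected three semicolon-terminated fields")
    cond, req, prop, rest = (p.strip() for p in parts)
    is_else = cond == "else" or cond.startswith("else ")
    if is_else:
        cond = cond[4:].strip()
    comment = rest[1:].strip() if rest.startswith("#") else ""
    return Instruction(is_else, cond, req, prop, comment)


print(parse_instruction(";;Cfin; #add the property Cfin to rp"))
print(parse_instruction("rp:s/VHVB/VHFU/;Cini;;"))
```

An empty string in a field corresponds to the doubled semicolons noted above.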

The conditional

Example 2. shows that in the conditional part you may check a property of an attribute: here a regular expression (/$C2$/) is matched against the value of the attribute phon. The attribute checked by default is rp (= ‘right properties’): in example 3., rp is checked for the presence of the Vfin feature and the absence of the ifin feature (expressed by the negation operator !, see below). If you want to check a different attribute, you have to prefix the checking expression with the name of that attribute followed by a colon. In example 3., the absence of the vST feature is checked in rr (expressed as !rr:vST). If the checking expression contains a regular expression operator (//) or a substitution operator (s///, see below), the attribute name prefix is always mandatory (even if the attribute involved is rp, see example 6.).

Using Boolean operators

You can check more than one condition and use the conjunction (“and”, &&) or the disjunction (“or”, ||) operators between the checking expressions (examples 3. and 5.). You can also use negation (marked by a ! before the expression to be negated).

You cannot at present use parentheses in the Boolean conditional expressions. (Thus negation may only appear before atomic expressions.) The reason for this is the following:

Not only expressions delimited by slashes (//), but also the other checking expressions not containing slashes are implemented as regular expression matching. Example 4. shows that this is the case: (SVS|vST|VZA) is really a regular expression containing the regular expression disjunction operator | (a single |, in contrast to the double || of Boolean disjunction) and the grouping operator () (i.e. parentheses). Since parentheses are part of the regular expression syntax, they are never considered to partition the Boolean conditional expression. (Note that the parentheses are in fact superfluous in the case of (SVS|vST|VZA).)

In the Boolean expression, negation (!) has the highest precedence and disjunction (||) has the lowest. Thus A&&!B||C is interpreted as (A&&(!B))||C.

Note that you can always use regexp disjunction (|) instead of Boolean disjunction (||) if the conditions to be checked refer to the same attribute. (In example 4., they all refer to the attribute rr (= ‘right requirements’).) The regexp disjunction (|) has higher precedence than either negation (!) or (Boolean) conjunction (&&).

The evaluation of Boolean expressions

Only as much of the Boolean expression is evaluated as is needed to determine its truth value: if a left conjunct is false, the right one is not evaluated (and the whole expression is false); similarly, if the left member of a disjunction is true, the right one is not evaluated (and the whole expression is true).
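The precedence and evaluation rules can be sketched in Python as follows. This is an illustration only, not the actual implementation: the function names are hypothetical, a morpheme is modelled as a plain dictionary of strings, bare checking expressions are assumed to match whole space-separated words, and variables such as $C2 are not interpolated. Splitting first on ||, then on &&, and handling ! last reproduces the stated precedence without any parentheses.

```python
import re


def check_atom(atom, morph, log):
    """Evaluate one atomic checking expression against a morpheme dict."""
    atom = atom.strip()
    log.append(atom)  # record which atoms were actually evaluated
    negated = atom.startswith("!")
    if negated:
        atom = atom[1:]
    m = re.match(r"(\w+):(.*)", atom)
    attr, pattern = (m.group(1), m.group(2)) if m else ("rp", atom)
    value = morph.get(attr, "")
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        found = re.search(pattern[1:-1], value) is not None
    else:
        # Assumption: a bare expression must match a whole word of the list.
        found = re.search(r"(?<!\S)(?:" + pattern + r")(?!\S)", value) is not None
    return found != negated


def evaluate(conditional, morph, log=None):
    """|| binds loosest, then &&, then !; any()/all() short-circuit."""
    log = [] if log is None else log
    return any(
        all(check_atom(a, morph, log) for a in conj.split("&&"))
        for conj in conditional.split("||")
    )


morph = {"rp": "Vfin =jA", "rr": "vST"}
seen = []
print(evaluate("ifin&&rr:vST", morph, seen), seen)
```

In the usage line, the left conjunct ifin is false, so rr:vST is never evaluated and does not appear in the log.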

Using else

Example 4. also shows that the keyword else may appear at the beginning of the conditional. In that case, the whole instruction is executed only if the conditional of the previous instruction was false. This means that example 4. is never executed if example 3. is.
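The effect of else can be pictured as a small interpreter loop. The sketch below is hypothetical: conditions are given as Python functions rather than parsed instructions, and it assumes that an else instruction which is skipped leaves the remembered truth value unchanged, a detail not specified here.

```python
def run(instructions, morph):
    """Return the names of the instructions whose body would be executed.

    Each instruction is a (name, is_else, condition) triple, where
    condition is a function from the morpheme dict to a truth value.
    """
    executed = []
    previous_true = False
    for name, is_else, condition in instructions:
        if is_else and previous_true:
            continue  # assumption: a skipped else does not reset the flag
        previous_true = bool(condition(morph))
        if previous_true:
            executed.append(name)
    return executed


rules = [
    ("example 3", False, lambda m: "Vfin" in m["rp"].split()),
    ("example 4", True, lambda m: "vST" in m["rr"].split()),
]
print(run(rules, {"rp": "Vfin", "rr": "vST"}))
print(run(rules, {"rp": "Cfin", "rr": "vST"}))
```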

Using substitution in the conditional

Example 6. also shows that the conditional may even contain a kind of expression that actually changes the value of the attribute to which it refers: rp:s/VHVB/VHFU/ changes the first occurrence of the string VHVB to VHFU within the value of the attribute rp. s/regex/subst/ is a regular expression based substitution expression, which changes the substring matched by the regular expression regex to the string given as subst. If you add a switch g to the end (s/regex/subst/g), then not only the first matched substring is replaced, but all such substrings are (g stands for ‘global’ matching).
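The difference between the plain and the g form can be reproduced with Python's re.sub, whose count argument limits the number of replacements. This is only an analogy: the helper name substitute is hypothetical, and Python's regular expression and replacement syntax differ from the original in some details.

```python
import re


def substitute(value, regex, subst, global_match=False):
    """Mimic s/regex/subst/ (first match only) and s/regex/subst/g (all)."""
    # In re.sub, count=0 means "replace every match".
    return re.sub(regex, subst, value, count=0 if global_match else 1)


rp = "VHVB Cfin VHVB"
print(substitute(rp, "VHVB", "VHFU"))
print(substitute(rp, "VHVB", "VHFU", global_match=True))
```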

A.1.2.1 Manipulating properties and requirements of morphemes

1. Add left or right properties if certain conditions are met.

Examples 1. phon:/$V$/;;Vfin; #add the property Vfin to rp if vowel final

2. allomf:/^$/;;lp:0mrf; #mark zero morphs

3. Vfin&&!ifin&&!rr:vST;;=jA =i;

Description The third field is normally used to set properties. The instruction affects the attribute rp by default. Use a prefix to affect another attribute (lp in example 2.). You can add more than one property by giving a space-separated list of them (example 3.).

2. Delete left or right properties if certain conditions are met

Examples 1. rr:(SVS|vST|VZA);;!=jA !=jAi;

Description You can also delete properties from rp, lp or gp by preceding them with a ! in field 3. If the condition in example 1. is met, the properties =jA and =jAi are deleted from rp. (In fact, a ! in field 3 can be used to remove a word from the value of any attribute, including the removal of requirements.)

3. Add left or right requirements if certain conditions are met

Examples 1. rp:s/VHVB/VHFU/;Cini;;

2. lp:comp2;lr:!cat vrb;; #right compound members must follow a nominal (non-verbal) stem

Description You can also add requirements. Field 2 is normally used for this purpose. The attribute affected by the operation in field 2 is rr (= ‘right requirements’) by default. If you want to add a requirement to another attribute (e.g. to lr), you must use a prefix (example 2.). When used in field 2, the ! does not mark that the requirement should be removed; it marks the addition of a negative requirement (i.e. in contrast to field 3, the ! is not treated specially; example 2.).
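The contrast between field 2 and field 3 can be summarised in a short Python sketch. It is illustrative only: the function names are hypothetical, list attributes are modelled as space-separated strings in a dictionary, a prefix is assumed to apply to the whole field, adding a word that is already present is assumed to have no effect, and the negative requirement !Vini is an invented example.

```python
def apply_field(morph, field, default_attr, bang_deletes):
    """Add (or, with a leading !, delete) words in a list-valued attribute."""
    head, sep, tail = field.partition(":")
    attr, items = (head, tail) if sep else (default_attr, field)
    words = morph.get(attr, "").split()
    for item in items.split():
        if bang_deletes and item.startswith("!"):
            words = [w for w in words if w != item[1:]]
        elif item not in words:
            words.append(item)
    morph[attr] = " ".join(words)


def set_requirements(morph, field):
    # Field 2: defaults to rr; a ! is kept as part of the requirement.
    apply_field(morph, field, "rr", bang_deletes=False)


def set_properties(morph, field):
    # Field 3: defaults to rp; a ! removes the word that follows it.
    apply_field(morph, field, "rp", bang_deletes=True)


morph = {"rp": "Vfin"}
set_properties(morph, "=jA =i")
set_properties(morph, "!=jA !=jAi")
set_requirements(morph, "Cini")
set_requirements(morph, "lr:!Vini")  # hypothetical negative requirement
print(morph)
```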

A.1.2.2 Manipulation of the values of other morpheme level attributes

1. Simple value assignment

Examples 1. ;;root:$seg;#the root is the same as seg

2. !phon:/./;;phon:$root; #phon defaults to be the same as the root (unless otherwise specified)

Description You can use field 3 for the purpose of simple value assignment. If you change the value of a non-list valued attribute (i.e. something other than the property or requirement list attributes: rp, rr, lp, lr, gp, glr, grr), then the effect is not the addition of the value to the value list, but simply the assignment of the given value to the attribute. The result of ;;root:$seg; is not the concatenation of the value of the seg attribute with that of the root; the root attribute is simply assigned the same value as the seg attribute. Value assignment can of course be combined with condition checking (if given in field 1, see example 2.).

You can refer to the values of attributes as $attr if you declared them to be local using my attr;. Otherwise you must refer to them as $mrf->’attr’.
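The distinction between list-valued and other attributes can be sketched in Python as follows. The function name assign is hypothetical, and the sketch simply resolves every $name reference from the morpheme dictionary, ignoring the difference between local declarations and the $mrf form.

```python
import re

LIST_ATTRS = {"rp", "rr", "lp", "lr", "gp", "glr", "grr"}


def assign(morph, field):
    """Apply an attr:value setting from field 3 to a morpheme dict."""
    attr, _, value = field.partition(":")
    # Assumption: $name always refers to another attribute of the morpheme.
    value = re.sub(r"\$(\w+)", lambda m: morph.get(m.group(1), ""), value)
    if attr in LIST_ATTRS:
        words = morph.get(attr, "").split()
        if value not in words:
            words.append(value)  # list attributes accumulate values
        morph[attr] = " ".join(words)
    else:
        morph[attr] = value  # other attributes are simply overwritten


morph = {"seg": "alma", "root": "old", "rp": "Cini"}
assign(morph, "root:$seg")
assign(morph, "rp:Vfin")
if not re.search(".", morph.get("phon", "")):  # the condition !phon:/./
    assign(morph, "phon:$root")
print(morph)
```

The attribute values used here are invented for illustration.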

2. Regular expression based substitution

Examples 1. root:s/["?!#=%@^(){}]|[<[].*?[]>]|\.\.\.//g;;; #remove special segmentation characters from root

2. seg:/=ik$/&&root:s/ik$//;;-ik;

Description Regular expression based substitution has the following syntax:

attr:s/regexp/subst/;;; (see example 1.)

You can use field 1 (i.e. the condition field) for this purpose. If substitution has a precondition, you can add that before the substitution expression using conjunction (see example 2.), and, since the right conjunct is not evaluated if the left conjunct is false, no substitution occurs if the precondition fails. The substitution expression itself is true if actual substitution occurs. Thus if the attribute manipulation instruction also specifies e.g. the addition of a property (in field 3, -ik; in example 2.), this addition only occurs if both the precondition is satisfied and actual substitution occurs.
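The interplay of precondition, substitution and property setting in example 2. can be mirrored in Python, where the and operator short-circuits in the same way. The helper names and the attribute values below are hypothetical.

```python
import re


def substitute(morph, attr, regex, subst):
    """Replace the first match in morph[attr]; true only if one was found."""
    new_value, replaced = re.subn(regex, subst, morph[attr], count=1)
    morph[attr] = new_value
    return replaced > 0


def apply_example_2(morph):
    # Rough equivalent of: seg:/=ik$/&&root:s/ik$//;;-ik;
    if re.search(r"=ik$", morph["seg"]) and substitute(morph, "root", r"ik$", ""):
        morph["rp"] = (morph.get("rp", "") + " -ik").strip()
    return morph


print(apply_example_2({"seg": "lak=ik", "root": "lakik", "rp": ""}))
print(apply_example_2({"seg": "lakik", "root": "lakik", "rp": ""}))
```

In the second call the precondition fails, so the root is left untouched and no property is added.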

3. Character translation

Examples 1. phon:tr/A-ZÁÉÍÓÚÖÜŐŰ/a-záéíóúöüőű/;;; #decapitalize phon

Description The format of character translation is:

attr:tr/fromchars/tochars/;;;

You can use field 1 (i.e. the condition field) for this purpose. Every occurrence of the nth character from between the 1st pair of slashes is replaced by the nth character from between the 2nd pair of slashes. You can use character ranges (as example 1. shows). A range is actually an ASCII code range, thus accented characters must be listed explicitly.

All the remarks about preconditions etc. made for regular expression based substitution apply here as well. One important difference between character translation and regular expression based substitution is that in the case of the former the match and the replacement character lists are taken verbatim: variables are not interpolated into either of them.
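Character translation can be approximated in Python with str.maketrans and str.translate. The sketch below is a simplification under stated assumptions: the helper names are hypothetical, ranges are expanded by character code, and both lists are required to have the same length after expansion.

```python
def expand_ranges(chars):
    """Expand ranges such as A-Z by character code; keep other characters."""
    out = []
    i = 0
    while i < len(chars):
        if i + 2 < len(chars) and chars[i + 1] == "-":
            start, end = ord(chars[i]), ord(chars[i + 2])
            out.extend(chr(code) for code in range(start, end + 1))
            i += 3
        else:
            out.append(chars[i])
            i += 1
    return "".join(out)


def translate(value, fromchars, tochars):
    src, dst = expand_ranges(fromchars), expand_ranges(tochars)
    if len(src) != len(dst):
        raise ValueError("character lists differ in length")  # simplification
    return value.translate(str.maketrans(src, dst))


print(translate("ÁLMA", "A-ZÁÉÍÓÚÖÜŐŰ", "a-záéíóúöüőű"))
print(translate("ÁLMA", "A-Z", "a-z"))
```

The second call shows why accented characters must be listed explicitly: the range A-Z alone leaves Á unchanged.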

In document A MODEL OF COMPUTATIONAL MORPHOLOGY AND ITS APPLICATION TO URALIC LANGUAGES (pages 153-156)