6. 5 Data types (Gábor Pécsy)

6.1.2.4. Invariant of type

The second element of the system is the invariant of the type. This property describes which sequences of elementary values represent a type-value. Typically not all sequences are valid; they have to satisfy certain internal constraints. In the example above we have seen that only those pairs of integers were valid representations of a rational number where the second number - the denominator - was non-zero. We may further restrict the representation by demanding that the numerator and the denominator be relatively prime or that the denominator always be positive.
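The rational number example can be sketched in C++ as follows. This is only an illustration: the names RationalRep, gcd_abs, weak_invariant and strict_invariant are hypothetical and do not come from the text. The weak invariant only demands a non-zero denominator, while the strict one adds the two optional constraints mentioned above.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical representation: a pair of elementary integer values.
struct RationalRep {
    long num;  // numerator
    long den;  // denominator
};

// Greatest common divisor of the absolute values (Euclid's algorithm).
inline long gcd_abs(long a, long b) {
    a = std::labs(a);
    b = std::labs(b);
    while (b != 0) {
        long r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Weak invariant: the denominator is non-zero.
inline bool weak_invariant(const RationalRep& r) {
    return r.den != 0;
}

// Stricter invariant: positive denominator, and numerator and
// denominator are relatively prime.
inline bool strict_invariant(const RationalRep& r) {
    return r.den > 0 && gcd_abs(r.num, r.den) == 1;
}
```

With these definitions the pair (2, -4) satisfies the weak invariant but not the strict one, whereas (-1, 2) satisfies both; the two pairs represent the same rational number.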

6.1.2.5. Operations

The third component of the system is the set of type operations. These operations are programs whose state spaces include the sequences of elementary values.

There is a close relationship between the invariant of the type and these type operations. If a program has an outbound parameter of a sequence of elementary values, the program - in its postcondition - must ensure that the resulting sequence satisfies the invariant. On the other hand, if the program has an inbound parameter of a sequence of elementary values, the program may require in its precondition that the sequence satisfies the invariant. Clearly, the implementation of the program depends both on the representation function and the invariant of the type. Often these two together are referred to as the representation of the type.
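The relationship between the invariant and a type operation can be sketched in C++ as follows. The names (RationalRep, invariant, add) are hypothetical, the strict invariant (positive denominator, relatively prime components) is assumed, and arithmetic overflow is ignored for brevity. The operation requires the invariant for its inbound parameters and re-establishes it for its result.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical representation of a rational number.
struct RationalRep {
    long num;
    long den;
};

// Greatest common divisor of the absolute values (Euclid's algorithm).
inline long gcd_abs(long a, long b) {
    a = std::labs(a);
    b = std::labs(b);
    while (b != 0) {
        long r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Assumed invariant: positive denominator, relatively prime components.
inline bool invariant(const RationalRep& r) {
    return r.den > 0 && gcd_abs(r.num, r.den) == 1;
}

// A type operation: addition of two rational numbers.
inline RationalRep add(const RationalRep& a, const RationalRep& b) {
    // Precondition: the inbound parameters satisfy the invariant.
    assert(invariant(a) && invariant(b));

    RationalRep sum{a.num * b.den + b.num * a.den, a.den * b.den};

    // The raw result may violate the invariant, so reduce it.
    // The denominator is positive here, hence the divisor is at least 1.
    long g = gcd_abs(sum.num, sum.den);
    sum.num /= g;
    sum.den /= g;

    // Postcondition: the outbound value satisfies the invariant.
    assert(invariant(sum));
    return sum;
}
```

For example, adding 1/6 and 1/3 first produces the pair (9, 18), which the reduction step turns into (1, 2) before the postcondition is checked.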

6.1.2.6. 5.1.2.3 Types and specifications

We will now determine when a type is adequate to a type specification. The formal definition is the following:

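The type is adequate to the type specification if:

• the sequences of elementary values that satisfy the invariant of the type represent exactly the type-values of the specification, and

• for each operation of the specification, the type contains a program that is a solution of that operation through the representation function.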

The first condition means that all sequences of elementary values which satisfy the invariant of the type represent a valid type-value, i.e. one that satisfies the invariant of the specification. Additionally, there is a valid representation for all type-values of the specification.

The second condition is that for each specified operation there is a program in the type which implements it. The phrase "solution through the representation function" means that the input data, which is given in the state space of the specified operation, is transformed to the state space of the solution program using the inverse relation of the representation function. There we execute the program, and then the result is mapped back to the state space of the specified operation using the representation function. If the result satisfies the specification of the operation, then the program is a solution to the operation through the representation function, that is, it is a valid implementation of the specified operation using the given representation. In other words, a program is a solution to an operation through the representation function if it modifies the sequences of elementary values in such a way that the interpretation of the results matches the expected result specified in the problem.
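The idea of a solution through the representation function can be illustrated with a small C++ sketch. Everything here is an assumption made for illustration: the abstract type-value is modelled by a canonical (reduced) fraction called Value, the representation is any pair with a non-zero denominator, rho stands for the representation function, and the specified operation is multiplication. The check compares the interpretation of the program's result with the result demanded by the specification for one given input.

```cpp
#include <cassert>
#include <cstdlib>

struct Pair  { long num; long den; };  // sequence of elementary values
struct Value { long num; long den; };  // stand-in for an abstract rational

inline bool operator==(const Value& a, const Value& b) {
    return a.num == b.num && a.den == b.den;
}

inline long gcd_abs(long a, long b) {
    a = std::labs(a);
    b = std::labs(b);
    while (b != 0) { long r = a % b; a = b; b = r; }
    return a;
}

// Canonical form: positive denominator, relatively prime components.
inline Value canonical(long num, long den) {
    assert(den != 0);
    long g = gcd_abs(num, den);        // at least 1 because den != 0
    long sign = den < 0 ? -1 : 1;
    return Value{sign * num / g, sign * den / g};
}

// Invariant of the type: the denominator is non-zero.
inline bool invariant(Pair p) { return p.den != 0; }

// Representation function: interprets a valid pair as a type-value.
inline Value rho(Pair p) {
    assert(invariant(p));
    return canonical(p.num, p.den);
}

// Specified operation, stated on abstract type-values.
inline Value spec_multiply(Value a, Value b) {
    return canonical(a.num * b.num, a.den * b.den);
}

// Program of the type, working on representations (no reduction).
inline Pair program_multiply(Pair a, Pair b) {
    return Pair{a.num * b.num, a.den * b.den};
}

// Does the interpreted result of the program match the specification?
inline bool solves_through_rho(Pair a, Pair b) {
    return rho(program_multiply(a, b)) == spec_multiply(rho(a), rho(b));
}
```

The program never reduces its result, yet it still solves the specified operation through rho, because only the interpretation of the resulting pair has to match. A test over a handful of inputs is of course not a proof of adequacy; it merely makes the definition concrete.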

Notice that many types can be adequate to the same type specification; they may differ in the chosen representation, in the invariant of the type - see the example above - or in the implementation of operations. Our mathematical model only handles the functional correspondence between the type and the specification. Which realization of a type is appropriate in a particular application may depend on non-functional aspects such as execution time, memory footprint, etc.

6.1.3. 5.1.3 Type systems of programming languages

The objects our programs operate on have a type, which determines their behavior, the set of operations on them. Essentially, a type is a set of semantic rules, which specify what operations are permitted on the bit sequence representing the object. If we perform an illegal operation on an object, it is a clear semantic error in our program. One of the most important goals of programming languages is to help the programmer write correct programs, to help discover semantic errors early, preferably at compilation time. (Compilation time is a loose term in this chapter. It is used for interpreted languages as well. It refers to the time of syntactic verification of the program.) One of the means of performing semantic verifications in the programs is the type system and typedness of programming languages [Cardelli and Wegner, 1985].

Typedness means that programming languages assign a type to the entities (objects) used in the programs. All constants, operators, variables or functions have a type (in some languages other constructs can have a type as well). The type of expressions is determined using a type inference system. In some languages - for example Ada or Pascal - the type of symbols is specified in their declaration and the compiler verifies that the definition and the usage of these symbols are consistent. In other languages - for example in ML - instead of using explicit declarations, the compiler determines the type from the expressions as long as it is possible and as long as consistency can be maintained.

In statically typed programming languages the type of all expressions can be determined at compilation time.

While static typedness is a useful property, it imposes too strict requirements on languages. Fortunately, these requirements can be loosened a bit while maintaining consistency. A programming language is called strongly typed if it guarantees that the type of all expressions is consistent, even if the exact type of the expression cannot be determined at compilation time. Clearly, all statically typed languages are strongly typed.

However, there are strongly typed languages which are not statically typed. Languages which support inheritance-based subtype polymorphism cannot be statically typed, though many of them - for example C++, Java or Eiffel - are strongly typed. Different variants of polymorphism are discussed in further detail in Section 29 and Chapter 10.
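A minimal C++ sketch may help to see why subtype polymorphism prevents the exact type from being known at compilation time. The class names Shape and Circle and the function describe are hypothetical. Inside describe the compiler only knows that the argument is some kind of Shape; which name() is executed depends on the dynamic type of the object, yet the call is guaranteed to be type-consistent.

```cpp
#include <cassert>
#include <string>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// The static type of the parameter is Shape. The dynamic type of the
// referenced object is only known at run time, but every possible
// dynamic type is guaranteed to provide name().
inline std::string describe(const Shape& s) {
    return s.name();
}
```

Calling describe with a Circle object yields "circle", while calling it with a plain Shape yields "shape", although the source text of the call inside describe is the same in both cases.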

In other languages the consistency of expressions can only be determined at runtime. Therefore, evaluating such expressions might cause runtime errors. Such languages are Smalltalk, dBase, or many scripting languages.

These languages are called weakly typed.

6.1.4. 5.1.4 Type conversions

When developing an application, we often need to change the type of some data. This operation is called type conversion, typecast or coercion. Depending on the programming language and the actual types, the conversion can happen automatically (coercion) or it can be explicit. There are different variants of conversion which are further described in the following sections.

6.1.4.1. 5.1.4.1 Changing representation

Type conversion can serve multiple purposes. One possible purpose is to change the data representation. As we have seen in previous sections, an object can have numerous different representations in memory. When solving a concrete problem, we try to choose the most suitable representation, the one that serves our goals the best.

However, some objects may have multiple different representations in our program. The simplest examples are numbers. Most modern programming languages offer some integer data types using two's complement representation and some real data types using floating point data representation. This means that for integer numbers, which are also real numbers, we have two distinct representations in these languages. Though the represented "real world" object is the same, the available set of operations depends on the chosen representation.

Therefore, to perform some operations we might need to convert the data from one representation to the other.

In some programming languages - such as Ada - these conversions must always be explicitly denoted. In other languages, some of these conversions can take place automatically. In such languages, when evaluating a numeric expression that adds an integer to a floating point number, the representation of the integer number is automatically converted to floating point representation and the addition is performed on floating point values. This conversion is called widening because we change from a narrower type-value set of integer numbers to a wider set, the set of real numbers.

Widening conversion happens when a 16-bit integer value is converted to 32-bit or a float value to double.

Many languages allow automatic widening conversions because they are safe: they can be performed without data loss. This is also the reason why the integer value gets converted in the expression above. Converting a floating point value to integer has the risk of losing some precision, because it is a conversion to a more limited type-value set; it is a narrowing conversion.

The example below is a syntactically correct code snippet in C, C++, and Java as well.

double x;

int i;

x = 2.2 + 1;

i = 2.2 + 1;

When calculating the value of x, in all three languages a widening conversion is performed automatically.

However, the calculation of i requires two conversions. The first is a widening conversion required to evaluate the expression. As the type of the expression will be double, before the assignment to i, another conversion is required. This is a narrowing conversion from the floating point expression type to the integer type of the variable. This conversion is not always safe to perform. In the example, we lose precision when the original value of the expression, 3.2, is converted to the integer 3. In C this narrowing conversion is automatic.

In C++ the conversion is automatic, but the compiler warns of the potential data loss during compilation.

However, in Java the second assignment is illegal and it results in a compilation error.
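The two assignments can be wrapped into small C++ functions to make their effect observable. The function names are hypothetical; the narrowing step is written with an explicit static_cast only to make the conversion visible, the implicit form from the snippet above behaves the same way in C++.

```cpp
#include <cassert>

// Widening: the integer 1 is converted to 1.0 before the addition.
inline double widened_sum() {
    double x = 2.2 + 1;
    return x;
}

// Narrowing: the double result is truncated towards zero.
inline int narrowed_sum() {
    int i = static_cast<int>(2.2 + 1);
    return i;
}
```

Here narrowed_sum() returns 3, so the fractional part of the sum is lost. Note that C++11 list initialization, as in int j{2.2 + 1};, rejects this narrowing conversion at compile time, which is closer to the behavior described for Java.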

6.1.4.2. 5.1.4.2 String conversions

Many programming languages offer means to convert various objects to String and to convert Strings to objects.

Though they change the representation of the object, they are different from the representation changing conversions described above. String conversions do not only change the representation, they actually change the represented "physical" object. When the number 42 is converted to a string, the result - at least in most programming languages - is not a different representation of the number but a string of characters - a piece of text - which consists of the characters '4' and '2'. Not only does the representation change but the represented object as well. Depending on the type and language, the conversion may be irreversible. Examples for such string conversions are the Image attribute of Ada or the toString() method of Java objects. In Ada the conversion to String is always explicit while Java permits automatic conversion as long as the result is not ambiguous.
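The same distinction can be observed in C++, used here only as an illustration since the text names Ada and Java. The wrapper names number_to_text and text_to_number are hypothetical; std::to_string and std::stoi are standard library functions.

```cpp
#include <cassert>
#include <string>

// The result is a piece of text, not another representation of the number.
inline std::string number_to_text(int n) {
    return std::to_string(n);
}

// Parses a decimal integer from a piece of text.
inline int text_to_number(const std::string& s) {
    return std::stoi(s);
}
```

The text produced from 42 has length 2 and supports text operations: appending "1" gives "421", not 43. The conversions are also not inverses of each other in general: the text "042" is parsed to 42, and converting that back yields "42".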

6.1.4.3. 5.1.4.3 Changing the interpretation

The conversions that change the representation of an object transform the bit sequence that describes the object in memory. A different variant of type conversions leaves the actual bit sequence intact while changing its interpretation. While the former type of conversions result in some runtime operation, interpretation changing conversions are purely compile time constructs. Their purpose is to "calm down" or "work around" the type verification system of the compiler.
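The difference between the two kinds of conversion can be made visible with a short C++ sketch. The function names are hypothetical, and the sketch assumes a 32-bit float in IEEE 754 format, which the static assertions check. The bits are copied with std::memcpy only because reading a float object through an integer pointer is not well defined in C++; the bit sequence itself is left untouched.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Representation change: the value is preserved, the bit pattern is not.
inline std::int32_t convert_value(float f) {
    return static_cast<std::int32_t>(f);
}

// Interpretation change: the same 32 bits, read as an unsigned integer.
inline std::uint32_t same_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    static_assert(std::numeric_limits<float>::is_iec559,
                  "sketch assumes IEEE 754 floats");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under these assumptions convert_value(1.0f) is the integer 1, while same_bits(1.0f) is 0x3F800000, the unchanged bit pattern of the floating point value 1.0.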

In object-oriented languages, the conversions between a subclass and a superclass are an interesting mixture of representation- and interpretation changing conversions. In languages like Java, where objects are stored by reference, the conversions between subclass and superclass are possible in both directions, provided that the dynamic type of the object (see Chapter 10) permits it. In practice it is an interpretation changing conversion as the bit sequences representing the object - both the reference and the referenced memory area - remain unchanged. However, if the objects are stored by value - for example, in C++ or the expanded objects in Eiffel - then the conversion is only allowed from subclass to the superclass, and it involves changing the representation because the data members introduced in the subclass are truncated. Here the effect of narrowing and widening conversions is modified as well. Converting from subclass to superclass is a widening conversion, but it may result in data loss. The conversion itself is safe, but the narrowing conversion is not possible later.
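For C++ the two cases can be sketched as follows; the class names Point2D and Point3D and the helper functions are hypothetical. Passing an object by value copies only the superclass part, while passing it by reference leaves the object untouched and merely changes how it is viewed.

```cpp
#include <cassert>

struct Point2D {
    int x = 0;
    int y = 0;
    virtual ~Point2D() = default;
    virtual int dimensions() const { return 2; }
};

struct Point3D : Point2D {
    int z = 0;  // data member introduced in the subclass
    int dimensions() const override { return 3; }
};

// By value: the parameter is a new Point2D object, z is truncated.
inline int dims_by_value(Point2D p) { return p.dimensions(); }

// By reference: the original object is only reinterpreted.
inline int dims_by_reference(const Point2D& p) { return p.dimensions(); }

// Narrowing back succeeds only if the dynamic type permits it.
inline bool can_narrow(const Point2D& p) {
    return dynamic_cast<const Point3D*>(&p) != nullptr;
}
```

For a Point3D object, dims_by_value reports 2 and dims_by_reference reports 3. A Point2D copy made from it can no longer be narrowed back, because the subclass part was lost in the copy.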

6.1.4.4. 5.1.4.4 Type conversions in the languages

6.1.4.5. C

In C the type cast operator can be used for both the representation changing conversions between the different numeric types - though denoting these explicitly is not required - and for the interpretation changing conversions between the various pointer types and the int type. Any pointer can be converted to any other pointer or int as this only involves changing the interpretation of the representing bit sequence. However, they cannot be converted to floating point types because it would involve changing the bit sequence as well. C also permits interpretation changing conversions between the constant and variable versions of a type as it leaves the representing bit sequences intact.

6.1.4.6. C++

In C++, for backwards compatibility, the type cast operator of C is available but its usage is not recommended.

Type casts in C are inherently dangerous: they are mostly interpretation changing conversions, which effectively disable the rather limited type safety of C. Using them requires careful consideration. Instead of this single multi-purpose cast operation, C++ has introduced a number of different type cast variants which are tailored for the different use cases of type conversion. This separation enables selecting the conversion operator which fits the required purpose the best.

• The static_cast operator can be used for conversions between related types, e.g. the pointer types of classes within the same type hierarchy, between numeric types or between integer and enumeration types.

• The dynamic_cast operator is also used for conversions between pointer types of classes within the same type hierarchy. However, at runtime when converting from the superclass to the subclass this operator verifies that the dynamic type of the referenced object also permits the conversion (see Chapter 10).

• The const_cast operator is used to convert a constant object to a non-constant object of the same type.

• The reinterpret_cast operator can be used to convert between types where the only relationship between the types is that they are represented by bit sequences of the same length. This corresponds to the original type cast operation of C. It is important to note that while the other three cast variants are portable, the usage of reinterpret_cast may result in platform or implementation dependent code.
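The four operators listed above can be sketched side by side. All class, enumeration and function names below are hypothetical, and the last function assumes that the implementation provides std::uintptr_t, an optional but very common integer type wide enough to hold a pointer.

```cpp
#include <cassert>
#include <cstdint>

struct Base { virtual ~Base() = default; };
struct Derived : Base { int extra = 42; };

enum class Color { Red, Green, Blue };

// static_cast: conversion between related types, here integer to enumeration.
inline Color color_from_int(int n) {
    return static_cast<Color>(n);
}

// dynamic_cast: checked conversion from superclass to subclass pointer;
// yields a null pointer when the dynamic type does not permit it.
inline Derived* checked_downcast(Base* b) {
    return dynamic_cast<Derived*>(b);
}

// const_cast: removes constness. Modifying through the result is only
// valid when the referenced object was not declared const.
inline void increment(const int& value) {
    ++const_cast<int&>(value);
}

// reinterpret_cast: reinterprets a pointer as an integer and back.
inline bool round_trips(int* p) {
    std::uintptr_t address = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<int*>(address) == p;
}
```

Used on a Derived object, checked_downcast returns a non-null pointer; used on a plain Base object it returns a null pointer instead of silently reinterpreting the memory, which is what a C-style cast would do.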

6.1.4.7. Java

In Java the conversion between scalar types is representation changing. In the case of widening conversions, it is automatic, while in the case of narrowing conversions, it needs to be explicitly denoted. Between reference types the conversions are limited. From subclass to superclass the conversion is automatic. Conversion from a class to the interfaces it implements is also automatic. The reverse conversions are possible but they include runtime type checking. During the check the Java Virtual Machine verifies that the dynamic type of the referenced object is assignable to the type it is being converted to. These conversions need to be explicitly denoted in the code and if the type checking fails, a runtime exception will get thrown.

Since Java 5.0 there is also a conversion between scalar types and their corresponding Java wrapper type. This is an automatic representation changing conversion called autoboxing.

6.1.4.8. Eiffel

In Eiffel, variables can store objects either by reference or by value. When storing objects by reference, the variable only contains a pointer to the area in memory where the attributes of the object are stored. When the object is stored by value, the variable actually contains the attributes of the object. These objects are called expanded in Eiffel. The rules of type conversions are essentially the same as described in the introduction, but their effect depends on how the object is stored. Conversion from subclass to superclass is automatic, but in the case of expanded objects, it results in data loss; data members - called features in Eiffel - introduced by the subclass are truncated. Conversion from superclass to subclass for expanded objects is not permitted. For objects stored by reference, it is possible to convert from the superclass to subclass provided that the dynamic type of the object is really assignable to the subclass. This is done by using the reverse assignment attempt operator. If the dynamic type of the object is not assignable to the left-hand side of the assignment, the value of the variable will be void.

6.1.4.9. CLU

In CLU, there is a very special interpretation changing conversion. When creating a new abstract data type, the type system of CLU differentiates the abstract type from the concrete type used for its representation, and mandates an explicit type conversion between them. For further details see Section 9.5.3.

6.1.4.10. Ada

Finally, the conversion between the base type and the derived types in Ada is an interpretation changing conversion, as the derived type in Ada has inherited the representation of the base type. Using conversions it is possible to connect the derived types of the same base type (for further details see Section 5.8.1).

6.2. 5.2 Taxonomy of types

This section will provide a classification of types based on their structural properties.

On the highest level of the taxonomy there are two classes: primitive types, which are logically atomic and have no identifiable parts, and composite types, which are constructed from other, already existing types. This chapter will focus on the former, that is, the primitive types. Composite types are discussed in Chapter 6.

Primitive types can be divided into two classes. Pointer types are the abstractions of memory addresses, and are discussed in detail in Section 5.6; scalar types, on the other hand, represent simple atomic values or quantities.

Scalar types can be grouped into real and discrete types. The former category contains the various representations of real numbers. The two most common variants are floating point and fixed point representations. Discrete types can be categorized as enumeration and integer types. Figure 9 gives an overview of this type taxonomy.

6.2.1. 5.2.1 Type classes

Type classes defined in the taxonomy are not just of theoretical relevance. They often manifest themselves in programming languages, where the type class a type belongs to determines how the type can be used. Common constraints are that the loop variable of a for loop must be of a discrete type, or that arrays can only be indexed by discrete, or sometimes only by integer, types. The type class can also determine the set of operations for the given type: some operations are available only to types that belong to the corresponding type class. When a new type is defined within the class, these operations become available automatically.
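C++ has no built-in notion of the type classes of this taxonomy, but the idea can be approximated with the standard type traits. In the sketch below the predicates is_discrete and is_real, the operation successor and the enumeration Weekday are all hypothetical names; the operation is available for every discrete type and rejected at compile time for any other type.

```cpp
#include <cassert>
#include <type_traits>

enum class Weekday { Mon, Tue, Wed };

// Discrete types: integer and enumeration types.
template <typename T>
constexpr bool is_discrete() {
    return std::is_integral<T>::value || std::is_enum<T>::value;
}

// Real types: here only the floating point variants.
template <typename T>
constexpr bool is_real() {
    return std::is_floating_point<T>::value;
}

// An operation that belongs to the class of discrete types.
// No range check is performed in this sketch.
template <typename T>
T successor(T v) {
    static_assert(is_discrete<T>(),
                  "successor is only defined for discrete types");
    return static_cast<T>(static_cast<long long>(v) + 1);
}
```

Because Weekday is an enumeration, and therefore discrete, successor(Weekday::Mon) is available without any further declaration, whereas successor(1.5) does not compile.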

6.2.2. 5.2.2 Attributes in Ada

Type classes are very important in Ada. Each type class has a set of operations, which are implicitly defined for each type in that type class. Some of these operations are operators, e.g. in Ada each scalar type is ordered and relational operators (<, <=, =, etc.) are implicitly defined for them. The other operations are the so-called attributes. Attributes are special type class specific operations, which are of three different kinds:

• There are attributes which are simple properties of the type. For example, for a scalar type S the attributes S'First and S'Last are the smallest and largest value of the type.

• Other attributes are operations of the type. These attributes have an object of the corresponding type in their signature, either as the type of an argument or as the type of their return value. Typical examples are the conversion operations. For any discrete type T the attributes T'Pos and T'Val are defined. These attributes are functions. T'Pos maps elements of T to their ordinal number, while T'Val is an inverse operation which returns the type-value for the specified ordinal number.

• The third group of attributes is an interesting mixture of the previous two. Mixture, because they operate on

In document Advanced Programming Languages (pages 127-142)