7. 6 Composite types (Gábor Pécsy)

7.8.1. Hashtables in Perl

Assume that we want to change our student management system and wish to identify students by their names instead of their indexes. We could continue using an array and create our own selector or find operation that accesses the data of a student based on his or her name. However, this solution would have an impact on our system's performance. Finding a student by index in an array takes $O(1)$ time{The big O notation is the usual measure of complexity. $O(1)$ time means that the time needed to perform the task does not depend on the number of elements in the array. $O(n)$ complexity means there is a linear relation between the time needed to complete the task and the size of the input (for example the number of elements in the array).}, while finding them by name takes $O(n)$ time.{If the array is sorted by the names, we can find the student in $O(\log n)$ time, but then adding a new student to a class will be more expensive.}

Hashtables are associative data structures which store key-value pairs and support efficient retrieval of data based on its key.

The operations of hashtables are similar to the operations of arrays, with some differences. For example, the selection operation takes the key as an argument rather than an index. As the set of keys is not necessarily ordered, there are no "key intervals"; therefore, slicing is not supported. To improve access to the stored data, most hashtable implementations support enumerating the keys used in the table.
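The difference between the two kinds of lookup can be sketched in a few lines of Python, used here only for illustration. The student names, the grades and the find_by_name helper are hypothetical; Python's built-in dict plays the role of the hashtable.

```python
# Hypothetical student records: (name, grade average).
students = [("Anna", 4.5), ("Bela", 3.0), ("Cili", 5.0)]


def find_by_name(records, name):
    """Hand-written find operation on an array: scans the records, O(n)."""
    for record_name, grade in records:
        if record_name == name:
            return grade
    return None


# Selection by index on the array takes constant time.
second_student = students[1]

# The same data as a hashtable keyed by name: selection takes the key
# as its argument and runs in constant time on average.
grades_by_name = dict(students)
cili_grade = grades_by_name["Cili"]

# The keys can be enumerated, but there is no "interval" of keys to slice.
all_names = list(grades_by_name.keys())
```

The linear scan and the dict lookup return the same grade; only the cost of finding it differs as the number of students grows.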

Programming languages usually support hashtables as part of their standard libraries, such as the Hashtable class in Java. In Perl, however, the language itself offers a hashtable composite type. The language restricts the keys and values to scalar types but treats strings as scalars. The following simple example (taken from [Till, 1996]) counts the frequency of words in a text file.

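#!/usr/bin/perl

# Count how often each word occurs in the text read from standard input.
while ( $inputline = <STDIN> ) {
    # A word is a letter followed by one or more non-space characters.
    while ( $inputline =~ /\b[A-Za-z]\S+/g ) {
        $w = $&;
        $w =~ s/[;.,:-]$//;    # strip one trailing punctuation mark
        $words{$w} += 1;
    }
}

print("Words and their frequencies:\n");
foreach $w (keys(%words)) {
    print("$w: $words{$w}\n");
}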

Curly braces - { and } - are the selection operator. Notice that when selection is used, the name of the hashtable is prefixed with the dollar sign ($). In Perl this indicates that the expression has a scalar value.

However, when referencing the hashtable as a whole, the name of the table is prefixed with a percent sign (%).

If no value was assigned to a particular key, selecting it yields a special "empty" value (undef in Perl). Values are set or updated by a simple assignment. An existing value is removed using the delete operation.

Perl allows assignment of hashtables which means copying the whole content of the table. Another important operation is keys which provides the list of keys used in the hashtable. This can be used to enumerate the data stored in the table, e.g. when printing the result in the example above.
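For comparison, the same operations can be sketched with a Python dict. This is an illustrative translation rather than part of the original example: the count_words function and the two input lines are hypothetical, while the two regular expressions are the ones used in the Perl program.

```python
import re


def count_words(lines):
    """Count word frequencies with the same two patterns as the Perl example."""
    words = {}
    for line in lines:
        for match in re.finditer(r"\b[A-Za-z]\S+", line):
            w = re.sub(r"[;.,:-]$", "", match.group())
            words[w] = words.get(w, 0) + 1   # a missing key counts as 0
    return words


freq = count_words(["the cat saw the dog.", "The dog ran:"])

missing = freq.get("bird")      # no value assigned: Python yields None
backup = dict(freq)             # explicit copy of the whole table
del freq["cat"]                 # remove an existing key-value pair
remaining_keys = sorted(freq)   # enumerate the keys of the table
```

One difference is worth noting: plain assignment of a dict in Python shares the reference, so the copy that Perl performs on hashtable assignment has to be requested explicitly with dict().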

7.9. 6.9 Summary

This section summarizes the most important issues in the area of type composition that we need to answer when studying a new programming language. We also give possible answers or solutions to these questions by looking at specific examples from the languages covered in this book. However, answering these questions is not always easy, because the familiar patterns do not necessarily apply to new languages. The list of questions below is most probably not complete, since existing languages keep developing and new constructs are being invented. Still, these questions are a good starting point and help to shed light on the most important issues in type composition.

• What type equivalence does the language use? We have seen examples of strongly typed languages like Ada, which use a very strict equivalence definition - that is, name equivalence. Modula-3 represented the other extreme which requires structural equivalence only.

• Does the language support the creation of immutable types? In Java we can create immutable types by using the means provided by the language, but there is no explicit immutable type construct. In CLU all composite types have a mutable and an immutable variant. For the efficient use of immutable objects the language must support garbage collection and objects should be stored by reference.

• Does the language have a cartesian product composite type? There are languages such as FORTRAN which do not support the cartesian product composite type. Other languages - e.g. CLU - offer multiple solutions. If the language does not support cartesian products, does it offer any alternative? For example, Perl has no cartesian product type, but hashtables are a good substitute for them.

• How do the selectors of cartesian product types work? The most common form is qualifying the selector with the cartesian product object in the form object.selector, which designates a reference to the corresponding component so that it can be used as an L-value as well. However, in CLU, for example, selectors are realized as a getter-setter pair of subprograms.

• Does the language support assignment of cartesian product objects? In most languages, the assignment of cartesian products is allowed; the default semantics corresponds to a component-by-component assignment of the values. However, in some languages - e.g. in the case of the limited types of Ada - it is possible to restrict assignment.

• Can we influence how assignment works? In C the answer is a simple no. In Ada we have the opportunity to restrict assignment on certain types using the limited keyword. C++ offers the most sophisticated solution where we can define the semantics of assignment by overloading the assignment operator.

• Is there an equality check defined for cartesian products? Most modern languages allow equality checks, but C, for example, does not.

• Can we influence the definition of equality? As equality is an operator in almost all languages, if the language supports operator overloading, it applies to the equality operator as well. Java is an interesting case where the equality of references is an operator which cannot be overloaded, whereas the equality of objects is checked using the equals method of the class, which can be overridden by the developer.

• Does the language have a WITH-like construct? We have seen this operation in the Pascal-like languages. It can be used to simplify the complex qualified expressions.

• What means does the language offer to influence the physical representation of cartesian product types? Some languages offer no means at all, whereas other languages offer compilation directives, pragmas to specify alignment of components, to control the use of padding and other physical aspects of the representation. In Ada we can specify the details of representation bit by bit by using the representation clauses.

• Can we specify the default value of components of the cartesian product? We have seen this feature in Ada where the declaration of a record type can include default values for the fields.

• Does the language have cartesian product literals? We have seen aggregate notation of cartesian products in multiple languages. It is often used to initialize variables or define constants. Naturally, this assumes support for assignment.

• Does the language support union type composition? Support for unions is less frequent in languages than support for cartesian products. If the language does not support it, it is worth checking what alternatives it offers. We have seen how inheritance and polymorphism can provide similar functionality in object-oriented languages.

• Free or tagged union? We have seen languages such as C where the union does not have a selector as defined in the type construct, and the language provides no means to verify the dynamic type of the union object. On the other hand, the content of a union of ALGOL 68 can only be accessed in a tagcase-like structure (a conformity clause), which ensures type-safe usage.

• Does the language support arrays? Though in most languages the answer is yes, there are special cases like SuperNova which does not offer this construct.

• What are the allowed element types of arrays? There is a large variety in this area. Arrays of FORTRAN can only store scalar values. Perl also restricts the element type to scalars, though in the case of Perl this includes strings as well. In Ada the element type cannot be an unconstrained type. Can arrays be elements of arrays?

• What types can be used for indexing? Arrays of C are indexed by integers from 0. CLU allows arbitrary index intervals, but the index type must be an integer. In Pascal-like languages typically any discrete type can be used for indexing.

• How flexible is the index interval? In most languages, the index interval is specified in the declaration of the array type. In C-like languages only the number of elements can be specified, and indexing always starts at 0. In Java, array types have no fixed length; the length is specified when the corresponding instance is created. Similarly, Ada allows the declaration of unconstrained array types where the index interval can be arbitrary, and it is specified when the instance is created. The arrays of Eiffel or CLU are even more flexible: the index interval can be modified dynamically, so arrays can expand or shrink as needed.

• Does the array contain information about its index interval? Arrays of C or C++ do not carry any structure information at run time; in most contexts they are treated as merely a pointer to the first element of the array. In these languages, it is the responsibility of the developer to make the required information available if needed. In Pascal-like languages arrays usually "know" their size and index interval. Depending on compilation options, the selectors might do range checks at runtime to ensure that the specified index is in the permissible interval. Many languages - for example Ada, Java or CLU - offer some means for querying the index boundaries.

• Does the language support assignment of arrays? Often the answer is no. In C, array objects cannot be assigned at all, and where an array is handled through a pointer to its first element (as in C) or through a reference (as in Java), assignment does not copy the array but shares the reference. Pascal-like languages usually allow assignment, but only for named array types. Most languages that use structural type equivalence allow assignment between arbitrary array types as long as the number of elements is the same.

• Can we use unconstrained arrays as formal parameters of subprograms? In languages which have an unconstrained array type - e.g. Ada - it can be used as the type of formal parameters of subprograms. Using these types makes the subprogram more generic. We can have similar results in other languages as well. For example in C or C++, arrays are passed as simple pointers to the first element of the array, which also enables creating subprograms that work on arrays of arbitrary length.

• Does the language support multi-dimensional arrays? Many languages support one dimensional arrays - vectors - only. However, most of them allow using vectors as element types of vectors. Other languages support real multi-dimensional arrays, though they might impose limitations on the number of dimensions (e.g. FORTRAN).

• Does the language support the set composite type? Set is a much less frequent composite type than arrays, cartesian products or even unions, though some languages like Pascal or Modula-2 support it. Often sets are implemented as a standard container data type in some standard library of the language.

• What are the restrictions of the element type of sets? Languages that support the set composite type impose strict limitations on the element type. They usually require discrete or enumeration types and often limit the size of the type-value set as well.

• Does the language have other composite types? Some languages support other type composition methods. We have seen examples like the lists and hashtables of Perl. Other languages may also have constructs that do not fit in with the archetypical composites described in this chapter. When encountering a new composite, we should try to determine its type-value set and the set of its operations as seen in the previous sections.

However, we have to pay close attention to the type composition methods of the language which need to be distinguished from the container data structures that are usually provided as part of the standard libraries of the language.
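One of the questions above asks whether the definition of equality can be influenced. As an illustration outside the languages covered in this chapter, the following Python sketch shows the same split that was described for Java: identity of references (the is operator) cannot be redefined, whereas equality of objects (the == operator) is delegated to a method that the developer can override. The Point class is hypothetical.

```python
class Point:
    """A small cartesian product type with a user-defined equality."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Component-by-component comparison, only against other Points.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Keep hashing consistent with the overridden equality.
        return hash((self.x, self.y))


a = Point(1, 2)
b = Point(1, 2)   # a different object with the same components
c = a             # a second reference to the same object

equal_by_value = (a == b)        # uses the overridden __eq__
same_object_ab = (a is b)        # identity: two distinct objects
same_object_ac = (a is c)        # identity: one shared object
```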

7.10. 6.10 Exercises

Exercise 6.1. What are the advantages and disadvantages of immutable types?

Exercise 6.2. What does structural type equivalence mean?

Exercise 6.3. What are the advantages of unconstrained array types?

Exercise 6.4. Specify the formulas to calculate the address of elements of quadratic matrices when using row-major, column-major or spiralic{Spiralic order is highly impractical and it is not used in any programming language. The goal of this exercise is to demonstrate that any address function could be used.} order of the elements.

Exercise 6.5. Explain the challenges of the complement operation in the case of set types!

Exercise 6.6. Create a quadratic matrix type in Ada. The size of the matrix should be the parameter of the type. How can you guarantee that the matrix is quadratic?

7.11. 6.11 Useful tips

Tip 6.1. Consider thread safety and the impact on memory usage.

Tip 6.2. This is the weakest form of type equivalence.

Tip 6.3. Consider what kind of generalizations the use of unconstrained arrays enables!

Tip 6.4. Consider the addressing of multidimensional arrays:

Tip 6.5. Consider the complement set of or !

Tip 6.6. The index boundaries of unconstrained two-dimensional arrays can be specified independently. The array should be wrapped in a discriminated record which has a single discriminant, the size of the quadratic matrix. The actual array is then defined using the discriminant. The drawback of this solution is that getter and setter subprograms need to be created for accessing and setting the elements of the matrix.

7.12. 6.12 Solutions

Solution 6.1. The biggest advantage of immutable types is that they are safe for concurrent or even parallel use. As the internal state of the object cannot change after initialization, there is no risk of conflicting concurrent modifications. The drawback of immutable types is that each modification creates a new copy of the complete object. This adds to the memory footprint of the application and to the execution time as well. There are many possibilities to optimize this copying process, but the overhead can still be prohibitively large for certain complex types.
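The copy-on-modification behaviour can be sketched in Python, used here only for illustration. The Student type and its fields are hypothetical; the frozen dataclass stands in for an immutable composite type.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Student:
    """An immutable record: fields cannot be changed after initialization."""
    name: str
    credits: int


original = Student("Anna", 30)

# In-place modification is rejected ...
try:
    original.credits = 36
    modified_in_place = True
except FrozenInstanceError:
    modified_in_place = False

# ... so an "update" builds a complete new object and leaves the old one intact.
updated = replace(original, credits=36)
```

Because original can never change, it can be shared between threads without locking; the price is the extra object created by every update.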

Solution 6.2. Structural equivalence is the weakest form of type equivalence, which considers two types equivalent when their structure is isomorphic, regardless of their names. Such equivalence is used for example in Modula-3. Even structural equivalence can have different levels, e.g. whether the names of fields in a cartesian product type are considered part of the structure. In the case of Modula-3 the following definition is used:

Two types are the same if their definitions become the same when expanded; that is, when all constant expressions are replaced by their values and all type names are replaced by their definitions.{Source: Modula-3 language definition, http://www.cs.purdue.edu/homes/hosking/m3/reference/}

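According to this definition, the following types are equivalent, because their expanded definitions are identical:

TYPE Coordinates = RECORD
  X : INTEGER;
  Y : INTEGER;
END;

TYPE Point = RECORD
  X : INTEGER;
  Y : INTEGER;
END;

Field names are part of the expanded definition, so a record with the same field types but different field names (for example N and D instead of X and Y) would be a different type.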
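The expansion rule quoted above can be tried out with a small checker. The following Python sketch rests on stated assumptions: type expressions are encoded as nested tuples, only record types without recursion are handled, and field names are treated as part of the expanded definition (a variant that ignores them would compare only the field types). All declared type names are hypothetical.

```python
# Built-in types that need no expansion.
PRIMITIVES = {"INTEGER", "REAL", "BOOLEAN"}

# Hypothetical declarations. A type expression is either a name (string)
# or a tuple ("RECORD", ((field_name, field_type), ...)).
DECLARATIONS = {
    "Length": "INTEGER",
    "Size": ("RECORD", (("W", "Length"), ("H", "Length"))),
    "Extent": ("RECORD", (("W", "INTEGER"), ("H", "INTEGER"))),
    "Ratio": ("RECORD", (("N", "INTEGER"), ("D", "INTEGER"))),
}


def expand(type_expr, declarations):
    """Replace every type name by its definition (non-recursive types only)."""
    if isinstance(type_expr, str):
        if type_expr in PRIMITIVES:
            return type_expr
        return expand(declarations[type_expr], declarations)
    kind, fields = type_expr
    if kind != "RECORD":
        raise ValueError(f"unsupported type constructor: {kind}")
    return (kind, tuple((name, expand(t, declarations)) for name, t in fields))


def same_type(a, b, declarations=DECLARATIONS):
    """Structural equivalence: the expanded definitions must be identical."""
    return expand(a, declarations) == expand(b, declarations)
```

With these declarations, same_type("Size", "Extent") holds although the two types were declared under different names and through different aliases, while same_type("Size", "Ratio") does not hold in this sketch because the field names differ.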

Solution 6.3.

Unconstrained array types allow the creation of more flexibly usable subprograms. Sorting, finding or aggregating the elements of the array and other frequently used algorithms can be implemented in a more generic fashion.

The following Ada generic implements conditional maximum search, that is, it selects the maximum element of an array which satisfies a given condition (e.g. find the greatest odd value). The parameter of the generic is an arbitrary array type. In the generic the array type is treated as unconstrained. Notice the use of array attributes!

generic
   type Element is limited private;
   type Index is (<>);
   type Vector is array(Index range <>) of Element;
   with function Cond(E: Element) return Boolean;
   with function "<"(A,B: Element) return Boolean is <>;
procedure CondMax(V: in Vector; FOUND: out Boolean; I: out Index);

procedure CondMax(V: in Vector; FOUND: out Boolean; I: out Index) is
begin
   FOUND:=False;
   I:=V'First;
   for J in V'Range loop
      if Cond(V(J)) then
         -- Take the first matching element, or a greater matching one.
         if (not FOUND) or else (FOUND and then V(I)<V(J)) then
            I:=J;
            FOUND:=True;
         end if;
      end if;
   end loop;
end CondMax;

Solution 6.4.

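Let $N$ denote the size of the quadratic matrix and $\mathrm{addr}(A)$ denote the base address of matrix $A$. For simplicity, let us assume that the matrix is zero-indexed (i.e. both row and column indices are from the $[0, N-1]$ interval) and that addresses are counted in units of one matrix element. The address function of the matrix:

• Row-major order: $\mathrm{addr}(A[i,j]) = \mathrm{addr}(A) + i \cdot N + j$

• Column-major order: $\mathrm{addr}(A[i,j]) = \mathrm{addr}(A) + j \cdot N + i$

• Spiralic order (assuming clockwise order starting from $A[0,0]$): $\mathrm{addr}(A[i,j]) = \mathrm{addr}(A) + 4k(N-k) + d$, where $k = \min(i, j, N-1-i, N-1-j)$ is the index of the ring that contains the element, $m = N - 2k$ is the side length of that ring, and the offset $d$ within the ring is

$d = j - k$ where $i = k$;

$d = (m-1) + (i-k)$ where $j = N-1-k$ and $i > k$;

$d = 2(m-1) + (N-1-k-j)$ where $i = N-1-k$ and $i > k$ and $j < N-1-k$;

$d = 3(m-1) + (N-1-k-i)$ where $j = k$ and $k < i < N-1-k$.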
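The three address functions can be checked mechanically. The Python sketch below is one possible formulation under stated assumptions: offsets are relative to the base address and counted in elements, indices start at 0, and the spiral runs clockwise from the top-left element. The function names are hypothetical; spiral_walk is a reference traversal used only to check the closed formula.

```python
def row_major(i, j, n):
    """Offset of element (i, j) when rows are stored one after another."""
    return i * n + j


def column_major(i, j, n):
    """Offset of element (i, j) when columns are stored one after another."""
    return j * n + i


def spiral(i, j, n):
    """Offset of element (i, j) in clockwise spiral order from (0, 0)."""
    k = min(i, j, n - 1 - i, n - 1 - j)   # index of the ring holding (i, j)
    m = n - 2 * k                          # side length of that ring
    before = 4 * k * (n - k)               # elements stored in outer rings
    last = n - 1 - k                       # largest index inside the ring
    if i == k:                             # top row, left to right
        d = j - k
    elif j == last:                        # right column, top to bottom
        d = (m - 1) + (i - k)
    elif i == last:                        # bottom row, right to left
        d = 2 * (m - 1) + (last - j)
    else:                                  # left column, bottom to top
        d = 3 * (m - 1) + (last - i)
    return before + d


def spiral_walk(n):
    """Cells of an n x n matrix listed in clockwise spiral order."""
    cells = []
    top, bottom, left, right = 0, n - 1, 0, n - 1
    while top <= bottom and left <= right:
        cells += [(top, j) for j in range(left, right + 1)]
        cells += [(i, right) for i in range(top + 1, bottom + 1)]
        if top < bottom:
            cells += [(bottom, j) for j in range(right - 1, left - 1, -1)]
        if left < right:
            cells += [(i, left) for i in range(bottom - 1, top, -1)]
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return cells
```

Each function maps the index pairs of the matrix one-to-one onto the offsets 0 to n*n - 1, which is all an address function has to guarantee; the spiral version merely pays for its unusual order with a more expensive computation.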

Solution 6.5. The biggest problem with the complement operation is that the resulting set is often infinite. Consider for example a set of strings. There are infinitely many possible strings, and even if the length of the elements is limited, the size of the complement set is prohibitively large. For this reason the complement operation is not supported for set types, except in languages where the base type is highly restricted (for example in Pascal it must be a discrete type with a bounded number of elements).
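When the base type is small and discrete, the complement is cheap, which is why such restricted set types can offer it. The following Python sketch assumes a universe of the integers 0 to 7 and stores a set as a bit mask, one common representation for sets over small discrete types; the helper names are hypothetical.

```python
def to_mask(elements):
    """Encode a set of small non-negative integers as a bit mask."""
    mask = 0
    for e in elements:
        mask |= 1 << e
    return mask


def to_set(mask):
    """Decode a bit mask back into a Python set."""
    return {i for i in range(mask.bit_length()) if (mask >> i) & 1}


def complement(mask, universe_size):
    """Complement relative to the finite universe 0 .. universe_size - 1."""
    full = (1 << universe_size) - 1
    return full & ~mask


UNIVERSE_SIZE = 8
some_elements = to_mask({1, 3})
rest = to_set(complement(some_elements, UNIVERSE_SIZE))
```

The complement needs the size of the universe as an explicit argument; for a base type such as strings no such finite bound exists, and the operation cannot be implemented this way.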

Solution 6.6.

Package QuadraticMatrix is

   -- QMatrix is defined as a discriminated record to ensure that the matrix
   -- is quadratic.
   Type QMatrix(N: Positive) is private;

   -- Unfortunately we need to create getters and setters for the matrix
   -- elements. Indices run from 1 to N.
   Procedure Set(M: in out QMatrix; I,J: Positive; VALUE: Float);
   Function Get(M: QMatrix; I,J: Positive) return Float;

   Function "+"(A,B: QMatrix) return QMatrix;
   Function "-"(A,B: QMatrix) return QMatrix;
   Function "*"(A,B: QMatrix) return QMatrix;

   -- Multiplication by a constant. F*M and M*F need to be declared separately.
   Function "*"(F: Float; M: QMatrix) return QMatrix;
   Function "*"(M: QMatrix; F: Float) return QMatrix;

Private

   Type Matrix is array(Integer range <>, Integer range <>) of Float;

   -- A discriminant used in a component constraint must appear alone,
   -- so the bounds are 1..N rather than 0..N-1.
   Type QMatrix(N: Positive) is record
      M : Matrix(1..N, 1..N) := (others=>(others=>0.0));
   End record;

End QuadraticMatrix;

Package body QuadraticMatrix is

   Procedure Set(M: in out QMatrix; I,J: Positive; VALUE: Float) is
   Begin
      M.M(I,J) := VALUE;
   End Set;

   Function Get(M: QMatrix; I,J: Positive) return Float is
   Begin
      Return M.M(I,J);
   End Get;

   -- Element-wise sum; raises Constraint_Error if the sizes differ.
   Function "+"(A,B: QMatrix) return QMatrix is
      R : QMatrix(A.N);
   Begin
      For I in R.M'Range(1) loop
         For J in R.M'Range(2) loop
            R.M(I,J) := A.M(I,J) + B.M(I,J);
         End loop;
      End loop;
      Return R;
   End "+";

   Function "-"(A,B: QMatrix) return QMatrix is
   Begin
      Return A + ((-1.0)*B);
   End "-";

   -- Matrix product; R starts as the zero matrix by default.
   Function "*"(A,B: QMatrix) return QMatrix is
      R : QMatrix(A.N);
   Begin
      For I in R.M'Range(1) loop
         For J in R.M'Range(2) loop
            For K in R.M'Range(1) loop
               R.M(I,J) := R.M(I,J) + A.M(I,K) * B.M(K,J);
            End loop;
         End loop;
      End loop;
      Return R;
   End "*";

   Function "*"(F: Float; M: QMatrix) return QMatrix is
      R : QMatrix(M.N);
   Begin
      For I in R.M'Range(1) loop
         For J in R.M'Range(2) loop
            R.M(I,J) := F * M.M(I,J);
         End loop;
      End loop;
      Return R;
   End "*";

   Function "*"(M: QMatrix; F: Float) return QMatrix is
   Begin
      Return F*M;
   End "*";

End QuadraticMatrix;

8. 7 Subprograms (Tamás Kozsik, Attila Kispitye, Judit Nyéky-Gaizler)

In this chapter we discuss subprograms - the programming language features for implementing control abstraction. We will learn why subprograms are useful and what the differences between procedures and functions are. Related notions such as the specification, definition and calling of subprograms will be described.

This chapter also reviews the parameters of subprograms and the techniques that can be applied to pass them.

Recursive subprograms will also be dealt with, as well as how subprograms differ from macros, co-routines and iterators. We examine how subprograms fit the frame defined by the structure of the program, how they can be nested and what scope rules apply to them. At the end, the most important language elements for implementing subprograms will be shown.

The most important task of programmers is to design quality software. Data abstraction skills help in finding good solutions, but proper tools are also needed to implement them in a programming language. Subprograms offer support for control abstraction. "Subprograms have already existed before the first programming languages" - writes Ravi Sethi in the introduction to his chapter about subprograms [Sethi, 1996], and explains:

In document Advanced Programming Languages (pages 190-200)