
In document Parallel approach of algorithms (Pages 143-147)


Chapter 6 - Parallel programs

6.2. Parallelization of sequential programs

In the case of sequential algorithms it is common that, to obtain the final result, we have to determine intermediate results that do not depend on each other. If, for example, we are looking for the divisors of a positive integer n, one possible way is to check for each number from 2 to n whether it divides n. These trials can of course be carried out separately, so the algorithm can be divided automatically into independent threads. These threads need no communication, do not wait for each other, and each produces a different intermediate result of the problem.

In this chapter we will see how to decompose an algorithm, based on the analysis of its (syntactic and semantic) description, into steps that can be executed in parallel.
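The divisor search above can be sketched with fully independent workers. A minimal illustration (names and the use of a thread pool are our choice, not from the text), using Python's standard `concurrent.futures`:

```python
from concurrent.futures import ThreadPoolExecutor

def trial(n, d):
    """One independent trial: does d divide n? No communication needed."""
    return d if n % d == 0 else None

def divisors(n):
    """Check every candidate 2..n in parallel; collect the successful trials."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda d: trial(n, d), range(2, n + 1))
    return [d for d in results if d is not None]

print(divisors(12))  # [2, 3, 4, 6, 12]
```

Note that `pool.map` returns the results in candidate order, so the output is deterministic even though the trials run concurrently.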

In our considerations we will keep using command-based (imperative) programming languages. To make the discussion easier to follow, we will examine a language that is quite general but as simple as possible. In this language there are only 3 + 1 kinds of commands:

0. skip

1. X ← E

2. if B then C1 else C2 endif

3. while B do C endwhile

0: does nothing; it will play an important role in 2, when the conditional command is incomplete.

1: the usual assignment command; after its execution the value of X equals the value of the expression E. The expression E can be any well-defined operation resulting in an object from the domain of X. From a practical point of view, of course, only computable operations are of interest.

2: conditional execution; if the logical expression B is true, C1 is executed, otherwise C2. If the command is incomplete, for example it has no else branch, we substitute skip for C2. The properties of B are similar to those of E, with the difference that its result is a Boolean value.

3: conditional loop; while the logical expression B is true, we execute the loop body C.

We will examine the possibilities of parallelizing sequential programs in this very simple language. For more complex analyses, the language can be extended with the necessary instructions.

We assume that the variables have no type, i.e. they have a universal type.
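The 3 + 1 commands can be captured by a tiny interpreter. The following sketch is our own representation (tuples for commands, a dictionary for the untyped variables, and a "seq" form added for sequential composition, which the language uses implicitly):

```python
# Commands are tuples: ("skip",), ("assign", X, E), ("if", B, C1, C2),
# ("while", B, C), plus ("seq", C1, C2, ...) for sequencing.
# E and B are functions of the state dict (the variables are untyped).

def execute(cmd, state):
    kind = cmd[0]
    if kind == "skip":                    # 0: does nothing
        pass
    elif kind == "assign":                # 1: X <- E
        state[cmd[1]] = cmd[2](state)
    elif kind == "if":                    # 2: if B then C1 else C2 endif
        execute(cmd[2] if cmd[1](state) else cmd[3], state)
    elif kind == "while":                 # 3: while B do C endwhile
        while cmd[1](state):
            execute(cmd[2], state)
    elif kind == "seq":                   # run the commands in order
        for c in cmd[1:]:
            execute(c, state)

# Example program: factorial of n, using only assignment and while.
prog = ("seq",
        ("assign", "f", lambda s: 1),
        ("assign", "i", lambda s: s["n"]),
        ("while", lambda s: s["i"] > 1,
         ("seq",
          ("assign", "f", lambda s: s["f"] * s["i"]),
          ("assign", "i", lambda s: s["i"] - 1))))

state = {"n": 5}
execute(prog, state)
print(state["f"])  # 120
```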

6.2.1. Representation of algorithms with graphs

In the introduction of the chapter we saw a frequently used way to represent an algorithm, the so-called pseudocode. With it we can describe procedures in the structure used by imperative languages. For experienced programmers it is an expressive enough tool to describe the operations and their order. However, the internal relationships of an algorithm given in pseudocode become visible only after a detailed analysis (if we can find them at all).

One of the most important aims of graph representation is to present these relationships, already in the description of the algorithm, in a visually understandable way.

In the era of the first generation of programming languages, representing algorithms by flow charts was very common in practice.

6.2.1.1. Flow charts

The flow chart is basically a directed graph: its vertices represent the instructions, its edges the sequential order.

In the case of simple algorithms the interdependence of instruction execution can be represented very expressively with it, but with the spread of structured programming languages its usability has declined, as some kinds of flow chart structures are very difficult to convert into program structures. (Jumping from one loop into another, for example, can break an otherwise well-structured program completely; the control flow becomes untraceable.)

In Figure 6.1 we can see the algorithm of bubble sort.

Figure 6.1 - The flow chart of bubble sort.

Though the algorithm is well interpretable from the flow chart, it is not completely straightforward how to write well-structured, goto-free C code from it.
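For reference, one possible structured, goto-free rendering of bubble sort, using only nested loops as control structure (sketched here in Python rather than C, for brevity):

```python
def bubble_sort(a):
    """Structured bubble sort: two nested counted loops, no jumps."""
    n = len(a)
    for i in range(n - 1):            # after pass i, the last i+1 items are final
        for j in range(n - 1 - i):    # compare-and-swap adjacent elements
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```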

It is also difficult to say what kind of parallelization steps can be used.

6.2.1.2. Parallel flow charts

We can generalize the flow chart model in the direction of parallelization. Basically, the directed edges express the sequential order of the instructions here too, but we have to introduce the parametric edge as well (belonging to the parallel loop). With this we can express that the instructions are executed in as many instances as the parameter of the edge prescribes.

In the following flow chart we can see the parallelized algorithm of bubble sort.

Figure 6.2 - The flow chart of parallel bubble sort; J is a set-type variable.
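The figure itself is not reproduced here, but executing all comparisons of a whole index set J at once corresponds to the classic odd-even transposition variant of bubble sort. A sketch, with the parallel phases simulated sequentially:

```python
def odd_even_sort(a):
    """Parallel bubble sort (odd-even transposition), simulated sequentially.
    In each phase the comparisons for the index set J touch disjoint pairs,
    so they could all be executed at the same time."""
    n = len(a)
    for phase in range(n):
        J = range(phase % 2, n - 1, 2)   # independent comparison positions
        for j in J:                      # conceptually: in parallel over J
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

print(odd_even_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```

With one processor per pair, the n phases give O(n) parallel time instead of the O(n^2) sequential comparisons.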

From here it is only one more generalization step to the data flow graph. In this model the interdependence of the instructions is complemented by the representation of the possible data flows.

6.2.1.3. Data flow graphs

In a data flow graph, as before, the instructions are in the vertices, which are called actors.

The edges of the graph represent not only simple sequential execution but the real possibility of data flow.

Basically we can consider them data channels, so the operation of the algorithm represented by the graph can be studied with graph-theoretic tools. (The determination of the time complexity reduces to finding the longest path, and the maximal flow capacity of the graph gains a meaning in the case of so-called “pipeline” operation. In this case we can start the computation on a new input before obtaining the result of the preceding computation.)

We call the data packets flowing in the data flow graph tokens.

An actor becomes active (enabled) if tokens satisfying the conditions given on its inputs are available.

When a node of the graph becomes active, it starts working immediately: it executes the designated operation and puts the result as a token on the output edge. On one output edge there can be several tokens, but they must be processed in the order of their creation. (So the edges can be considered a kind of FIFO data storage.)

We may notice that data flow graphs are similar to Petri nets, with one difference: here the tokens are actual data packets.
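These firing rules (edges as FIFO channels, an actor firing as soon as every input edge holds a token) can be experimented with in a small hand-rolled simulation; the class names and representation are our own:

```python
from collections import deque

class Edge:
    """A data channel: a FIFO of tokens, as described in the text."""
    def __init__(self):
        self.fifo = deque()

class Actor:
    """Fires when every input edge has a token; consumes one token from
    each input and emits the result on the output edge, preserving order."""
    def __init__(self, op, inputs, output):
        self.op, self.inputs, self.output = op, inputs, output

    def enabled(self):
        return all(e.fifo for e in self.inputs)

    def fire(self):
        args = [e.fifo.popleft() for e in self.inputs]
        self.output.fifo.append(self.op(*args))

def run(actors):
    """Keep firing enabled actors until none is enabled."""
    progress = True
    while progress:
        progress = False
        for a in actors:
            while a.enabled():
                a.fire()
                progress = True

# Example: an adder actor with two input edges (cf. Figure 6.5).
x, y, out = Edge(), Edge(), Edge()
adder = Actor(lambda a, b: a + b, [x, y], out)
x.fifo.extend([1, 10]); y.fifo.extend([2, 20])
run([adder])
print(list(out.fifo))  # [3, 30]
```

Because the edges are FIFO queues, the two results appear in the order the input tokens were created, exactly as the text requires.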

Figure 6.3 - Input actor.

Figure 6.4 - Output actor.

Figure 6.5 - Adder actor.

Figure 6.6 - Condition generator.

Figure 6.7 - Port splitter; the output token of A appears at the input ports of B and C at the same time.

Figure 6.8 - Collector actor; the output token of A or B appears on the input when it becomes empty.

Figure 6.9 - Gate actor; if a signal arrives from B, then A is transferred from the input port to the output port.

We can impose different conditions on the components of the data flow graph, thus obtaining different models.

In the simplest case we do not allow feedback in the graph (it is a directed acyclic graph). For example, the selection of the minimum can be seen in Figure 6.10:

Figure 6.10 - Selecting the minimum; X = min{x1, x2, x3, x4}.
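The acyclic network of the figure can be read as a two-level tree of pairwise minimum actors (our reading of the figure); as flat code:

```python
def min_actor(a, b):
    """One pairwise comparison actor of the acyclic network."""
    return a if a < b else b

def select_min(x1, x2, x3, x4):
    # First level: two actors with no dependence on each other,
    # so they can work in parallel.
    m1 = min_actor(x1, x2)
    m2 = min_actor(x3, x4)
    # Second level: one actor combines the partial results.
    return min_actor(m1, m2)

print(select_min(7, 3, 9, 5))  # 3
```

The longest path in this graph has two actors, so the parallel time is two comparison steps instead of three sequential ones.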

In more complex models we allow feedback, but we may require, for example, that an actor only starts working if there is no token on its output edges (that is, its previous output has already been processed). For example: bubble sort.

Figure 6.11 - The data flow graph of bubble sort.

In one of the most advanced data-driven models the tokens can pile up at an actor. Synchronization is resolved by numbers (tags) assigned to the tokens: related input data receive the same number. Once the tokens with matching numbers have all arrived at an actor, it is activated and computes its output tokens, giving them the same number the inputs had. When tokens with other numbers arrive, the actor puts them aside until all tokens needed for the activation have arrived. Thus the results of the computation do not necessarily appear in the same order as the inputs were given.
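Tag matching can be sketched by storing pending tokens per tag and firing the actor once a tag's token set is complete; the representation below is ours:

```python
from collections import defaultdict

class TaggedActor:
    """Fires when tokens with the same tag have arrived on all input ports;
    the output token inherits the tag, so results may come out of order."""
    def __init__(self, op, arity):
        self.op, self.arity = op, arity
        self.pending = defaultdict(dict)   # tag -> {port: token}, set aside
        self.outputs = []                   # (tag, result) pairs, firing order

    def receive(self, tag, port, token):
        self.pending[tag][port] = token
        if len(self.pending[tag]) == self.arity:   # tag complete: activate
            ready = self.pending.pop(tag)
            args = [ready[p] for p in range(self.arity)]
            self.outputs.append((tag, self.op(*args)))

# Two tagged additions, with the tokens arriving interleaved.
adder = TaggedActor(lambda a, b: a + b, arity=2)
adder.receive(1, 0, 10)   # tag 1 incomplete: put aside
adder.receive(2, 0, 100)
adder.receive(2, 1, 200)  # tag 2 completes first and fires
adder.receive(1, 1, 20)   # now tag 1 fires
print(adder.outputs)      # [(2, 300), (1, 30)]
```

Tag 2 finishes before tag 1 even though tag 1's first token arrived earlier, illustrating the out-of-order results the text describes.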

6.2.1.4. Data dependency graphs

A typical transformation of graph representations is to swap the roles of vertices and edges. In our case this means that in the new graph model the edges represent the instructions, and the nodes represent the data the instructions are applied to. As an instruction can operate on several data at the same time, it is practical to choose a hypergraph. In this way we can represent the dependency relations between the data. For the sake of traceability, the different occurrences of the same data are represented separately.

The data may depend on each other at several levels; we will present these later.
