
48 CHAPTER 5. SEARCH PLAN DRIVEN GRAPH PATTERN MATCHING

Assigning a cost to each search plan is a widely accepted way to achieve good performance. If this cost correlates strongly with the size of the search space tree, then executing the minimum-cost search plan yields the fastest pattern matching variant. This underlines the importance of search plan generation algorithms that aim to find a minimum-cost search plan in a given search graph. Note that an overall speed-up in execution time is only achieved if the time saved by executing a better search plan outweighs the additional time spent on generating the plan.

5.3.1 Cost of Search Plans

The current section (based on [Var07]) highlights the most frequently used measures for characterizing search plans, together with their corresponding search plan generation algorithms. For a uniform notation, let $w_k$ denote the weight of the $k$th operation in the order defined by the search plan, and suppose that the search plan consists of $n$ operations.

• Sum of weights. The first and most intuitive approach is to use the sum of the weights of the edges that comprise the search plan, formally $w_{\Sigma}(SP) = \sum_{j=1}^{n} w_j$. Its main advantage is that the Chu-Liu / Edmonds algorithm [CL65, Edm67] can quickly generate minimum-cost search forests, as the algorithm has a time complexity of $O(ne)$ and search graphs have at most a few dozen nodes and edges in a typical application scenario.

However, the cost of a forest is completely insensitive to the ordering of its tree edges. Sum-based cost functions therefore provide a poor estimate of the size of the search space tree, so even the minimum-cost search plan does not necessarily lead to a fast pattern matching process.

• Product of weights. The implementation of the GT tool presented in [JBK10] uses the product of weights as a cost function, $w_{\Pi}(SP) = \prod_{j=1}^{n} w_j$. However, this is quite similar to the previous approach: taking the logarithm of the cost yields the sum of the logarithms of the weights. This cost function gives a better estimate of the size of the search space tree, but it is still insensitive to the ordering of the search forest edges. The same algorithms can be used to calculate the search plan as before.

• Sum of products. In [VVF05, Var08] Gergely Varró proposed the cost function $w_{sum}(SP) = \sum_{i=1}^{n} \prod_{j=1}^{i} w_j$, which is a correct estimate of the size of the search space tree if the weights of the search graph edges denote branching factors collected from the actual model on which the graph transformation is performed. The main drawback of this technique is the lack of algorithms that can find a minimum-cost search plan for this particular cost function. Varró proposed to use the Chu-Liu / Edmonds algorithm again and argued that in practical scenarios it yields search plans of acceptably low cost. However, in [VDWS12] a dynamic programming based algorithm is proposed that, in certain scenarios, is theoretically capable of calculating the optimal solution for this cost function.
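The difference between the three measures can be illustrated with a minimal sketch. The function names and the weight values below are hypothetical and chosen only for illustration; they do not come from any of the cited tools. The sketch evaluates each cost function on every ordering of the same three operations.

```python
from itertools import permutations
from math import prod


def sum_of_weights(weights):
    """Sum of the operation weights; the operation order has no effect."""
    return sum(weights)


def product_of_weights(weights):
    """Product of the operation weights; the operation order has no effect."""
    return prod(weights)


def sum_of_products(weights):
    """Sum over i of the product of the first i weights (prefix products)."""
    total, running = 0, 1
    for w in weights:
        running *= w       # product of the weights seen so far
        total += running   # one term of the outer sum
    return total


weights = [5, 1, 3]  # hypothetical branching factors
for order in permutations(weights):
    print(order, sum_of_weights(order), product_of_weights(order),
          sum_of_products(order))
```

In this sketch only the sum of products changes with the ordering, and it is smallest when the cheap operations come first.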

For the weight of the various operations we use a simple metamodel-based approach. Weighting the simple operations follows the guidelines of edge multiplicity based cost functions (e.g., if an edge multiplicity is one-to-many, then its cost is higher than if it is one-to-one) with the following restriction: the lowest cost is assigned to the BB adornments (check type operation), and there is no difference between the cost of the FB and the BF adornments (extend type operation). Among the constraints we use the cost ordering based on our earlier transformation experiments [27] and [17]: trg = src < inst.

5.3. SEARCH PLAN GENERATION 49

In the case of complex constraints, assigning costs to operations is on the one hand easier, as they have only one permitted adornment (B . . . B), but on the other hand better cost prediction is possible using a priori knowledge. For inj and attr constraints the number of input parameters provides a good prediction of complexity, while for a nac constraint the whole NAC pattern graph matching cost can be evaluated at compile time. The cost functions are the following: (i) for inj and attr operations the cost function is linear in the number of parameters, and (ii) for a nac operation the cost function is proportional to the number of constraints in the search graph of the NAC pattern.

The idea behind this selection is that a NAC check may cut the search space significantly when its pattern is small.

In our realization of the adorned search graph driven pattern matching (implemented in Viatra2 and described in Section 5.4) we opted for the Sum of Products function, as it provides a good estimate of the runtime characteristics of the generated search plans.

Example 14 In our running example, we defined explicit operation cost values (see Figure 5.2(a)) for all operations of the example adorned search graph. The concrete values are only used to illustrate the order of magnitude of operation costs. For example, the inst constraint for the P pattern variable has a cost of one when both of its pattern variables are bound, as it is only a simple check type operation, whereas if the edge has P as a free pattern variable, then it is an extend type operation where P will be mapped to all path typed links of the instance model. The complete cost using the Sum of Products function is calculated as follows:

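$$
\begin{aligned}
w_{sum}(SP) ={} & 1 + 1\cdot5 + 1\cdot5\cdot1 + 1\cdot5\cdot1\cdot5 + 1\cdot5\cdot1\cdot5\cdot1 + 1\cdot5\cdot1\cdot5\cdot1\cdot5 \\
& + 1\cdot5\cdot1\cdot5\cdot1\cdot5\cdot1 + 1\cdot5\cdot1\cdot5\cdot1\cdot5\cdot1\cdot3 + 1\cdot5\cdot1\cdot5\cdot1\cdot5\cdot1\cdot3\cdot5 \\
& + 1\cdot5\cdot1\cdot5\cdot1\cdot5\cdot1\cdot3\cdot5\cdot1 + 1\cdot5\cdot1\cdot5\cdot1\cdot5\cdot1\cdot3\cdot5\cdot1\cdot3 \\
& + 1\cdot5\cdot1\cdot5\cdot1\cdot5\cdot1\cdot3\cdot5\cdot1\cdot3\cdot4 \\
={} & 1 + 5 + 5 + 25 + 25 + 125 + 125 + 375 + 1875 + 1875 + 5625 + 22500 = 32561.
\end{aligned}
$$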
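The expression can be checked mechanically. The short sketch below assumes only the weight sequence read off the last term of the expression above (the figure with the individual operation costs is not reproduced here). It lists each prefix product and then sums them.

```python
from itertools import accumulate
from operator import mul

# Operation weights in search plan order, taken from the last term above.
weights = [1, 5, 1, 5, 1, 5, 1, 3, 5, 1, 3, 4]

# The i-th entry is the product of the first i weights.
prefix_products = list(accumulate(weights, mul))
total = sum(prefix_products)

for i, term in enumerate(prefix_products, start=1):
    print(f"term {i:2d}: {term}")
print("w_sum(SP) =", total)
```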

5.3.2 Algorithm for Finding a Low Cost Search Plan

For generating the actual search plans, we applied a slightly modified version of the Chu-Liu / Edmonds algorithm as described in [VVF05]. Two traditional greedy algorithms are used to solve the problems of finding (i) a low cost search tree for a given adorned weighted search graph and (ii) a low cost search plan for a given search tree.

Finding a minimum search tree.

For finding a minimum search tree in a weighted adorned search graph, the Chu-Liu / Edmonds algorithm [CL65, Edm67] is used. This algorithm searches for a spanning tree in a directed graph that has the smallest cost according to a cost function defined as the sum of weights.

To account for practical concerns, we adapted the algorithm to our hypergraph representation using the following considerations:

• Simple edges are modeled using two references between the nodes; thus, if any operation is executed on a reference, its opposite reference automatically receives the same operation.

• Hyperedges are modeled as a node that has specific references to the nodes of the hyperedge. Whenever one of these specific references is selected for an operation during search plan generation, the algorithm automatically executes the operation on all references of the hyperedge.


• The cycle detection algorithm has to be slightly modified to handle our hyperedge representation. Any time a hyperedge is selected, all of its nodes are added to the cycle candidate set and the search continues on all these nodes.

• The weight values of the edges are based on the following simple rules: (i) edges between bound input values or constants are considered with their all-bound (BB) weight, and (ii) all other cases are calculated as FB or BF, which for this reason have the same weight values for all simple operations.

• The starting node is either a bound input pattern variable or a constant.

Using these considerations the algorithm is outlined in Algorithm 5.1.

Algorithm 5.1 The Used Variant of the Chu-Liu / Edmonds Algorithm

The input is an adorned weighted search graph with a selected starting node.

Step 1: Discard the edges entering the starting node, bound nodes, and constant nodes.

Step 2: For each free node, select an incoming edge with the smallest weight. Let the selected $n-1$ edges be the set $S$.

Step 3: If there are no cycles formed by the edges of $S$, then the selected edges constitute a minimum spanning tree of the graph and the algorithm terminates. Otherwise the algorithm continues.

Step 4: For each cycle formed, contract the nodes in the cycle into a pseudo-node $k$, and modify the weight of each edge entering node $j$ in the cycle from some node $i$ outside the cycle according to the following equation:

$$w(i, k) = w(i, j) - \left[ w(x(j), j) - \min_{l} \{ w(x(l), l) \} \right]$$

where $w(x(j), j)$ is the weight of the edge in the cycle that enters $j$.

Step 5: For each pseudo-node, select the entering edge that has the smallest modified weight. Replace the edge that enters the same real node in $S$ by the newly selected edge.

Step 6: Go to step 3 with the contracted graph.
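The steps above can be sketched for a plain directed graph as follows. This is a minimal illustration under simplifying assumptions, not the implementation used in the tool: there is a single starting node, edges are `(source, target, weight)` triples, and hyperedges and adornments are not modeled. The names `min_arborescence`, `_solve` and `find_cycle` are hypothetical.

```python
def find_cycle(parent):
    """Return the nodes of one cycle in a node -> predecessor map, or None."""
    seen = {}
    for start in parent:
        path, v = [], start
        while v in parent and v not in seen:
            seen[v] = start
            path.append(v)
            v = parent[v]
        if v in parent and seen[v] == start:  # walked back into this path
            return path[path.index(v):]
    return None


def _solve(edges, root):
    # Each edge is (src, dst, weight, origin); origin is the edge it stands
    # for one contraction level up.
    # Step 1: discard edges entering the starting node (and self-loops).
    edges = [e for e in edges if e[1] != root and e[0] != e[1]]
    # Step 2: cheapest incoming edge for every other node.
    best = {}
    for e in edges:
        if e[1] not in best or e[2] < best[e[1]][2]:
            best[e[1]] = e
    # Step 3: without a cycle the selected edges are the result.
    cycle = find_cycle({v: e[0] for v, e in best.items()})
    if cycle is None:
        return [e[3] for e in best.values()]
    # Step 4: contract the cycle into pseudo-node k and adjust the weights.
    in_cycle, k = set(cycle), ("pseudo", tuple(cycle))
    cheapest = min(best[v][2] for v in cycle)
    contracted = []
    for e in edges:
        u, v, w, _ = e
        if u in in_cycle and v in in_cycle:
            continue
        if v in in_cycle:
            contracted.append((u, k, w - (best[v][2] - cheapest), e))
        elif u in in_cycle:
            contracted.append((k, v, w, e))
        else:
            contracted.append((u, v, w, e))
    # Steps 5-6: solve the contracted graph, then reopen the cycle and keep
    # every cycle edge except the one replaced by the edge entering k.
    chosen = _solve(contracted, root)
    entered = {e[1] for e in chosen}
    chosen += [best[v] for v in cycle if v not in entered]
    return [e[3] for e in chosen]


def min_arborescence(edges, root):
    """Return the (src, dst, weight) edges of a minimum-cost spanning tree."""
    return _solve([(u, v, w, (u, v, w)) for u, v, w in edges], root)


# Hypothetical example: a and b select each other in Step 2, forming a cycle.
graph = [("r", "a", 5), ("a", "b", 1), ("b", "a", 1),
         ("r", "b", 4), ("a", "c", 2), ("b", "c", 3)]
tree = min_arborescence(graph, "r")
print(sorted(tree), sum(w for _, _, w in tree))
```

The sketch assumes that every node is reachable from the starting node; a node without any incoming edge is simply left out of the result.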

Finding a Low Cost Search Plan

For finding a low cost search plan in a given search tree, a simple greedy algorithm is used, which is introduced in Algorithm 5.2. Note that the edge weights used are the values from before the Chu-Liu / Edmonds algorithm was run.

Note that the algorithm grows the spanning tree starting from either a bound or a constant node. Thus, for every operation at least one node is already bound when the operation is selected, since an earlier operation has provided a mapping candidate for the corresponding pattern variable. This eliminates the need for FF adorned simple operations.
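One plausible reading of such a greedy ordering is sketched below. It is an assumption made for illustration and not a transcription of Algorithm 5.2: starting from the bound node, it repeatedly picks the cheapest tree edge whose source node is already bound. The function name `greedy_search_plan` and the example tree are hypothetical, and the weights are taken to be the original (unmodified) edge weights.

```python
import heapq


def greedy_search_plan(tree_edges, root):
    """Order the (src, dst, weight) tree edges so that each edge starts at a bound node."""
    children = {}
    for u, v, w in tree_edges:
        children.setdefault(u, []).append((w, v, u))

    plan = []
    frontier = list(children.get(root, []))  # edges leaving bound nodes
    heapq.heapify(frontier)
    while frontier:
        w, v, u = heapq.heappop(frontier)    # cheapest executable operation
        plan.append((u, v, w))
        for item in children.get(v, []):     # v is bound now; expose its edges
            heapq.heappush(frontier, item)
    return plan


# Hypothetical search tree rooted at the bound node "r".
tree_edges = [("r", "a", 5), ("r", "b", 1), ("b", "c", 3), ("a", "d", 1)]
plan = greedy_search_plan(tree_edges, "r")
print(plan)
```

Because an edge only enters the frontier after its source has been bound, no operation in the resulting plan has two free nodes.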

5.4 Realization of an Adorned Search Graph Driven Pattern Matcher in