
7.3 Generating model-specific search plans

7.3.1 Model-specific search graphs and plans

In the first phase of the search plan generation process, a search graph is created for the LHS and NAC patterns of each rule in the same way as described in Sec. 4.2.1.

Example 22 Graph transformation rule AssocEndRule and its corresponding search graph are depicted in Figures 7.2(a) and 7.2(b), respectively.

(b) Search graph for AssocEndRule

Figure 7.2: A sample graph transformation rule and its corresponding search graph

At this point, the transformation designer selects typical models from the problem domain, e.g., typical UML class diagrams and corresponding database schemas in our case. Node and edge statistics of these typical models are available, so weights can be defined for the edges of the search graph based on the statistical data collected from a model.

A weighted search graph is a search graph with numeric weights on its edges. (Weights are depicted as edge labels in Figures 7.3(c) and 7.3(d).) Informally, the weight of an edge can be considered the average branching factor of a possible search space tree at the level where the given pattern edge is selected for navigation. This choice of edge weights provides an easy-to-calculate cost function that estimates the size of the search space.

Weight calculation rules can be summarized as follows.

• The weight of an iteration edge corresponds to the number of objects that conform to the type of the pattern node that is represented by the target node derivative of the iteration edge. This value is given by the object counter declared for the type of the target node derivative.

• In the case of a navigation edge, first the number of links has to be determined that conform to all type restrictions prescribed by the pattern edge corresponding to the navigation edge (i.e., constraints on the type of the link, the source object and the target object). This value is given by the corresponding link counter, which is restricted by the type constraints of the pattern edge. Since the weight is meant to be an average branching factor, the value of the link counter has to be divided by the number of objects that conform to the type of the source node of the pattern edge, which is given by the object counter declared for this type.

Since the dynamic positioning of NAC checking operations requires more sophisticated techniques (as pointed out by [64]), these operations are currently handled statically by always appending them to the end of search plans. In this sense, they are also excluded from the model-sensitive search plan generation process, which is reflected by the missing weights on NAC check edges in the weighted search graphs. In the future, complex search plan operations (such as NAC checking and the invocation of recursive patterns [152]) are also intended to be handled adaptively and integrated into the general framework of [64].

Example 23 Two models and their corresponding weighted search graphs are depicted in Fig. 7.3.

(d) Weighted search graph of the LHS pattern of GT rule AssocEndRule based on the statistics of Model2

Figure 7.3: Sample instance models and corresponding weighted search graphs

The weight calculation rule is demonstrated on the navigation edge of Fig. 7.3(c) connecting free node C to AE (denoted temporarily by a dashed line), which corresponds to the reverse traversal of pattern edge sft of type SFT (i.e., from pattern node C to pattern node AE).

According to our statistics, Model1 contains 4 classes including classes c1 and c2, association a12, and table t3. (Note that tables and associations should also be considered as classes according to the corresponding metamodel of Fig. 2.1.) Additionally, Model1 has 2 SFT links between association ends and classes. As a consequence, if the pattern matching engine matches a Class to the pattern node C at some time during the execution, then the probability of finding a valid AssocEnd for pattern node AE by navigating along an SFT edge is 0.5, derived by the formula #(SFT,AssocEnd,Class) / #(Class) = 2/4. In the case of navigation in the opposite direction, the formula can be expressed as #(SFT,AssocEnd,Class) / #(AssocEnd); thus, the corresponding weight is 1.
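The two weights of this example can be reproduced with a minimal sketch. The dictionary layout and the function names are illustrative assumptions, not part of the described framework. The counters for Class objects and SFT links are the ones quoted above; the AssocEnd counter of 2 is not stated in the text and is inferred here from the stated weight of 1.

```python
from fractions import Fraction

# Counters for Model1. The Class and SFT values are quoted in the example;
# the AssocEnd value is an assumption inferred from the stated weight of 1.
object_count = {"Class": 4, "AssocEnd": 2}
link_count = {("SFT", "AssocEnd", "Class"): 2}


def iteration_weight(node_type):
    """Weight of an iteration edge: number of objects of the target type."""
    return Fraction(object_count[node_type])


def navigation_weight(link_key, start_type):
    """Average branching factor: matching links per object of the start type."""
    return Fraction(link_count[link_key], object_count[start_type])


sft = ("SFT", "AssocEnd", "Class")
weight_c_to_ae = navigation_weight(sft, "Class")     # reverse traversal of sft
weight_ae_to_c = navigation_weight(sft, "AssocEnd")  # forward traversal of sft
print(float(weight_c_to_ae), float(weight_ae_to_c))  # 0.5 1.0
```

Exact fractions are used only to keep the ratios free of rounding; plain floating-point division would serve equally well for these values.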

Definition 56 Given a metamodel $MM$ and a graph transformation rule $r$, the weighted search graph $SG^{M}$ of the LHS pattern based on the statistics of model $M$ is the search graph $SG$ of the LHS pattern together with a model-dependent weight function $w_M : E_{SG} \to \mathbb{R}^{+}$ that assigns non-negative numbers to the edges of the search graph according to the following rules.

• The weight of an iteration edge $d \xrightarrow{i} x$ connecting the dummy node $d$ to pattern node derivative $x$ is the value of the object counter that has been declared for the direct type $t(b(x))$ of the origin pattern node $b(x)$. Formally, $\forall\, d \xrightarrow{i} x \in E_{SG} : w_M(d \xrightarrow{i} x) = \#_{t(b(x))}(M)$.

• The weight of a navigation edge $u \xrightarrow{z} v$ connecting pattern node derivative $u$ to pattern node derivative $v$ is calculated as follows. The value of the link counter $\#(t(b(z)),t(b(u)),t(b(v)))$ declared for the type $t(b(z))$ of the pattern edge $b(z)$, restricted by the direct types $t(b(u))$ and $t(b(v))$ of source and target pattern nodes $b(u)$ and $b(v)$, respectively, is divided by the value of the object counter $\#_{t(b(u))}$ declared for the direct type $t(b(u))$ of the source pattern node $b(u)$. Formally,

$$\forall\, u \xrightarrow{z} v \in E_{SG} : w_M(u \xrightarrow{z} v) = \frac{\#(t(b(z)),t(b(u)),t(b(v)))(M)}{\#_{t(b(u))}(M)}.$$

• The weight of NAC check edges is irrelevant.

As mentioned in the overview of search plan driven pattern matching in Sec. 4.2, in compiled graph transformation approaches, search plans are prepared for all binding combinations, and pattern matching code fragments are generated and compiled for all search plans at compilation time.

The current chapter also uses this schedule for search plan driven pattern matching. In this sense, as the following step in the search plan generation process, weighted search graphs get adorned for all necessary binding combinations at compile-time. For presentation purposes, our current investigation is restricted to the single adornment, in which all pattern node derivatives are free nodes.

At this point, an adorned weighted search graph is available for each typical model selected by the domain engineer. A cost function is now defined for search plans to predict the performance of the pattern matching strategy driven by them.

The cost of a search plan (denoted by c(SP)) is an estimate of the number of nodes in the search space tree (SST) that would be generated during the execution of the pattern matching strategy defined by the search plan. The total number of nodes can be calculated by summing the nodes of the SST level by level. The number of nodes on the i-th level of the SST is the product of the branching factors of those search forest edges whose target node is labelled by at most i according to the search plan.

As weights denote branching factors, minimizing this cost function yields a search plan whose SST is expected to be small. Moreover, such a search plan fulfills the first-fail principle, as it represents an SST that is narrow at the levels near its root.

Example 24 Sample search plans on adorned, weighted search graphs are depicted in Fig. 7.4.

Both weighted search graphs are adorned by marking all their pattern node derivatives as free nodes, thus, only the dummy node is surrounded by the dashed box showing the bound part of the weighted search graph.

110 CHAPTER 7. ADAPTIVE GRAPH TRANSFORMATION

(a) Search plan defined for the adorned, weighted search graph of Fig. 7.3(c), whose statistics are based on Model1

(b) Search plan defined for the adorned, weighted search graph of Fig. 7.3(d), whose statistics are based on Model2

Figure 7.4: Sample search plans on adorned, weighted search graphs

Cost calculation is illustrated for the search plan of Fig. 7.4(a), which uses the statistics based on Model1 (Fig. 7.3(a)). This search plan binds table Tc first. As shown by the weight on the search forest edge leading to free node Tc with search plan label 1, a single object of type Table is expected to be found in Model1. As a consequence, the SST has one node on its first level. Since the weights of the search forest edges with labels 2 and 3 are also 1, the SST is expected to have 1·1 and 1·1·1 nodes on its second and third levels, respectively. The SST is then expected to fork in two directions at level 4, as shown by the corresponding weight; thus, the number of nodes on this level is 1·1·1·2. Following the same procedure for the remaining search forest edges and finally summing the SST nodes found on the different levels gives the total number of nodes. For this specific search plan, the cost is c(SP) = 1 + 1·1 + 1·1·1 + 1·1·1·2 + 1·1·1·2·0.5 + 1·1·1·2·0.5·1 + 1·1·1·2·0.5·1·1 = 8.

The cost of the search plan of Fig. 7.4(b), whose statistics are based on Model2 of Fig. 7.3(b), is 12.
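The cost calculation for Fig. 7.4(a) can be reproduced with a short sketch that sums the running products of the edge weights. The function name `search_plan_cost` is a hypothetical helper introduced here for illustration; the weight list is read off the sum given above, in search plan order.

```python
from itertools import accumulate
from operator import mul


def search_plan_cost(weights):
    """Sum of prefix products: estimated SST nodes per level, added up."""
    level_sizes = list(accumulate(weights, mul))
    return sum(level_sizes)


# Weights of the search forest edges of Fig. 7.4(a), ordered by search plan label.
plan_a = [1, 1, 1, 2, 0.5, 1, 1]
print(list(accumulate(plan_a, mul)))  # [1, 1, 1, 2, 1.0, 1.0, 1.0]
print(search_plan_cost(plan_a))       # 8.0
```

The edge weights of Fig. 7.4(b) are not listed in the text, so the stated cost of 12 is not recomputed here.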

Note that the search plans of Fig. 7.4 are the ones that would be selected by the algorithms of Sec. 7.3.2, but their optimality, even for their corresponding models, cannot be proven in general. However, in specific cases such as the search plan of Fig. 7.4(a) prepared for Model1, optimality can be easily demonstrated.

• If the search plan started with node C, then the first and the second term in the sum would be 4 and 2, respectively. If the algorithm chose an edge with weight 1 at this point, the first three terms would already give 8. As a consequence, only the other edge with weight 0.5 could be selected at the third choice point. In this case, the sum of the first three terms is already 7, and there are four more edges with weight 1 to be included in the search plan, so the total exceeds 8.

• If the search plan started with node AE, then the edge from C to Tc with weight 0.5 should be included as soon as possible to decrease the term to be added to 1. This can happen in the third round at the earliest. Even in this case, the cost is already 5, and there are four more edges with weight 1.

• If the search plan started with nodes Rel or Trel, then node Tc can only be reached via AE and C, which adds 5 to the cost at some point. Since the first edge has weight 1, and there are three additional similarly weighted edges, the cost of such search plans would be 9.

• The only remaining case is when the search plan starts with nodes on the left side (i.e., Tc, Pc, Cc), which yields a number of equivalent search plans whose cost is 8.
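The arithmetic of the first case (a search plan starting with node C) can be checked with the same level-by-level summation. This is only a sketch: the weight sequences below are read off that bullet (an iteration edge of weight 4, two edges of weight 0.5, and four edges of weight 1), and `level_sizes` is a hypothetical helper.

```python
from itertools import accumulate
from operator import mul


def level_sizes(weights):
    """Estimated number of SST nodes on each level (prefix products)."""
    return list(accumulate(weights, mul))


# Start at C (weight 4), take one 0.5 edge, then an edge with weight 1.
prefix_with_weight_1 = level_sizes([4, 0.5, 1])
# Start at C, take both 0.5 edges, then the four remaining weight-1 edges.
full_plan_from_c = level_sizes([4, 0.5, 0.5, 1, 1, 1, 1])

print(sum(prefix_with_weight_1))   # 8.0: the first three terms already reach 8
print(sum(full_plan_from_c[:3]))   # 7.0: first three terms with both 0.5 edges
print(sum(full_plan_from_c))       # 11.0: total cost, which exceeds 8
```

Because the four remaining edges all have weight 1, their relative order does not change the total in this case.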

Definition 57 Given an adorned, weighted search graph $ASG^{M}$ of the LHS pattern based on the statistics of model $M$ together with a corresponding search plan $SP$, the cost of search plan $SP$ (denoted by $c(SP)$) is a non-negative number, which predicts the performance of the pattern matching strategy that is driven by search plan $SP$. The cost of search plan $SP$ is calculated as

$$c(SP) = \sum_{i=1}^{|V_{SG}^{F}|} \prod_{j=1}^{i} w_j$$

where $w_j$ denotes the weight $w_M(u \xrightarrow{z} v)$ of search forest edge $u \xrightarrow{z} v \in E_{SF}$, which leads to free node $v$ labelled by $j$ according to search plan $SP$ (i.e., $SP(v) = j$). The terms are summed for all free nodes $V_{SG}^{F}$ of adorned search graph $ASG^{M}$.

Source: Gergely Varró (MSc in Technical Informatics), PhD Thesis, AdvancedTechniquesfortheImplementationofModelTransformationSy, pages 119-123. Supervisors: Dr. Katalin Friedl, PhD (associate professor); Dr. Dániel Varró, PhD (assistant professor); Prof. Dr. rer. nat. Andy Schürr (professor). Budapest, April 2008.