
58 CHAPTER 5. SEARCH PLAN DRIVEN GRAPH PATTERN MATCHING

Figure 5.7: Runtime results of the Sierpinski case study

Overall, based on this case study and our other experiences [27, 17, 2], the local search-based pattern matcher can handle problems with up to a few hundred thousand model elements with acceptable performance.

5.7. SUMMARY 59

Greedy algorithms (see Section 5.3.2) can only provide low-cost, but not necessarily optimal, search plans. From a mathematical point of view, it is easy to find counterexamples to the optimality of the presented algorithms (e.g., a complete graph with only one type). Additionally, the weights assigned to the different constraints do not precisely reflect the cost of the operations they require, as they are based on the metamodel and not on the actual instance model.

However, we believe that these problems rarely occur in practical application domains, as the implemented approach performed well in many graph transformation tool contests [27, 2, 21].
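The Python sketch below makes the non-optimality of greedy search plan generation concrete. It compares a greedy planner with exhaustive enumeration on a hypothetical two-constraint pattern. The cost model is a toy assumption, not the weighting of Section 5.3.2. Each constraint has an estimated branching factor that depends on which of its end variables are already bound. The cost of a plan is the sum of the estimated partial-match counts after each step. The names `EDGES`, `factor`, `plan_cost`, `greedy_plan` and `optimal_plan` are illustrative only.

```python
from itertools import permutations
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical constraints of a pattern: name -> (source variable, target variable,
# cost of scanning all edges of the type, forward factor, backward factor).
# These numbers are toy estimates, not metamodel-derived weights.
EDGES: Dict[str, Tuple[str, str, int, int, int]] = {
    "p": ("X", "Y", 2, 1, 3),
    "q": ("Y", "Z", 3, 10, 1),
}


def factor(edge: str, bound: Set[str]) -> int:
    """Estimated branching factor of checking `edge` given the bound variables."""
    src, trg, scan, fwd, bwd = EDGES[edge]
    if src in bound and trg in bound:
        return 1      # both ends bound: a simple existence check
    if src in bound:
        return fwd    # navigate from the bound source to the target
    if trg in bound:
        return bwd    # navigate backwards from the bound target
    return scan       # nothing bound: iterate over all edges of this type


def plan_cost(order: Iterable[str]) -> int:
    """Sum of the estimated number of partial matches after each step."""
    bound: Set[str] = set()
    cost, width = 0, 1
    for edge in order:
        width *= factor(edge, bound)
        cost += width
        bound |= {EDGES[edge][0], EDGES[edge][1]}
    return cost


def greedy_plan() -> List[str]:
    """Always pick the constraint with the smallest immediate factor."""
    bound: Set[str] = set()
    order: List[str] = []
    remaining = set(EDGES)
    while remaining:
        # sorted() makes tie-breaking deterministic
        edge = min(sorted(remaining), key=lambda e: factor(e, bound))
        order.append(edge)
        remaining.remove(edge)
        bound |= {EDGES[edge][0], EDGES[edge][1]}
    return order


def optimal_plan() -> Tuple[str, ...]:
    """Exhaustively try every ordering (only feasible for tiny patterns)."""
    return min(permutations(EDGES), key=plan_cost)
```

Under these assumed weights, the greedy planner starts with the cheaper scan of `p` (factor 2). It must then navigate the expensive forward direction of `q`, giving a cost of 2 + 2 × 10 = 22. Starting with `q` instead costs 3 + 3 × 3 = 12.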

Chapter 6

Hybrid Graph Pattern Matching

Practical experience has shown that optimizing model transformations is an important part of applying model-driven techniques to system development. First, as models grow in size and complexity, transformations need to process them efficiently. Second, as transformations become hidden (e.g., embedded in a design tool), they should execute seamlessly: quickly and with as few resources as possible.

From these experiences, we found that incremental pattern matching approaches may lead to orders-of-magnitude increases in speed. However, an important implication of caching match sets is increased memory consumption, which needs to be taken into account when scaling up to large models. Unfortunately, in many practical applications of model transformations, available memory is constrained (e.g., when transformations are executed on average desktop computers rather than on high-performance servers).

We believe that many transformations could benefit even more from combining these two approaches, using the most suitable pattern matching engine for each graph pattern. We propose a hybrid pattern matching approach that enables the transformation designer to combine local search-based (LS) and incremental (INC) pattern matching to adapt to memory constraints. At design time, transformation engineers may select, separately for each graph pattern, whether it should be matched using the LS or the INC strategy. Moreover, based on runtime monitoring, the execution engine may automatically switch from incremental pattern matching to the local search-based technique when a certain memory limit has been reached.
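The Python sketch below shows the two mechanisms described above: per-pattern strategy selection at design time, and a runtime fallback from INC to LS. It rests on several assumptions:

- The LS matcher is modelled as a function that searches the model from scratch.
- The INC matcher is modelled as a plain match-set cache, with its incremental update phases omitted.
- Memory use is approximated by the number of cached tuples.
- The engine evicts the largest cache first.

`HybridEngine`, `CachedMatcher` and `memory_limit` are hypothetical names, not the Viatra2 API.

```python
from typing import Callable, Dict, Set, Tuple

Match = Tuple[str, ...]


class CachedMatcher:
    """Stand-in for an INC matcher: the match set is computed once and cached.
    (A real incremental matcher would also update the cache on model changes.)"""

    def __init__(self, compute: Callable[[], Set[Match]]):
        self.cache: Set[Match] = set(compute())   # initialization phase

    def matches(self) -> Set[Match]:
        return set(self.cache)                    # read from the cache

    def size(self) -> int:
        return len(self.cache)


class HybridEngine:
    """Hypothetical engine selecting LS or INC separately for each pattern."""

    def __init__(self, searchers: Dict[str, Callable[[], Set[Match]]],
                 strategy: Dict[str, str], memory_limit: int):
        self.searchers = searchers          # pattern name -> LS search function
        self.strategy = dict(strategy)      # pattern name -> "LS" or "INC" (design time)
        self.memory_limit = memory_limit    # assumed budget: max number of cached tuples
        self.inc: Dict[str, CachedMatcher] = {}

    def matches(self, pattern: str) -> Set[Match]:
        if self.strategy[pattern] == "LS":
            return self.searchers[pattern]()   # search the model on every request
        if pattern not in self.inc:
            self.inc[pattern] = CachedMatcher(self.searchers[pattern])
        result = self.inc[pattern].matches()
        self._enforce_memory_limit()
        return result

    def _enforce_memory_limit(self) -> None:
        # Runtime monitoring: while the caches exceed the budget, drop the
        # largest cache and switch its pattern to local search.
        while self.inc and sum(m.size() for m in self.inc.values()) > self.memory_limit:
            victim = max(self.inc, key=lambda p: self.inc[p].size())
            del self.inc[victim]
            self.strategy[victim] = "LS"
```

For example, with a budget of two cached tuples, caching a two-tuple pattern and then a one-tuple pattern exceeds the limit. The larger pattern is then switched to LS, while the smaller one stays incremental. Evicting the largest cache first is only one possible policy.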

The rest of this chapter is structured as follows. Section 6.1 briefly introduces incremental pattern matching, and Section 6.2 outlines how the hybrid engine is built upon the adorned search graph representation. Then, Section 6.3 presents the runtime results of the AntWorld case study for the different pattern matching strategies. Finally, Section 6.4 defines metrics, based on our earlier experiences, for selecting between the different strategies, and presents an adaptive runtime technique that switches to the LS strategy in case of low memory.


62 CHAPTER 6. HYBRID GRAPH PATTERN MATCHING

6.1 Incremental Graph Pattern Matching

Incremental pattern matching [VVS06b, BOR+08] offers a different execution model compared to local search-based implementations. The match sets of all involved patterns are computed in an initialization phase prior to execution (e.g., when the model itself is loaded into memory). As the transformation progresses, this match set cache is incrementally updated as the model graph changes (update phases). Thus, model search phases are reduced to fast read-from-cache operations, in exchange for the overhead imposed by cache update phases, which occur synchronously with model manipulation operations. Several incremental pattern matching approaches [GJR10, LS05, VVS06b, BOR+08, MMS07] are available for graph pattern matching. In this section, we give a short overview of our approach based on RETE nets [For82], as realized in the Viatra2 framework.

This introduction is based on [BOR+08].
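A deliberately simple, hypothetical pattern can illustrate the execution model described above: it matches pairs of nodes connected by edges in both directions. In this Python sketch, `search_mutual` plays the role of local search (full recomputation). `IncrementalMutual` computes its cache once during initialization and then adjusts it in `add_edge` and `remove_edge`, so `matches` is only a cache read. The pattern and all names are assumptions for illustration. The update rules are hand-written for this one pattern rather than derived from a RETE net.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def search_mutual(edges: Set[Edge]) -> Set[Edge]:
    """Local search: recompute all matches of 'a -> b and b -> a' from scratch."""
    return {(a, b) for (a, b) in edges if (b, a) in edges}


class IncrementalMutual:
    """Match-set cache for the same pattern, updated synchronously with each change."""

    def __init__(self, edges: Set[Edge]):
        self.edges: Set[Edge] = set(edges)
        self.cache: Set[Edge] = search_mutual(self.edges)   # initialization phase

    def add_edge(self, a: str, b: str) -> None:
        self.edges.add((a, b))
        if (b, a) in self.edges:             # update phase: only new matches are added
            self.cache |= {(a, b), (b, a)}

    def remove_edge(self, a: str, b: str) -> None:
        self.edges.discard((a, b))
        self.cache -= {(a, b), (b, a)}       # these pairs can no longer match

    def matches(self) -> Set[Edge]:
        return set(self.cache)               # searching is a cache read
```

A simple consistency check is that, after any sequence of changes, `matches()` equals `search_mutual(m.edges)`.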

6.1.1 RETE-based Incremental Graph Pattern Matching

RETE-based pattern matching relies on a network of nodes storing partial matches of a graph pattern.

A partial match enumerates those tuples of model elements that satisfy a subset of the constraints described by the graph pattern. In a relational database analogy, each node stores a view. Matches of a pattern are readily available at any time, and they are incrementally updated whenever model changes occur.

Information is represented by a tuple consisting of model instance elements. Each node in the RETE net is associated with a (partial) pattern and stores the set of tuples that conform to the pattern.

This set of tuples is in strong analogy with the relation concept of relational algebra.

Within the RETE net, we define the following types of nodes:

• Input nodes serve as the underlying knowledge base representing the instance model. There is a separate input node for each entity type (class), containing a view representing all the instances that conform to the type. Similarly, there is an input node for each relation type, containing a view consisting of tuples with the source and target in addition to the identifier of the edge instance.

Additionally, when a pattern calls another pattern, it can simply use the production node of the called pattern to obtain the set of tuples conforming to the other pattern. This special input node is called a pattern input node.

• Intermediate nodes store partial matches of patterns or, in other terms, matches of partial graph patterns. In our approach, we use the following types:

– The most widely used node type is the join node. It is created as the child of two parent nodes, each of which has an outgoing RETE edge leading to the newly created join node. Its role is best explained with the relational algebra analogy: it performs a natural join on the relations represented by its parent nodes.

– The negative node (or antijoin node) has two distinct parents: the primary and the secondary input. The negative node contains the set of tuples that are contained in the primary input but do not match any tuple from the secondary input, which is analogous to the antijoin in relational algebra.

– The term evaluation node represents an attribute condition defined in a graph pattern. It propagates only those tuples that satisfy the given attribute condition.


• Finally, production nodes represent the complete pattern itself. Production nodes also perform supplementary tasks. For example, they filter out those elements of the tuples that do not correspond to symbolic parameters of the pattern (in analogy with the projection operation of relational algebra), so that matches are stored more efficiently. Additionally, in the case of multiple pattern bodies, the production node is responsible for filtering out duplicate matches.

Note that the different pattern bodies of a graph pattern are matched by separate matchers (RETE nets), one for each body, which share the production node. The shared production node performs a true union operation on the sets of tuples conforming to each pattern body.
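The node types above can be sketched as relational operators over sets of tuples. This Python sketch evaluates each node once from its parents' contents, so the incremental propagation of changes is not modelled. It also makes two simplifying assumptions. First, joins and antijoins match on pattern variables with the same name, whereas in the Viatra2 net the variables are paired explicitly (e.g., F1 with FSRC). Second, the injectivity checks of the production node are omitted. `Relation`, `join`, `antijoin`, `term` and `production` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, Sequence, Tuple

Row = Tuple[str, ...]


class Relation:
    """Content of a RETE node: a set of tuples over named pattern variables."""

    def __init__(self, variables: Sequence[str], rows: Iterable[Row]):
        self.vars: Tuple[str, ...] = tuple(variables)
        self.rows = set(rows)


def join(left: Relation, right: Relation) -> Relation:
    """Join node: natural join on the pattern variables shared by both parents."""
    shared = [v for v in left.vars if v in right.vars]
    extra = [v for v in right.vars if v not in left.vars]
    li = [left.vars.index(v) for v in shared]
    ri = [right.vars.index(v) for v in shared]
    ei = [right.vars.index(v) for v in extra]
    rows = {l + tuple(r[i] for i in ei)
            for l in left.rows for r in right.rows
            if all(l[a] == r[b] for a, b in zip(li, ri))}
    return Relation(left.vars + tuple(extra), rows)


def antijoin(primary: Relation, secondary: Relation) -> Relation:
    """Negative node: primary tuples that match no tuple of the secondary input."""
    shared = [v for v in primary.vars if v in secondary.vars]
    pi = [primary.vars.index(v) for v in shared]
    si = [secondary.vars.index(v) for v in shared]
    keys = {tuple(s[i] for i in si) for s in secondary.rows}
    return Relation(primary.vars,
                    {p for p in primary.rows if tuple(p[i] for i in pi) not in keys})


def term(rel: Relation, condition: Callable[[Dict[str, str]], bool]) -> Relation:
    """Term evaluation node: propagate only tuples passing the attribute condition."""
    return Relation(rel.vars,
                    {r for r in rel.rows if condition(dict(zip(rel.vars, r)))})


def production(params: Sequence[str], bodies: Sequence[Relation]) -> Relation:
    """Production node: project every body onto the parameters, then take the union
    (set semantics removes duplicate matches coming from different bodies)."""
    rows = set()
    for body in bodies:
        idx = [body.vars.index(p) for p in params]
        rows |= {tuple(r[i] for i in idx) for r in body.rows}
    return Relation(params, rows)
```

For instance, joining e(A, B) with f(B, C) and then antijoining the result with g(A, C) yields the matches of a path pattern with a negative condition. `production(("A", "C"), [...])` then projects those matches onto the pattern parameters.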

Example 18. As an illustration, Figure 6.1 shows a simplified RETE network matcher built for the missingCircleLink graph pattern (see Figure 5.4(b)).

All nodes share a similar graphical representation, in which the lower part of each node shows its name and the pattern variables of its tuple. For example, the CirclePath node has the FSRC, CP, and FTRG pattern variables in its tuple. It is an input node that produces all CirclePath-typed links from the instance model, and, based on the metamodel, we know that both the source and the target of such a link are instances of Field. Additionally, for easier readability, each node highlights the part of the original graph pattern that is taken into account by the RETE net up to the output of the node. This means that the output tuples of the node fulfil the highlighted part of the original graph pattern. For example, the second join produces all matches of the complete pattern without the negative application condition check defined for the circlePath edge. Finally, for join and antijoin nodes, the pattern variables used for the join (or antijoin) are shown in the top blue box. For example, in the antijoin node, the F1 and F2 pattern variables are antijoined with the FSRC and FTRG pattern variables, respectively.

The RETE net can be separated into three parts. The upper part consists of the three input nodes. The VerticalReturnPath and the CirclePath nodes are edge inputs of the corresponding types, while the circled node is a pattern input node for the circled graph pattern, which is matched by another RETE net. The center part contains the two join nodes and the antijoin node. These filter the matching candidates of the pattern by applying the connectivity and negative application condition constraints to the pattern variables. First, the joins match the complete pattern without the NAC. Then, the antijoin filters out those tuples that satisfy the circled NAC pattern. Finally, the lower part consists of the production node. It projects the tuples onto the output pattern variables defined in the head of the graph pattern, in our case the Field1 and Field2 pattern variables, and executes the required injectivity checks on the complete match.

Note that the first pattern body of the missingCircleLink graph pattern is also depicted (on the right side) in the figure to ease understanding of the RETE net.

6.1.2 Updates After Model Changes

Model changes are propagated through the network, incrementally modifying the match sets stored at the nodes, since each node only recomputes its own partial matches. Thus, the pattern matcher can incrementally track the match set of a complex pattern. It does so by decomposing the pattern into constraints, constructing a RETE network based on that decomposition, and updating the network as the model changes.

Input nodes receive notifications about each elementary model change (e.g. when a new model element is created or deleted) and release an update token on each of their outgoing edges. Such an update token represents changes in the partial matches stored by the RETE node. Positive update tokens reflect newly added tuples, and negative updates indicate tuples being removed from the set.
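The following Python sketch shows token propagation through a single join node for a hypothetical path pattern e(A, B), f(B, C). Each parent memory is indexed by the join variable B. An incoming positive or negative token therefore only needs to be combined with the stored tuples of the other parent that share its key. The resulting tokens are applied to the production node's match set. The sketch assumes set semantics, meaning the same edge is never inserted twice, and it hard-codes the join variable. `JoinNode`, `ProductionNode`, `Network` and `notify` are illustrative names.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Token = Tuple[str, Tuple[str, ...]]   # ("+" or "-", tuple)


class JoinNode:
    """Join node for e(A, B) and f(B, C): keeps each parent's tuples indexed by B."""

    def __init__(self) -> None:
        self.left: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)   # B -> {(A, B)}
        self.right: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)  # B -> {(B, C)}

    def from_left(self, sign: str, t: Tuple[str, str]) -> List[Token]:
        a, b = t
        if sign == "+":
            self.left[b].add(t)
        else:
            self.left[b].discard(t)
        # Combine only with right tuples sharing the same value of B.
        return [(sign, (a, b, c)) for (_, c) in self.right[b]]

    def from_right(self, sign: str, t: Tuple[str, str]) -> List[Token]:
        b, c = t
        if sign == "+":
            self.right[b].add(t)
        else:
            self.right[b].discard(t)
        return [(sign, (a, b, c)) for (a, _) in self.left[b]]


class ProductionNode:
    """Stores the complete match set and applies incoming update tokens."""

    def __init__(self) -> None:
        self.matches: Set[Tuple[str, ...]] = set()

    def receive(self, tokens: List[Token]) -> None:
        for sign, t in tokens:
            if sign == "+":
                self.matches.add(t)
            else:
                self.matches.discard(t)


class Network:
    """Hypothetical RETE net: two edge input nodes -> join -> production."""

    def __init__(self) -> None:
        self.join = JoinNode()
        self.production = ProductionNode()

    def notify(self, edge_type: str, sign: str, src: str, trg: str) -> None:
        # Input nodes turn each elementary model change into an update token.
        if edge_type == "e":
            tokens = self.join.from_left(sign, (src, trg))
        elif edge_type == "f":
            tokens = self.join.from_right(sign, (src, trg))
        else:
            return  # changes of other types do not affect this pattern
        self.production.receive(tokens)
```

Deleting a single f edge emits one negative token for every stored e tuple with the same B value. In this way, the removal reaches the production node without any recomputation of the full match set.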


Figure 6.1: Sample RETE net of the first pattern body of the missingCircleLink graph pattern