Chapter 7

Summary and Related Work in Graph Pattern Matching

7.1 Related Work

Pattern matching plays a key role in the efficient execution of all model transformation engines. In the case of graph transformation based approaches, the goal is to find the occurrences of a graph pattern, which contains structural as well as type constraints on model elements. During pattern matching, each variable of a graph pattern is bound to a node in the model such that this matching (binding) is consistent with the edge labels and with the source and target nodes of the edges in the model.
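The binding process described above can be illustrated with a minimal backtracking sketch. It is not taken from any of the tools discussed below; the model (NODE_TYPES, EDGES), the pattern (PATTERN_TYPES, PATTERN_EDGES) and the function names are hypothetical and chosen only for illustration. Variables are bound in a fixed order, and injectivity is not enforced.

```python
# Hypothetical instance model: typed nodes and labelled, directed edges.
NODE_TYPES = {"c1": "Class", "c2": "Class", "a1": "Attribute"}
EDGES = {("c1", "attrs", "a1"), ("c2", "parent", "c1")}

# Hypothetical pattern: a type constraint per variable, plus edge constraints.
PATTERN_TYPES = {"C": "Class", "A": "Attribute"}
PATTERN_EDGES = [("C", "attrs", "A")]


def consistent(binding):
    """Check every pattern edge whose source and target are both bound."""
    for src, label, trg in PATTERN_EDGES:
        if src in binding and trg in binding:
            if (binding[src], label, binding[trg]) not in EDGES:
                return False
    return True


def match(variables, binding=None):
    """Yield all bindings of the pattern variables, in the given fixed order."""
    binding = {} if binding is None else binding
    if len(binding) == len(variables):
        yield dict(binding)
        return
    var = variables[len(binding)]
    for node, node_type in NODE_TYPES.items():
        if node_type != PATTERN_TYPES[var]:
            continue  # type constraint violated
        binding[var] = node
        if consistent(binding):
            yield from match(variables, binding)
        del binding[var]  # backtrack
```

Calling list(match(["C", "A"])) enumerates the occurrences of the pattern in the sample model. The order of the variable list plays the role of a (very naive) search plan, which is exactly what the approaches below try to choose well.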

Local Search based Approaches

Fujaba [NNZ00] performs local search starting from the node selected by the system designer and extending the matching step-by-step by neighbouring nodes and edges. Fujaba fixes a single, breadth-first traversal strategy at compile-time (i.e. when the pattern matching code is generated) for each rule. Fujaba uses simple rules of thumb for generating search plans. A sample rule is that navigation along an edge with an at-most-one multiplicity constraint precedes navigation along edges with arbitrary multiplicity.

PROGRES [SWZ99] uses a very sophisticated cost model for defining costs of basic operations (like enumeration of nodes of a type and navigation along edges). These costs are not domain-specific in the sense that they are based on assumptions about a typical problem domain on which the tool is intended to be used. Operation graphs of PROGRES, which are similar to search graphs in the current paper, additionally support the handling of path expressions and attribute conditions.

The compiled version of PROGRES generates search plans at compile-time by a greedy algorithm, which is based on the a priori costs of basic operations.

The pattern matching engine of GReAT [KASS03] uses a breadth-first traversal strategy starting from a set of nodes that are initially matched. This initial binding is referred to as pivoted pattern matching in GReAT terms. This tool uses the Strategy design pattern for the purpose of future extensions, and not for supporting different pattern matching strategies as in our approach.

GrGen.NET [JBK10] is based on plan graphs (similar to search graphs) for capturing the constraints defined by a graph pattern. From these plan graphs, the search plans are generated using a simple heuristic optimization algorithm that minimizes the most significant term occurring in its cost function, which is defined as a product of the costs of the simple search operations. Its main advantages are that (i) the cost values are calculated from the underlying instance model and thus provide a precise prediction of the search space tree, and (ii) the overhead of running the simple algorithm is very small. Overall, the GrGen.NET framework is among the fastest model transformation frameworks when comparing pure graph pattern matching performance.

Varró et al. [VDWS12] propose a novel model-sensitive optimization approach based on dynamic programming. Its main advantage is that it can handle arbitrary n-ary constraints and, combined with the model-sensitive cost calculation for simple operations, it can produce (at least theoretically) optimal solutions for complex cost functions. In their implementation, this feature is exemplified with their own cost function, which is also used in our approach.

Giese et al. [GHS09] introduce a novel greedy approach that does not generate sophisticated search plans but instead uses an interpreted, dynamic approach, which always selects the next reference to traverse that contains the lowest number of target elements. One of its key advantages is that it always takes the underlying model into account when executing pattern matching, and thus in general provides good performance on various instance models. However, it is important to mention that the presented greedy approach may provide suboptimal pattern matching performance when no bound input parameter is defined, because the starting node is then selected based only on the number of candidate nodes in the underlying instance model. Additionally, some of the authors [BGST05] also provided an algorithm for calculating the worst-case execution time for their pattern representation (called story diagrams), regardless of the applied local search based pattern matching algorithm. This helps to apply their approach in hard real-time environments, which is an evolving area in the model@runtime community.
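The greedy, model-sensitive selection described above can be sketched as follows. This is only an illustration of the idea of always traversing the reference with the fewest target elements; it is not the interpreter of [GHS09]. The names build_index and greedy_match, the sample model and the pattern are hypothetical, and the sketch assumes that every pattern edge can be reached by forward navigation from an already bound variable.

```python
from collections import defaultdict


def build_index(edges):
    """Map (source node, reference label) to the list of target nodes."""
    index = defaultdict(list)
    for src, label, trg in edges:
        index[(src, label)].append(trg)
    return index


def greedy_match(remaining, index, binding):
    """Yield complete bindings, choosing the next reference dynamically."""
    if not remaining:
        yield dict(binding)
        return
    # Only references whose source variable is already bound can be traversed.
    traversable = [e for e in remaining if e[0] in binding]
    if not traversable:
        return  # limitation of this sketch: no forward navigation possible
    # Model-sensitive choice: the reference with the fewest target elements.
    edge = min(traversable,
               key=lambda e: len(index.get((binding[e[0]], e[1]), [])))
    src, label, trg = edge
    rest = [e for e in remaining if e is not edge]
    for node in index.get((binding[src], label), []):
        if trg in binding:
            if binding[trg] == node:  # target already bound: act as a check
                yield from greedy_match(rest, index, binding)
        else:
            binding[trg] = node
            yield from greedy_match(rest, index, binding)
            del binding[trg]  # backtrack


# Hypothetical model and pattern (edges are source, label, target).
MODEL_EDGES = [("p1", "classes", "c1"), ("p1", "classes", "c2"),
               ("p1", "classes", "c3"), ("p1", "owner", "u1"),
               ("u1", "likes", "c2")]
PATTERN = [("P", "classes", "C"), ("P", "owner", "U"), ("U", "likes", "C")]
matches = list(greedy_match(PATTERN, build_index(MODEL_EDGES), {"P": "p1"}))
```

With P bound as an input parameter, the sketch first follows owner (one target) rather than classes (three targets), so the large reference is only used as a final check. Without a bound input parameter there is no such local information to exploit, which is the weakness noted above.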

Constraint satisfaction based Algorithms that handle pattern matching as a constraint satisfaction problem (CSP), like [LV02] in AGG [ERT99], do not directly involve the concept of search plans. However, the underlying constraint solver engine has to define a variable binding order, which can be considered as a search plan derived dynamically at run-time. As a consequence, CSP-based graph transformation engines by their nature support the dynamicity that has been achieved by our approach for local search based algorithms. However, as constraint solver implementations typically use the first-fail principle for determining the variable binding order, this technique still schedules the attribute, injectivity and NAC checking operations to the earliest possible location.
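The first-fail principle mentioned above can be illustrated with a small sketch: at each step, the unbound variable with the fewest remaining candidate values is bound next, and every constraint is checked as soon as all of its variables are bound. This is a generic illustration, not the solver used in AGG; the names satisfied and solve, the domains and the constraints are hypothetical.

```python
def satisfied(binding, constraints):
    """Evaluate every constraint whose variables are all bound."""
    return all(
        predicate(*(binding[v] for v in variables))
        for variables, predicate in constraints
        if all(v in binding for v in variables)
    )


def solve(domains, constraints, binding=None):
    """Yield all solutions, ordering variables by the first-fail principle."""
    binding = {} if binding is None else binding
    unbound = [v for v in domains if v not in binding]
    if not unbound:
        yield dict(binding)
        return

    def candidates(var):
        # Values that do not violate any constraint that is already checkable.
        return [x for x in domains[var]
                if satisfied({**binding, var: x}, constraints)]

    # First-fail: bind the variable with the fewest remaining candidates.
    var = min(unbound, key=lambda v: len(candidates(v)))
    for value in candidates(var):
        binding[var] = value
        yield from solve(domains, constraints, binding)
        del binding[var]  # backtrack


# Hypothetical pattern X -> Y with an injectivity constraint X != Y.
EDGE_SET = {("n1", "n2"), ("n3", "n2")}
DOMAINS = {"X": ["n1", "n2", "n3"], "Y": ["n2"]}
CONSTRAINTS = [
    (("X", "Y"), lambda x, y: (x, y) in EDGE_SET),  # structural constraint
    (("X", "Y"), lambda x, y: x != y),              # injectivity check
]
solutions = list(solve(DOMAINS, CONSTRAINTS))
```

Here Y is bound first because its domain is the smallest, and the injectivity check is evaluated at the earliest point where both variables have values. The binding order is therefore derived at run-time from the current domains rather than from a precomputed search plan.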

Incremental Model Transformation Approaches

Varró et al. [VVS06b] propose a graph pattern matching technique that constructs and stores a tree for partial matchings of a pattern, and incrementally updates it when the model changes. As a novelty, notification arrays are introduced for speeding up the identification of those partial matchings that should be incrementally modified. The main advantage of this solution is that only matchings that appear as leaves of the tree have to be physically stored, which possibly saves a significant amount of memory.
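The general idea of incremental pattern matching (keeping a match set up to date on model changes instead of recomputing it) can be sketched for one fixed pattern. The sketch below is a deliberately simplified, join-based illustration and does not reproduce the matching tree or the notification arrays of [VVS06b]; the class name, the pattern X -a-> Y -b-> Z and the edge labels are hypothetical.

```python
from collections import defaultdict


class IncrementalPathMatcher:
    """Maintains all matches of the fixed pattern X -a-> Y -b-> Z."""

    def __init__(self):
        self.a_by_target = defaultdict(set)  # Y -> {X} for stored a-edges
        self.b_by_source = defaultdict(set)  # Y -> {Z} for stored b-edges
        self.matches = set()                 # cached complete matches (X, Y, Z)

    def add_edge(self, src, label, trg):
        """Notification handler for an inserted edge."""
        if label == "a":
            self.a_by_target[trg].add(src)
            # Only matches that use the new edge can appear.
            self.matches |= {(src, trg, z)
                             for z in self.b_by_source.get(trg, ())}
        elif label == "b":
            self.b_by_source[src].add(trg)
            self.matches |= {(x, src, trg)
                             for x in self.a_by_target.get(src, ())}

    def remove_edge(self, src, label, trg):
        """Notification handler for a deleted edge."""
        if label == "a":
            self.a_by_target[trg].discard(src)
            self.matches = {m for m in self.matches
                            if not (m[0] == src and m[1] == trg)}
        elif label == "b":
            self.b_by_source[src].discard(trg)
            self.matches = {m for m in self.matches
                            if not (m[1] == src and m[2] == trg)}
```

After each change notification, the matches attribute can be read directly without any search. The price is the memory needed for the cached match set and the edge indexes, which is the trade-off that the approaches in this section address in different ways.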

The model transformation tool TefKat includes an incremental transformation engine [HLR06] that also achieves incremental pattern matching over the factbase-like model representation of the system. The algorithm constructs and preserves a Prolog-like resolution tree for patterns, which is incrementally maintained upon model changes and pattern (rule) changes as well.

Giese et al. [GW06] present a triple graph grammar (TGG) based model synchronization approach, which incrementally updates reference (correspondence) nodes of TGG rules, based on notifications triggered by modified model elements. Their approach shares similarities with our RETE-based algorithm in terms of notification observing; however, it does not provide support for explicit querying of (triple) graph patterns.

As a new effort for the EMF-based model transformation framework ATL [JT10], incremental transformation execution is supported, including a version of incremental pattern matching that incrementally re-evaluates OCL expressions whose dependencies have been affected by the changes. The approach specifically focuses on transformations, and provides no specific incremental query interface as of now.

VMTS [MMLA10] uses an off-line optimization technique to define (partially) overlapping graph patterns that can share result sets (with caching) during transformation execution. Compared to our approach, it focuses on simple caching of matching results with a small overhead rather than complete caching of patterns.

Hybrid Approaches

To the best of our knowledge, no other graph transformation system has applied a similar hybrid approach for graph pattern matching. However, similar concepts have already been applied in neighbouring research domains:

Expert Systems As a conceptual analogy for our hybrid approach, research in expert systems [WM03] demonstrated that an integration between two different incremental strategies can be advantageous with respect to memory consumption and execution time. While the successful RETE algorithm has numerous variations itself, there are also several alternatives, many of which more or less resemble the idea behind RETE. The most important target of improvement is the high memory consumption of the RETE network.

TREAT [ML91] aims at minimizing memory usage while retaining the incremental property of pattern matching and instant accessibility of conflict sets. Only the input facts and the conflict sets (match sets) are stored; no memories are used for partial patterns. So far, RETE and TREAT appear to be more or less equal in speed and memory consumption [NGR88].

Finally, the LEAPS algorithm [Bat94] is a fully incremental approach with minimal cache; similarly to TREAT, no partial matches are stored, only the match sets. Its main novelties are its lazy evaluation approach, which avoids manifesting tuples unnecessarily, and the introduction of timestamps, which make it possible to reconstruct earlier conditions (“time travel”) for the lazy evaluation. It is believed that currently the LEAPS algorithm provides the best performance among rule based systems [Bat94].

Relational Databases Additionally, in the context of relational databases, the cached result of a query is called a materialized view. These materialized views can then be used in SQL queries to speed up execution, as commercial database engines provide this feature along with an option of automatic and incremental maintenance. This results in a conceptually similar hybrid approach, as certain parts of the query are stored in caches while other segments are evaluated at execution (querying) time (similarly to the local search based pattern matcher). However, in mainstream databases this non-standard feature of materialized views is typically restricted to a subset of SQL queries, which is insufficient to express complex graph patterns (especially NACs).

Some novel results