
graph transformation rule is also implemented using elementary data manipulation statements (such as insert, delete).

Furthermore, I implemented a prototype graph transformation engine, which uses open, off-the-shelf relational databases (such as PostgreSQL [97] or MySQL [125]) as a backend to demonstrate the practical feasibility of the proposed approach. For a detailed experimental evaluation, I assess how the performance of the prototype is influenced by parallel rule applications, RDBMS-specific query optimization techniques, and the choice of the underlying RDBMS.

Finally, I propose a database-independent and portable GT engine implementation that uses declarative queries of the EJB QL standard instead of SQL commands in the pattern matching phase. It aims at resolving the diversity of SQL dialects by using standard J2EE technology.

Structure

The basic structure of the current chapter is the following.

• Section 6.2 informally summarizes the essence of our approach on an example prior to going into deep mathematical details. This section assumes a basic knowledge of relational database concepts.

• Section 6.3 surveys the main concepts of relational databases together with their formal definitions.

• Section 6.4 presents the formalization of our approach to encode graph transformation rules into relational databases. Formal proofs of correctness are listed in Appendix A.

• Section 6.5 investigates how the performance of graph transformation over an RDBMS depends on the selection of the underlying database, on the application of the built-in query optimizer, and on the use of the tool's parallel rule execution feature.

• Section 6.6 presents a portable, database-independent GT engine implementation that uses queries expressed in the Enterprise Java Beans Query Language (EJB QL) for specifying graph pattern matching.

• Section 6.7 concludes the current chapter by emphasizing its relevance.

6.2 Graph transformation in relational databases: An informal overview

An informal overview is provided on how graph transformation rules can be implemented by using traditional relational database techniques. Concepts are presented on the running example of object-relational mapping, which has already been introduced by Example 1 in Sec. 2.2, and by Example 5 in Sec. 3.1. In the approach being presented in this chapter, attribute handling is not discussed in order to preserve the notational consistency of the whole dissertation. However, the implementation is able to handle attributes as shown by [151].

Mapping metamodels to database tables. In the first step, a standard mapping (for more details see [49, 110]) is used to generate the schema of the database from the metamodel.

• Each class with k outgoing many-to-one associations is mapped to a table with k + 1 columns. Column id will store the identifiers of objects of the specific class. All other columns will contain the identifiers of target objects of such outgoing many-to-one links that have the corresponding association as their direct type. If no such outgoing link exists in the model, the undefined (NULL) value is used in the corresponding column. Additional foreign key constraints, whose role is to guarantee the consistency of the database, have to be defined for columns representing many-to-one associations; these constraints refer to the table assigned to the corresponding target class.

78 CHAPTER 6. GRAPH TRANSFORMATION IN RELATIONAL DATABASES

• A table with 2 columns storing the identifiers of source and target objects is assigned to each many-to-many association. Additionally, foreign key constraints are defined for both columns referring to the tables assigned to the corresponding source and target class, respectively.

• Inheritance is handled by a foreign key constraint defined for the identifier column id of the table assigned to the subclass. This foreign key constraint maintains a reference to the identifier column id of the superclass table.
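The mapping rules above can be sketched as a small schema generator. The following Python sketch uses SQLite (through the standard sqlite3 module) purely for illustration. The metamodel fragment, the TEXT identifier type, and the helper names class_ddl, assoc_ddl and generate_schema are assumptions made for this example and are not taken from the prototype engine.

```python
import sqlite3

# Hypothetical metamodel fragment (an assumption for this sketch):
# class name -> (superclass or None, {many-to-one association: target class})
CLASSES = {
    "ModelElement": (None, {"EO": "Namespace", "Ref": "ModelElement"}),
    "Namespace": ("ModelElement", {}),
    "Class": ("Namespace", {}),
    "Feature": ("ModelElement", {"CF": "Class"}),
    "UniqueKey": ("ModelElement", {}),
}
# many-to-many association name -> (source class, target class)
MANY_TO_MANY = {"UF": ("UniqueKey", "Feature")}


def class_ddl(name, parent, assocs):
    """Table with an id column plus one column per outgoing many-to-one association."""
    id_col = "id TEXT PRIMARY KEY"
    if parent is not None:
        # Inheritance: the subclass id refers to the id of the superclass table.
        id_col += f" REFERENCES {parent}(id)"
    cols = [id_col] + [f"{assoc} TEXT REFERENCES {target}(id)"
                       for assoc, target in assocs.items()]
    return f"CREATE TABLE {name} ({', '.join(cols)})"


def assoc_ddl(name, source, target):
    """Two-column table for a many-to-many association."""
    return (f"CREATE TABLE {name} ("
            f"src TEXT NOT NULL REFERENCES {source}(id), "
            f"trg TEXT NOT NULL REFERENCES {target}(id))")


def generate_schema():
    ddl = [class_ddl(n, parent, assocs) for n, (parent, assocs) in CLASSES.items()]
    ddl += [assoc_ddl(n, src, trg) for n, (src, trg) in MANY_TO_MANY.items()]
    return ddl


conn = sqlite3.connect(":memory:")
for statement in generate_schema():
    conn.execute(statement)

# ModelElement has k = 2 outgoing many-to-one associations, hence k + 1 = 3 columns.
columns = [row[1] for row in conn.execute("PRAGMA table_info(ModelElement)")]
print(columns)
```

In this sketch the tables are created in dictionary order; SQLite accepts a foreign key that refers to a table created later, whereas other database systems may require the tables to be created in dependency order.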

Database representation of instance models. Instance models representing the system under design are stored in these database tables.

• A unique identifier is assigned to each object of the instance model.

• The identifier of each object has to appear in the column id of all tables that correspond to ancestors of the object’s direct type.

• The database representation of a many-to-one link is a row in the table that corresponds to the source class of the link’s type. This row should contain the identifiers of source and target objects in the identifier column id and the column representing the many-to-one association, respectively.

• Each many-to-many link is represented in the database by a pair of source and target object identifiers appearing in the table that corresponds to the direct type of the link.
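The storage rules above can be illustrated by a minimal Python and SQLite sketch. The inheritance fragment and the helper names ancestors_top_down, insert_object and set_many_to_one are hypothetical and serve only this example. A many-to-many link would be stored analogously by inserting a (source, target) pair into the table of the association.

```python
import sqlite3

# Hypothetical inheritance fragment (an assumption): class -> direct superclass
PARENT = {"ModelElement": None, "Namespace": "ModelElement",
          "Class": "Namespace", "Package": "Namespace"}


def ancestors_top_down(cls):
    """Return the ancestors of cls (including cls itself), most general first."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = PARENT[cls]
    return chain[::-1]


def insert_object(conn, obj_id, direct_type):
    """Store the identifier in the table of every ancestor of the direct type."""
    for table in ancestors_top_down(direct_type):
        conn.execute(f"INSERT INTO {table} (id) VALUES (?)", (obj_id,))


def set_many_to_one(conn, table, column, src_id, trg_id):
    """A many-to-one link is a value stored in the row of its source object."""
    conn.execute(f"UPDATE {table} SET {column} = ? WHERE id = ?", (trg_id, src_id))


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE ModelElement (id TEXT PRIMARY KEY, EO TEXT REFERENCES Namespace(id));
CREATE TABLE Namespace (id TEXT PRIMARY KEY REFERENCES ModelElement(id));
CREATE TABLE Class (id TEXT PRIMARY KEY REFERENCES Namespace(id));
CREATE TABLE Package (id TEXT PRIMARY KEY REFERENCES Namespace(id));
""")

with conn:
    insert_object(conn, "p", "Package")
    insert_object(conn, "c1", "Class")
    set_many_to_one(conn, "ModelElement", "EO", "c1", "p")  # c1 is contained by p

tables_with_c1 = [t for t in PARENT
                  if conn.execute(f"SELECT 1 FROM {t} WHERE id = 'c1'").fetchone()]
eo_of_c1 = conn.execute("SELECT EO FROM ModelElement WHERE id = 'c1'").fetchone()[0]
print(tables_with_c1, eo_of_c1)
```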

Example 16 The model of Fig. 5.8(d) (also shown in Fig. 6.1(b)) has been selected as an instance model for the current running example, for which the corresponding database representation is depicted in Fig. 6.1(c).

• A sample database representation of an object. The model of Fig. 6.1(b) contains a UML class c1, which is identified by the key c1 in the database. As ModelElement, Namespace and Class are ancestors of Class according to the metamodel of Fig. 2.1, all their corresponding tables should have the key c1 in their identifier column id.

• A sample database representation of a many-to-one link. UML class c1 is contained by UML package p. This containment is a many-to-one link of type EO going from UML class c1 to UML package p. The database representation of this link is a row in the ModelElement table, which has values c1 and p in columns id and EO, respectively.

• A sample database representation of a many-to-many link. The many-to-many link of type UF connecting primary key p3 and column cl3 is represented by a corresponding row in table UF.

Views for LHS and NAC. The matching patterns of a graph transformation rule are calculated by using views, which contain all matchings of the rule. More specifically, we introduce a separate view for each LHS and NAC graph.

(a) The view generated for rule graphs (LHS and NAC) executes an inner join operation on tables that represent either a node or an edge of the rule graph.

(b) The joined table is filtered by injectivity and edge constraints. Injectivity constraints express the injective mapping of rule graph nodes and edges on the database level. Edge constraints define restrictions imposed by the graph structure, which means that the source (target) node identifier of the given edge should be found in tables representing the type of the edge and the type of the source (target) node.

(c) Finally, a projection selects only those columns of the filtered joined table that represent node identifiers. Information about the source and target nodes of edges is discarded during projection. This information is unnecessary in the sequel, since requirements imposed by the graph structure have already been checked and fulfilled.

[Figure 6.1: A sample instance model and its corresponding database representation. Panels: (a) concrete syntax of the instance model of Fig. 6.1(b); (b) the instance model of Fig. 5.8(d).]

Example 17 The essence of this approach is introduced by an example listing the views generated for the LHS and NAC patterns of ClassRule (see Fig. 3.1(b)).

CREATE VIEW ClassRule_lhs AS                    -- an LHS view
  SELECT c.id AS c, p.id AS p, s.id AS s        -- with 3 columns
  FROM Class AS c, ModelElement AS c_anc,
       Package AS p, ModelElement AS p_anc, Schema AS s
  WHERE c.id = c_anc.id AND c_anc.EO = p.id     -- EO edge eo1
    AND p.id = p_anc.id AND p_anc.Ref = s.id    -- Ref edge r1
    AND p.id <> s.id                            -- injectivity constraint
                                                -- for nodes p and s

CREATE VIEW ClassRule_nac AS
  SELECT c.id AS c, tn.id AS tn
  FROM Class AS c, ModelElement AS c_anc, Table AS tn
  WHERE c.id = c_anc.id AND c_anc.Ref = tn.id   -- Ref edge rn
    AND c.id <> tn.id                           -- injectivity constraint
                                                -- for nodes c and tn

The LHS of ClassRule requires the presence of an EO edge that connects a UML class to a UML package. Since EO edges are stored in the ModelElement table, this table must also be included in the inner join operation in addition to tables Class and Package. Since the source node of eo1 has to be a UML class, only those source object identifiers in column id of table ModelElement can participate in a matching that can also be found in table Class, as expressed by the edge constraint c.id = c_anc.id. A similar edge constraint c_anc.EO = p.id requires possible target object identifiers of column EO in table ModelElement to be equal to a value from the identifier column of table Package. A similar pair of equalities expresses the edge constraints for the reference edge r1. Due to inheritance relations defined in the metamodel, every schema is a UML package at the same time, so the identifier of a schema also appears in table Package. As pattern nodes S and P are not allowed to be mapped to the same object, this (injectivity) constraint is expressed on the database level by the inequality p.id <> s.id.
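The three steps (join, filtering, projection) can be sketched as a small generator that assembles the text of such a view from a description of the rule graph. This is only an illustrative Python sketch under several assumptions: the function names view_sql and is_ancestor, the simplified inheritance map, and the input format are hypothetical; only many-to-one edges are handled; and all outgoing edges of a node are assumed to be stored in a single ancestor table. An injectivity constraint is emitted only for node pairs whose types are related by inheritance, which is how the listing above is interpreted here.

```python
import sqlite3
from itertools import combinations

# Hypothetical, simplified inheritance map (an assumption): class -> superclass
PARENT = {"ModelElement": None, "Class": "ModelElement",
          "Package": "ModelElement", "Schema": "Package"}


def is_ancestor(ancestor, cls):
    """True if ancestor equals cls or is one of its (transitive) superclasses."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT[cls]
    return False


def view_sql(name, nodes, edges):
    """nodes: {variable: type}; edges: [(source, owner table, column, target)]."""
    select = ", ".join(f"{v}.id AS {v}" for v in nodes)        # (c) projection
    tables = [f"{t} AS {v}" for v, t in nodes.items()]          # (a) joined tables
    conds = []
    for src, owner, column, trg in edges:                       # (b) edge constraints
        alias = f"{src}_anc"
        entry = f"{owner} AS {alias}"
        if entry not in tables:
            tables.append(entry)
        conds += [f"{src}.id = {alias}.id", f"{alias}.{column} = {trg}.id"]
    for a, b in combinations(nodes, 2):                         # (b) injectivity
        if is_ancestor(nodes[a], nodes[b]) or is_ancestor(nodes[b], nodes[a]):
            conds.append(f"{a}.id <> {b}.id")
    return (f"CREATE VIEW {name} AS SELECT {select} "
            f"FROM {', '.join(tables)} WHERE {' AND '.join(conds)}")


lhs_sql = view_sql(
    "ClassRule_lhs",
    nodes={"c": "Class", "p": "Package", "s": "Schema"},
    edges=[("c", "ModelElement", "EO", "p"), ("p", "ModelElement", "Ref", "s")],
)

# Tiny sample database (an assumption) to check that the generated view is valid.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ModelElement (id TEXT PRIMARY KEY, EO TEXT, Ref TEXT);
CREATE TABLE Class (id TEXT PRIMARY KEY);
CREATE TABLE Package (id TEXT PRIMARY KEY);
CREATE TABLE Schema (id TEXT PRIMARY KEY);
INSERT INTO ModelElement VALUES ('s', NULL, NULL), ('p', NULL, 's'), ('c1', 'p', NULL);
INSERT INTO Class VALUES ('c1');
INSERT INTO Package VALUES ('p'), ('s');
INSERT INTO Schema VALUES ('s');
""")
conn.execute(lhs_sql)
rows = conn.execute("SELECT c, p, s FROM ClassRule_lhs").fetchall()
print(lhs_sql)
print(rows)
```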

ClassRule_lhs     ClassRule_nac     ClassRule_left_join     ClassRule
c    p   s        c    tn           c    p   s   tn         c    p   s
c1   p   s        c1   t1           c1   p   s   t1         c2   p   s
c2   p   s        a12  t3           c2   p   s   NULL
a12  p   s                          a12  p   s   t3

Figure 6.2: Database representation of matchings

The left part of Fig. 6.2 shows the contents of views that have been defined for the LHS and the NAC parts of ClassRule.

For instance, c1 is a UML class in the UML package p, and this UML package is connected to schema s by a reference edge in Fig. 6.1(b). Thus, a matching for the LHS of ClassRule is found, which is represented by a corresponding row in the leftmost view of Fig. 6.2. Note that the LHS of ClassRule has 2 further matchings as shown by the 2 additional rows in the same view.

Since UML class c1 is connected to table t1 by a reference edge in the model of Fig. 6.1(b), the view generated for the NAC contains a corresponding row for this matching. This view has one further row for representing the other matching.
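The contents of the two views can be reproduced with a small, self-contained Python and SQLite sketch. The sample rows below are assumptions chosen so that the views yield the rows shown in the left part of Fig. 6.2; the Namespace level of the inheritance hierarchy is omitted for brevity, and the table of metamodel class Table is named DbTable here because TABLE is a reserved word in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified schema (assumption): Namespace is omitted, Table is named DbTable.
CREATE TABLE ModelElement (id TEXT PRIMARY KEY, EO TEXT, Ref TEXT);
CREATE TABLE Class   (id TEXT PRIMARY KEY REFERENCES ModelElement(id));
CREATE TABLE Package (id TEXT PRIMARY KEY REFERENCES ModelElement(id));
CREATE TABLE Schema  (id TEXT PRIMARY KEY REFERENCES Package(id));
CREATE TABLE DbTable (id TEXT PRIMARY KEY REFERENCES Class(id));

-- Sample instance model (assumed rows, chosen to match Fig. 6.2).
INSERT INTO ModelElement (id, EO, Ref) VALUES
  ('s', NULL, NULL), ('p', NULL, 's'),
  ('t1', 's', NULL), ('t3', 's', NULL),
  ('c1', 'p', 't1'), ('c2', 'p', NULL), ('a12', 'p', 't3');
INSERT INTO Package (id) VALUES ('p'), ('s');
INSERT INTO Schema  (id) VALUES ('s');
INSERT INTO Class   (id) VALUES ('c1'), ('c2'), ('a12'), ('t1'), ('t3');
INSERT INTO DbTable (id) VALUES ('t1'), ('t3');

CREATE VIEW ClassRule_lhs AS
  SELECT c.id AS c, p.id AS p, s.id AS s
  FROM Class AS c, ModelElement AS c_anc,
       Package AS p, ModelElement AS p_anc, Schema AS s
  WHERE c.id = c_anc.id AND c_anc.EO = p.id     -- EO edge eo1
    AND p.id = p_anc.id AND p_anc.Ref = s.id    -- Ref edge r1
    AND p.id <> s.id;                           -- injectivity of p and s

CREATE VIEW ClassRule_nac AS
  SELECT c.id AS c, tn.id AS tn
  FROM Class AS c, ModelElement AS c_anc, DbTable AS tn
  WHERE c.id = c_anc.id AND c_anc.Ref = tn.id   -- Ref edge rn
    AND c.id <> tn.id;                          -- injectivity of c and tn
""")

lhs = sorted(conn.execute("SELECT c, p, s FROM ClassRule_lhs").fetchall())
nac = sorted(conn.execute("SELECT c, tn FROM ClassRule_nac").fetchall())
print(lhs)
print(nac)
```

Note that tables t1 and t3 are also stored in table Class, but they do not produce LHS matchings in this sample, because their container s has no outgoing Ref edge to a schema.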

Left joins for preconditions of rules. When the view for the precondition graph is calculated, views of all its positive and negative application conditions are available. If the precondition has no negative application conditions, then the view defined for the LHS contains the database representation of all matchings of the precondition graph.

(a) Each NAC view is left outer joined to the LHS view one by one. The join condition of this operation expresses that columns representing the same shared node in the LHS and the NAC graphs should be equal.

(b) For a matching of the precondition graph, we require (in the null condition) that the columns of the NAC(s) that are shared with the LHS part are filled with undefined (NULL) values. This means that the matching of the LHS has no extension that is also a matching of (any) NAC graph.

(c) Then a projection is performed, which displays only those columns that originate from the LHS.

Example 18 To continue our running example, we present the view definition for the precondition of ClassRule.

CREATE VIEW ClassRule AS
  SELECT lhs.*
  FROM ClassRule_lhs AS lhs
  LEFT JOIN ClassRule_nac AS nac ON lhs.c = nac.c
  WHERE nac.c IS NULL

The left part of Fig. 6.2 shows the contents of views that have been defined for the LHS and the NAC of ClassRule, respectively. The third table of Fig. 6.2 presents the result of the left outer join operation, while the last table corresponds to the precondition of ClassRule. Note that the columns representing UML class C are shared between the LHS and NAC graphs, so these columns appear both in the join and in the filtering condition.

After executing the left outer join operation, the result has 3 rows. Since rows with values c1 and a12 in column c can be found in both the LHS and NAC views, the corresponding 2 rows in the left outer joined table are completely filled. On the other hand, the second row of the LHS view (i.e., the one with value c2 in column c) has no corresponding row in the NAC view. As a consequence, the left outer joined table has a NULL value in column tn of its second row. As this is the only row that is not filtered out by the null condition, it can also be found in the view generated for the whole precondition graph, which means that a single matching has been found for ClassRule, and as a consequence, the rule is applicable on that matching.
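The left outer join and the null condition can be replayed on the data of Fig. 6.2 with the following Python and SQLite sketch. To keep it short, the contents of ClassRule_lhs and ClassRule_nac are loaded into plain tables instead of being computed by views; this simplification and the explicit definition of ClassRule_left_join as a view are assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Contents of the LHS and NAC views as shown in Fig. 6.2 (stored as tables here).
CREATE TABLE ClassRule_lhs (c TEXT, p TEXT, s TEXT);
CREATE TABLE ClassRule_nac (c TEXT, tn TEXT);
INSERT INTO ClassRule_lhs VALUES ('c1', 'p', 's'), ('c2', 'p', 's'), ('a12', 'p', 's');
INSERT INTO ClassRule_nac VALUES ('c1', 't1'), ('a12', 't3');

-- Intermediate result: every LHS matching, extended by a NAC matching if one exists.
CREATE VIEW ClassRule_left_join AS
  SELECT lhs.c AS c, lhs.p AS p, lhs.s AS s, nac.tn AS tn
  FROM ClassRule_lhs AS lhs
  LEFT JOIN ClassRule_nac AS nac ON lhs.c = nac.c;

-- Precondition: keep only LHS matchings that cannot be extended to the NAC.
CREATE VIEW ClassRule AS
  SELECT lhs.*
  FROM ClassRule_lhs AS lhs
  LEFT JOIN ClassRule_nac AS nac ON lhs.c = nac.c
  WHERE nac.c IS NULL;
""")

joined = conn.execute(
    "SELECT c, p, s, tn FROM ClassRule_left_join ORDER BY c").fetchall()
matches = conn.execute("SELECT c, p, s FROM ClassRule").fetchall()
print(joined)   # NULL is returned as None by the sqlite3 module
print(matches)
```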

Model manipulation in relational databases. Operations in the graph manipulation phase can be implemented by issuing several data manipulation commands (INSERT, DELETE, and UPDATE) in a single transaction block. The transaction block is needed to ensure that a graph transformation step is atomic, i.e., either all commands or none of them are executed to result in a consistent model after rule application.
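The role of the transaction block can be illustrated with a minimal Python and SQLite sketch. The helper apply_step and the two-table schema are hypothetical; the sketch only shows that a failing command leaves no partial changes behind.

```python
import sqlite3


def apply_step(conn, commands):
    """Run all manipulation commands of one step atomically (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if any command raises
            for sql, params in commands:
                conn.execute(sql, params)
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE ModelElement (id TEXT PRIMARY KEY, EO TEXT REFERENCES ModelElement(id));
CREATE TABLE Class (id TEXT PRIMARY KEY REFERENCES ModelElement(id));
INSERT INTO ModelElement (id) VALUES ('s');
""")

# A consistent step: all three commands are committed together.
ok = apply_step(conn, [
    ("INSERT INTO ModelElement (id) VALUES (?)", ("t2",)),
    ("INSERT INTO Class (id) VALUES (?)", ("t2",)),
    ("UPDATE ModelElement SET EO = ? WHERE id = ?", ("s", "t2")),
])

# A faulty step: the second command violates a foreign key,
# so the first insertion is rolled back as well.
failed = apply_step(conn, [
    ("INSERT INTO ModelElement (id) VALUES (?)", ("t9",)),
    ("INSERT INTO Class (id) VALUES (?)", ("missing",)),
])

ids = [row[0] for row in conn.execute("SELECT id FROM ModelElement ORDER BY id")]
print(ok, failed, ids)
```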

In the graph manipulation phase, deletions are followed by insertions.

• We further restrict the order of delete operations in such a way that edge deletions precede node deletions.

– If a many-to-one link has to be deleted from the model, then the table that represents the source class of the direct type association of the given link has to be updated. Specifically, the value of the column corresponding to the many-to-one association has to be set to NULL in the row that contains the source node identifier of the link in its column id.

– In case of a deletion of a many-to-many link, the row consisting of the source and the target node identifiers of the link has to be removed from the table that corresponds to the direct type of the given link.

• As the node identifier to be deleted can be found in tables representing the ancestors of the object’s direct type, the deletion should proceed in a bottom-up order (to respect foreign key constraints) by starting at the class, which is the direct type of the object.

During this iteration, additional attention is needed to consistently handle the removal of dangling edges from the database. As a first step, all associations have to be determined whose source or target is the class that is just being traversed by the iteration. Then we should perform the above mentioned edge deletion procedure on all links that have the object to be deleted as their source or target node and that are instances of associations collected in the previous step. The final step of the iteration is the deletion of the object itself from the table that corresponds to the class being traversed. This is performed by deleting the row of this table that contains the identifier of the given object in its column id.
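The bottom-up node deletion with dangling edge removal can be sketched as follows in Python with SQLite. The metamodel fragment, the sample rows and the helper name delete_object are assumptions made for this illustration; the table of metamodel class Table is named DbTable because TABLE is a reserved word in SQLite.

```python
import sqlite3

# Hypothetical metamodel fragment (an assumption): class -> direct superclass
PARENT = {"ModelElement": None, "Class": "ModelElement", "DbTable": "Class",
          "Feature": "ModelElement", "UniqueKey": "ModelElement"}
# many-to-one associations: (table of the source class, column, target class)
MANY_TO_ONE = [("ModelElement", "EO", "ModelElement"),
               ("ModelElement", "Ref", "ModelElement")]
# many-to-many associations: (table, source class, target class)
MANY_TO_MANY = [("UF", "UniqueKey", "Feature")]


def delete_object(conn, obj_id, direct_type):
    with conn:  # the whole deletion runs in one transaction
        cls = direct_type
        while cls is not None:  # bottom-up traversal of the ancestors
            for table, column, target in MANY_TO_ONE:
                if table == cls:   # links leaving the object
                    conn.execute(
                        f"UPDATE {table} SET {column} = NULL WHERE id = ?", (obj_id,))
                if target == cls:  # links pointing to the object
                    conn.execute(
                        f"UPDATE {table} SET {column} = NULL WHERE {column} = ?", (obj_id,))
            for table, source, target in MANY_TO_MANY:
                if source == cls:
                    conn.execute(f"DELETE FROM {table} WHERE src = ?", (obj_id,))
                if target == cls:
                    conn.execute(f"DELETE FROM {table} WHERE trg = ?", (obj_id,))
            conn.execute(f"DELETE FROM {cls} WHERE id = ?", (obj_id,))
            cls = PARENT[cls]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE ModelElement (id TEXT PRIMARY KEY,
                           EO TEXT REFERENCES ModelElement(id),
                           Ref TEXT REFERENCES ModelElement(id));
CREATE TABLE Class (id TEXT PRIMARY KEY REFERENCES ModelElement(id));
CREATE TABLE DbTable (id TEXT PRIMARY KEY REFERENCES Class(id));
CREATE TABLE Feature (id TEXT PRIMARY KEY REFERENCES ModelElement(id));
CREATE TABLE UniqueKey (id TEXT PRIMARY KEY REFERENCES ModelElement(id));
CREATE TABLE UF (src TEXT REFERENCES UniqueKey(id), trg TEXT REFERENCES Feature(id));
INSERT INTO ModelElement (id, EO, Ref) VALUES
  ('s', NULL, NULL), ('t1', 's', NULL), ('c1', NULL, 't1'),
  ('k1', NULL, NULL), ('f1', NULL, NULL);
INSERT INTO Class VALUES ('t1'), ('c1');
INSERT INTO DbTable VALUES ('t1');
INSERT INTO UniqueKey VALUES ('k1');
INSERT INTO Feature VALUES ('f1');
INSERT INTO UF VALUES ('k1', 'f1');
""")

delete_object(conn, "t1", "DbTable")    # also removes the dangling Ref edge of c1
delete_object(conn, "k1", "UniqueKey")  # also removes the UF link (k1, f1)

remaining = [r[0] for r in conn.execute("SELECT id FROM ModelElement ORDER BY id")]
ref_of_c1 = conn.execute("SELECT Ref FROM ModelElement WHERE id = 'c1'").fetchone()[0]
uf_rows = conn.execute("SELECT COUNT(*) FROM UF").fetchone()[0]
print(remaining, ref_of_c1, uf_rows)
```

With foreign key checking enabled, the sketch would fail if the rows were deleted top-down or if the dangling Ref edge were not cleared first, which is the reason for the prescribed order.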

For handling node and edge insertions on the database level in the graph manipulation phase, we can use exactly the same procedures as for the initial table filling phase.

We state that the new content of database tables always corresponds to the derived model, thus it can be proven that our approach performs graph transformation over an underlying relational database.

Example 19 We continue our sample graph transformation rule ClassRule with the model manipulation parts. This rule prescribes the insertion of a new table that contains a single column with a primary key. In addition, one many-to-many link and four many-to-one links have to be added to the model as specified by Fig. 3.1(b).

On the database level, the same effect can be achieved by generating new identifiers t2, p2, and cl2 for the new table, primary key, and column, respectively. For instance, identifier t2 is inserted into all tables that represent the ancestors of Table. Identifiers of other new objects such as p2 and cl2 are handled similarly. In order to respect foreign key constraints, insertions are executed in a top-down order starting at the table corresponding to the most general ancestor. Insertion of the 4 new many-to-one links appears as the 4 update operations presented in the listing below. Finally, the new many-to-many link of type UF is added to the database by executing the corresponding insert operation.

-- Creating table t2
INSERT INTO ModelElement (id) VALUES (t2);
INSERT INTO Namespace (id) VALUES (t2);
INSERT INTO Class (id) VALUES (t2);
INSERT INTO Table (id) VALUES (t2);

-- Creating primary key p2
INSERT INTO ModelElement (id) VALUES (p2);
INSERT INTO UniqueKey (id) VALUES (p2);
INSERT INTO PrimaryKey (id) VALUES (p2);

-- Creating column cl2
INSERT INTO ModelElement (id) VALUES (cl2);
INSERT INTO Feature (id) VALUES (cl2);
INSERT INTO Attribute (id) VALUES (cl2);
INSERT INTO Column (id) VALUES (cl2);

-- Creating 5 links
UPDATE ModelElement SET Ref = t2 WHERE id = c2;
UPDATE ModelElement SET EO = s WHERE id = t2;
UPDATE ModelElement SET EO = t2 WHERE id = p2;
UPDATE Feature SET CF = t2 WHERE id = cl2;
INSERT INTO UF (src,trg) VALUES (p2,cl2);

When the execution of these graph manipulation commands terminates, the new content of database tables corresponds to the derived model of Fig. 5.8(e).

Source: Gergely Varró (MSc in Technical Informatics), PhD Thesis, Advanced Techniques for the Implementation of Model Transformation Sy (pages 89-94). Supervisors: Dr. Katalin Friedl, PhD, associate professor; Dr. Dániel Varró, PhD, assistant professor; Prof. Dr. rer. nat. Andy Schürr, professor. Budapest, April 2008.