
2.4 Analysis of the Algorithm

2.4.2 Generality and Configuration Cost

In this section, we analyze the configuration cost of the algorithm, that is, the cost incurred because the parser must provide the parser operations. The analysis is informal and rather philosophical, as these concepts are very difficult to formalize. Essentially, this section highlights the main difficulties that can arise when adapting a modeling language for use with the algorithm.

Proposition 2.4.1. The cost of providing the parser operations in order to apply the algorithm to an arbitrary modeling language is minimal.

Remark. This cost refers to the theoretical and practical impact of providing the required operations for an arbitrary parser. Theoretical impact refers to whether a parser can be expected to provide an operation in principle, while practical impact refers to the effort of actually providing it. In the following, we examine both kinds of impact for each parser operation in order to argue that the cost of adapting an arbitrary modeling language to the algorithm is fairly minimal.

AST Building

The algorithm uses parser operation 1 to build the ASTs from the raw text. This operation is depicted as the Parse process in Figures 2.1 and 2.13.

Theoretical impact. Providing this operation has no theoretical impact, since the primary function of a parser is to build the AST from the text [ASU86]. Therefore, every parser should automatically provide this operation.

Practical impact. In practice, it is recommended to wrap the built AST in a specific format, since this makes the algorithm independent of the different AST formats. This wrapping has to be done by the parser; thus, it has some practical impact, but it should be minimal.
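The wrapping step can be pictured with a small sketch. The `WrappedNode` type, its fields, and the nested-dictionary input are all hypothetical; the source does not prescribe a concrete wrapper format, so this only illustrates the idea of converting a parser-specific tree into one uniform structure that the algorithm can traverse.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WrappedNode:
    """Parser-independent view of one AST subtree (hypothetical format)."""
    kind: str   # e.g. "node" or "attribute"
    text: str   # raw text covered by the subtree
    children: list["WrappedNode"] = field(default_factory=list)
    # The parent link is excluded from repr/eq to avoid cyclic recursion.
    parent: Optional["WrappedNode"] = field(default=None, repr=False, compare=False)

    def add(self, child: "WrappedNode") -> "WrappedNode":
        child.parent = self
        self.children.append(child)
        return child


def wrap(native: dict) -> WrappedNode:
    """Convert a parser-specific tree (assumed here to be nested dicts)."""
    node = WrappedNode(kind=native["kind"], text=native.get("text", ""))
    for child in native.get("children", []):
        node.add(wrap(child))
    return node


tree = wrap({
    "kind": "node",
    "text": "Book",
    "children": [{"kind": "attribute", "text": "Title"}],
})
```

Only the `wrap` function depends on the native parser output, which is why the practical impact of this requirement is expected to stay small.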

CHECK Operation

The CHECK operation (parser operation 2) is responsible for performing a correctness check on an AST. This ensures that the model (AST) is well-formed. This is an important requirement, as we can only ensure that the algorithm works correctly if the input models are well-formed. This operation is depicted as the Correctness check process in Figure 2.13.

Theoretical impact. Since the goal of the textual language is to describe the model in a textual form, it is reasonable to assume that the parser supports checking the correctness of the AST. One way to achieve this is to build the model from the AST, and then check the model itself. Therefore, most parsers should provide this operation by default.

Practical impact. If the parser supports this operation, then there should be no practical impact as the correctness check is already built into the parsing process.

IS MATCH Operation

The IS MATCH operation (parser operation 3) decides whether two subtrees form a matching pair. This operation is the backbone of the algorithm; thus, it is crucial that it works correctly and that its complexity is as low as possible.

Theoretical impact. The parser is unlikely to support this operation by default, as it is not required during the parsing process.

Practical impact. The practical impact of providing this operation is heavily language-dependent. For textual languages in which unique identifiers are present (in other words, where Static Identity-Based Matching can also be used), the practical impact of this requirement is minimal. However, when there are no unique identifiers, it can be very difficult to provide this operation, as complex algorithms might be needed, which also affects the performance of the operation. Since many modeling environments use unique identifiers for their model elements, we estimate that providing this operation usually has a low practical impact. In other cases, however, the cost can be higher.

In the following, we demonstrate that the impact is minimal for languages in which model elements have a (context-sensitive) unique identifier. Figure 2.22 illustrates an example based on VMDL, the textual language used to describe the textual representations of models in the Visual Modeling and Transformation System (VMTS) [AAL+09]. The bracketed parts represent the subtrees to which the enclosed text belongs. Some subtrees are omitted for reasons of clarity. In VMTS, the nodes of the model each have a unique identifier (GUID) that we can use during the matching process. Thus, the textual representation defined by VMDL also contains these identifiers, which we can use in the IS MATCH operation. The subtrees of the nodes (e.g., attributes) also have a unique identifier: their name. However, the name is only unique in the context of the node; for example, two nodes can have attributes with the same name. Therefore, we have to identify the context of the subtree as well. This can be done by navigating the subtrees to find the parent trees and determine the context.

Figure 2.22: An MDM example based on VMTS models.

Figure 2.23 depicts the XMI description of two EMF meta-models. We omitted some details to improve clarity. The subtrees are represented by the XMI tags. The unique identifiers are the names of the model elements. The subtrees of the nodes also have a unique name. However, this name is only unique in the context of the node, similarly to VMTS. Therefore, context is important and has to be taken into account by the IS MATCH operation.
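Matching by context-sensitive identifiers can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used with VMTS or EMF: the `Subtree` type, its fields, and the identifier values (`guid-1`, `guid-2`, `Name`) are hypothetical. Two subtrees are treated as a matching pair only if their identifiers agree along the whole chain of parent subtrees.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subtree:
    """Hypothetical subtree: a kind, a local identifier, and a parent link."""
    kind: str
    ident: Optional[str]
    parent: Optional["Subtree"] = None


def context_path(subtree: Subtree) -> tuple:
    """Collect (kind, identifier) pairs from the root down to the subtree."""
    path = []
    current = subtree
    while current is not None:
        path.append((current.kind, current.ident))
        current = current.parent
    return tuple(reversed(path))


def is_match(left: Subtree, right: Subtree) -> bool:
    """Match only if both subtrees are identified and share the same context."""
    if left.ident is None or right.ident is None:
        return False
    return context_path(left) == context_path(right)


# Left model: two nodes that both own an attribute called "Name".
book_l = Subtree("node", "guid-1")
author_l = Subtree("node", "guid-2")
book_name_l = Subtree("attribute", "Name", parent=book_l)
author_name_l = Subtree("attribute", "Name", parent=author_l)

# Right model: the same Book node, parsed from the other text.
book_r = Subtree("node", "guid-1")
book_name_r = Subtree("attribute", "Name", parent=book_r)
```

Here `is_match(book_name_l, book_name_r)` holds, while `is_match(author_name_l, book_name_r)` does not, although both attributes are named `Name`. The cost of a single comparison is linear in the depth of the subtree, which keeps the complexity of the operation low.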

Semantic Difference Check

As mentioned before, parser operation 4 does not explicitly appear in Algorithm 3, but it is still required by the algorithm. This operation is responsible for determining whether the difference between two subtrees is semantic in nature. Determining whether a difference is semantic or not is not as critical as the IS MATCH operation, but it helps in assigning correct automatic solutions to DTCs.

Theoretical impact. This operation is not required for the parsing process, thus, the parser is unlikely to support it by default.

Practical impact. Again, the practical impact depends on the textual language. In the case of typical non-semantic differences like comments and formatting, regular expressions can be used for the recognition. Therefore, if non-semantic differences consist only of such simple cases, the practical impact should be minimal. Otherwise, it can be considerably higher.

Figure 2.23: EMF-based example of context-sensitive unique identifiers.

In the VMDL example depicted in Figure 2.22, we have marked all the differences between the two example models. The semantic differences are: 1) the cardinalities of the attributes Title and AuthorName are changed, and 2) the name of BookAuthorRelationShipInstance is changed to BookAuthorEdge. The non-semantic differences are: 1) the formatting of the text (between the attributes in the edge), and 2) a comment contained in the left text. These can be discovered by using simple regular expressions. The non-semantic differences in the XMI example in Figure 2.23 can also be discovered using regular expressions.
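A regular-expression-based check of this kind could look like the sketch below. It rests on assumptions that are not taken from the source: comments are assumed to use `//` and `/* */` syntax, formatting differences are assumed to be limited to the amount of whitespace, and the sample strings are made up rather than actual VMDL. The function names are hypothetical as well.

```python
import re

# Assumed comment syntax: "//" line comments and "/* ... */" block comments.
_COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
_WHITESPACE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Strip comments and collapse every whitespace run to a single space."""
    without_comments = _COMMENT.sub(" ", text)
    return _WHITESPACE.sub(" ", without_comments).strip()


def is_semantic_difference(left: str, right: str) -> bool:
    """Treat a difference as semantic if it survives normalization."""
    return normalize(left) != normalize(right)


# A changed cardinality survives normalization, so it counts as semantic.
cardinality_changed = is_semantic_difference("Title [1]", "Title [0..1]")

# A comment and extra spacing disappear, so the difference is non-semantic.
only_formatting = is_semantic_difference("Title [1] // note\n", "Title   [1]")
```

The sketch does not handle comment markers inside string literals or whitespace that is significant to the language; such cases are exactly where the practical impact of this operation grows.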

Summary

To summarize the analysis, providing the following operations is likely to have a considerable theoretical impact on the parser:

• IS MATCH Operation. The most critical requirement. This operation must be correct and have as low complexity as possible.

• Semantic Difference Check. It is less critical to the algorithm, but its related cost is language-dependent.

The following requirements are likely to have a considerable practical impact:

• AST Building. The AST wrapping can have a minimal practical impact.

• IS MATCH Operation. In the case of a language where unique identifiers are available for the model elements, this has a low practical impact. Otherwise, it can be higher.

• Semantic Difference Check. Similarly to the IS MATCH operation, the cost is low in the case of languages with unique identifiers.

From these results, we can conclude that while the configuration cost of providing the parser operations can be quite considerable in certain cases, it should be acceptable in most scenarios. Again, it is worth noting that generality is difficult to measure formally; thus, we acknowledge that these conclusions are the results of a philosophical debate. Nevertheless, we believe the topic is worth discussing since, according to the results of the SLR we have conducted, generality usually comes with a cost. Moreover, text-based MDM algorithms that focus on generality are rare.