Conflict Detection Phase - A Novel Text-Based Model Differencing and Merging Algorithm

2.3 A Novel Text-Based Model Differencing and Merging Algorithm

2.3.3 Conflict Detection Phase

The conflict detection phase operates on the output of the AST matching phase.

Conflicts recognized in this phase are used later, in the merge phase, so the algorithm has to track them. To track a conflict accurately, the algorithm assigns to it the subtrees that are relevant to it. The goal of the conflict detection phase is to find every conflict and to assign an automatic solution to each one whenever possible. Automatic solutions are applied in the merge phase.

Definition 2.3.1. A conflict (t₁, t₂, t_merged) consists of subtrees t₁ ∈ AST₁ and t₂ ∈ AST₂, where AST₁ is the first and AST₂ is the second AST parsed from the two input models, and t_merged is the subtree representing the conflict in the merged AST.

Definition 2.3.2. The absolute position of a subtree is its character-wise position in the merged text. The absolute position consists of the starting position and the ending position. The subtree representing a conflict in the merged AST (t_merged) has its absolute position tracked by the algorithm, and this position is used during the merge phase.

Remark. This definition of a conflict "overrides" the general definition of a conflict in Definition 2.1.6. The reason is that the algorithm assigns conflicts based on subtrees instead of model elements, and, as mentioned before, a model element can be described using multiple subtrees. The definition of t_merged and the concept of absolute position are needed for the merge phase.

Definition 2.3.3. A solution of a conflict is a string that can be used to replace the textual representation of the conflict in the merged text, based on its position.

The active solution of a conflict is the text currently representing t_merged in the merged text. An automatic solution is a solution that is chosen automatically by the algorithm, based on specific considerations.
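The definitions above can be pictured with a small data structure. The following is a minimal sketch only, not the dissertation's implementation: the class name `Conflict`, its fields, and the half-open `[start, end)` convention for the absolute position are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conflict:
    """Hypothetical record for a conflict (t1, t2, t_merged)."""
    t1_text: Optional[str]  # textual representation of t1 (None if absent)
    t2_text: Optional[str]  # textual representation of t2 (None if absent)
    start: int              # starting position of t_merged in the merged text
    end: int                # ending position (exclusive, by assumption)
    automatic_solution: Optional[str] = None

    def active_solution(self, merged_text: str) -> str:
        """The text currently representing t_merged in the merged text."""
        return merged_text[self.start:self.end]

    def apply(self, merged_text: str, solution: str) -> str:
        """Replace the conflict's text with a solution, based on its position."""
        new_text = merged_text[:self.start] + solution + merged_text[self.end:]
        # Keep the tracked position consistent with the new text.
        self.end = self.start + len(solution)
        return new_text


merged = "node A { int x; }"
conflict = Conflict("int x;", "long x;", start=9, end=15)
merged_after = conflict.apply(merged, "long x;")
```

In this sketch, applying a solution only updates the position of the conflict itself. A complete merge phase would presumably also have to shift the positions of conflicts located later in the text, which is left out here.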

In the following, the conflict types recognized by the algorithm are introduced. We discuss their basic principles, demonstrated by examples, along with the automatic solutions assigned by the algorithm (where applicable).

Different Text Conflict (DTC)

A Different Text Conflict (DTC) occurs when two subtrees match but their textual representations differ. A DTC can be either semantic or non-semantic, which is determined by parser operation 4. A DTC is recognized by performing raw text differencing and is always assigned to the innermost subtrees in the conflict. For example, if a field inside a node is changed in a model, a DTC is assigned to the subtrees representing the field in AST₁ and AST₂, instead of the subtrees representing the nodes. This gives a more accurate location of the conflict.

Figure 2.15: Different Text Conflict (DTC) - example.

In the case of semantic conflicts, it is very difficult (if not impossible) to automatically assign a solution, so no automatic solution is assigned. However, if the conflict is non-semantic, the longer text is assigned as the automatic solution. The reasoning is that non-semantic differences usually contain only meta information, so most often there is no downside to keeping them.

An example is illustrated in Figure 2.15. After performing raw text differencing and parser operation 4, the algorithm discovers that there is a non-semantic difference (a comment) between the two versions of node A. Thus, the algorithm can assign the longer text as the automatic solution. There is another conflict regarding the type of field B₁. This is a semantic difference, which means that the solution of this conflict cannot be automatically assigned.
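The DTC rule described above can be sketched as follows. This is an illustration under stated assumptions, not the algorithm's actual code: parser operation 4 is not specified in this section, so the hypothetical `only_comments_changed` predicate (which treats `//` comment lines as non-semantic) merely stands in for it, and the tie-breaking choice for texts of equal length is also an assumption. The raw text differencing uses `difflib.ndiff` from the Python standard library.

```python
import difflib
from typing import Callable, Optional


def text_diff(text1: str, text2: str) -> list[str]:
    """Raw line-based text differencing; keeps only removed/added lines."""
    lines = difflib.ndiff(text1.splitlines(), text2.splitlines())
    return [line for line in lines if line.startswith(("- ", "+ "))]


def only_comments_changed(changed: list[str]) -> bool:
    """Hypothetical stand-in for parser operation 4 (assumes '//' comments)."""
    return all(line[2:].strip().startswith("//") for line in changed)


def dtc_automatic_solution(
    text1: str,
    text2: str,
    is_non_semantic: Callable[[list[str]], bool] = only_comments_changed,
) -> Optional[str]:
    """Return the longer text for a non-semantic DTC, otherwise None.

    None is returned both when the texts do not differ (no DTC) and when
    the difference is semantic (no automatic solution).
    """
    changed = text_diff(text1, text2)
    if not changed or not is_non_semantic(changed):
        return None
    # Assumption: on equal length, the first text is kept.
    return text1 if len(text1) >= len(text2) else text2


version1 = "node A {\n  int b1;\n}"
version2 = "node A {\n  // main node\n  int b1;\n}"
solution = dtc_automatic_solution(version1, version2)  # the commented version
```

A type change such as `int b1;` versus `long b1;` is classified as semantic by the stand-in predicate, so no automatic solution is returned for it.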

New Tree Conflict (NTC)

A New Tree Conflict (NTC) is assigned to each unmatched subtree. An NTC occurs if a subtree is present in one of the ASTs but belongs to no matching pair. Thus, an NTC is recognized for every unmatched subtree found during the AST matching phase. There are two options for where to insert the new subtree in the merged text: i) at the end of the text, or ii) at the end of the subtree after which the unmatched subtree was found. Most of the time, the order of subtrees does not matter. However, in the rare case that it does, option ii) better preserves the order, so it is assigned as the automatic solution.

An example is illustrated in Figure 2.16, where there are 3 unmatched trees: B₂, C and C₁. For field B₂ and node C, an NTC is created. In the case of C₁, however, it is unnecessary to create a new NTC, as C₁ is the child of C, so the NTC associated with C is indirectly associated with C₁ as well.

Figure 2.16: New Tree Conflict (NTC) - example.
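The NTC rule can be sketched on a toy tree. The `Tree` class, the helper names, and the layout of the example (nodes A, B with fields B1 and B2, and C with field C1) are assumptions made for illustration, since the contents of Figure 2.16 are not reproduced here. Falling back to the parent when no earlier sibling qualifies is likewise an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based equality, so trees can be put in sets
class Tree:
    name: str
    children: list["Tree"] = field(default_factory=list)
    parent: Optional["Tree"] = field(default=None, repr=False)

    def add(self, child: "Tree") -> "Tree":
        child.parent = self
        self.children.append(child)
        return child


def has_unmatched_ancestor(t: Tree, unmatched: set[Tree]) -> bool:
    node = t.parent
    while node is not None:
        if node in unmatched:
            return True
        node = node.parent
    return False


def ntc_source(t: Tree, unmatched: set[Tree]) -> Optional[Tree]:
    """First previous sibling that does not belong to an NTC.

    Assumption: if there is none, the parent is used instead.
    """
    if t.parent is None:
        return None
    siblings = t.parent.children
    for prev in reversed(siblings[:siblings.index(t)]):
        if prev not in unmatched:
            return prev
    return t.parent


def detect_ntcs(unmatched_trees: list[Tree]) -> list[tuple[Tree, Optional[Tree]]]:
    unmatched = set(unmatched_trees)
    ntcs = []
    for t in unmatched_trees:
        # A child of an unmatched subtree is covered by its ancestor's NTC.
        if has_unmatched_ancestor(t, unmatched):
            continue
        ntcs.append((t, ntc_source(t, unmatched)))
    return ntcs


root = Tree("root")
a = root.add(Tree("A"))
b = root.add(Tree("B"))
b1, b2 = b.add(Tree("B1")), b.add(Tree("B2"))
c = root.add(Tree("C"))
c1 = c.add(Tree("C1"))
ntcs = detect_ntcs([b2, c, c1])
names = [(t.name, s.name) for t, s in ntcs]
```

With this assumed layout, an NTC is created for B2 (inserted after B1) and for C (inserted after B), while C1 is skipped because it is covered by the NTC of C.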

Move Conflict (MC)

A Move Conflict (MC) is assigned to a matched pair if the relative positions of the subtrees in their respective ASTs differ. The subtrees in the pair can also be on different levels. An MC can be either semantic (e.g., a contained node) or non-semantic (e.g., moving an element). Note, however, that moving an element in the text can be considered a semantic change in some modeling languages.

Figure 2.17 illustrates the recognition of a Move Conflict on an example. The upper part of the figure contains 6 subtrees in the order A−B−C−D−E−F, while the lower part contains the subtrees in the order C−B−A−D−F−E.

The algorithm starts by determining the relative position of every other subtree compared to A. In the first AST, O₁ = {B, C, D, E, F} are located after node A. In the second AST, O₂ = {D, F, E} are located after node A. An MC is created for ∀o ∈ O_u, where O_u = (O₁ ∪ O₂) \ (O₁ ∩ O₂), that is, for every subtree that follows A in exactly one of the two ASTs (here, O_u = {B, C}). Note that duplicated elements are only counted once. Based on this, the algorithm recognizes two MCs: A−B and A−C.

Applying this logic to the other elements, B−C and E−F are also recognized as Move Conflicts. Paired subtrees on different levels can be recognized as subtrees and can be labeled by their level. The algorithm arbitrarily chooses the order in AST₁ as the automatic solution.

Figure 2.17: Move Conflict (MC) recognition - example.
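The example of Figure 2.17 can be replayed with a few lines of code. This is a simplified sketch that assumes flat lists of already matched sibling names instead of real subtrees; the function names are hypothetical. Unordered pairs are stored so that, for instance, A−B and B−A are counted only once.

```python
def later_elements(order: list[str], x: str, common: set[str]) -> set[str]:
    """Matched elements located after x in the given order."""
    return set(order[order.index(x) + 1:]) & common


def move_conflicts(order1: list[str], order2: list[str]) -> set[frozenset[str]]:
    common = set(order1) & set(order2)
    pairs: set[frozenset[str]] = set()
    for x in order1:
        if x not in common:
            continue
        o1 = later_elements(order1, x, common)
        o2 = later_elements(order2, x, common)
        # Symmetric difference: elements after x in exactly one of the orders.
        for o in o1 ^ o2:
            pairs.add(frozenset((x, o)))
    return pairs


conflicts = move_conflicts(list("ABCDEF"), list("CBADFE"))
labels = sorted("".join(sorted(pair)) for pair in conflicts)
```

For the two orders of the example, `labels` contains the pairs AB, AC, BC and EF, which matches the four Move Conflicts discussed above.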

Conflict Detection

Algorithm 3 presents the conflict detection phase. For ∀t ∈ Unmatched, the algorithm recognizes a New Tree Conflict (NTC). While doing so, it also determines the source of the subtree in the merged AST (NTC_SOURCE), namely, the first previous tree that does not belong to an NTC. Note that assigning the automatic solution is omitted from Algorithm 3, as it is a trivial task both here and for the other conflict types. This also means that parser operation 4 does not appear explicitly. For ∀(t₁, t₂) ∈ Matched, the algorithm first performs raw text differencing (TEXT_DIFF) on t₁ and t₂. If there are any differences, a DTC is recognized, and the correct inner trees are assigned by performing a depth-first search (DFS) on the subtrees (INNER_TREES). For (t₁, t₂) pairs that are on different levels, an MC is recognized, which is often a semantic difference. Otherwise, order differences between subtrees are collected by the DIFF_ORDER operation.

Algorithm 3: Conflict detection - main algorithm
Input: AST₁, AST₂, Matched, Unmatched
Output: NTC ∪ DTC ∪ MC

for ∀t ∈ Unmatched do
    AST_SOURCE ← NTC_SOURCE(t);
    NTC.ADD(t, AST_SOURCE);
for ∀(t₁, t₂) ∈ Matched do
    TDiff ← TEXT_DIFF(t₁, t₂);
    if TDiff ≠ ∅ then
        (I₁, I₂) ← INNER_TREES(AST₁, AST₂, t₁, t₂);
        DTC.ADD(I₁, I₂, TDiff);
    if t₁.LEVEL ≠ t₂.LEVEL then
        MC.ADD(t₁, t₂);
    else
        ORDER_DIFFS ← DIFF_ORDER(t₁, t₂, Matched);
        for ∀o ∈ ORDER_DIFFS do
            MC.ADD(t₁, o);

Algorithm 4: The DIFF_ORDER operation
Input: t₁, t₂, MP
Output: O_u

SIBLINGS₁ ← CHILDREN(PARENT(t₁)) \ {t₁};
O₁ ← ∅;
for ∀t ∈ SIBLINGS₁ do
    if IS_MATCHED(t, MP) ∧ t.INDEX > t₁.INDEX then
        O₁.ADD(t);
SIBLINGS₂ ← CHILDREN(PARENT(t₂)) \ {t₂};
O₂ ← ∅;
for ∀t ∈ SIBLINGS₂ do
    if IS_MATCHED(t, MP) ∧ t.INDEX > t₂.INDEX then
        O₂.ADD(t);
O_u ← (O₁ ∪ O₂) \ (O₁ ∩ O₂);

The DIFF_ORDER operation (Algorithm 4) collects pairs of nodes whose relative order is different in the two ASTs. The SIBLINGS of both t₁ and t₂ in their respective ASTs are collected. Then, the algorithm iterates over both sets (SIBLINGS₁ and SIBLINGS₂) and collects elements that have a matching pair (IS_MATCHED; note that this is different from the IS_MATCH operation from earlier) and are positioned later in the text than t₁ or t₂, respectively. Finally, the set operation is performed, according to the discussion of the example in Figure 2.17. It is worth mentioning that when applying the algorithm in practice (for example, as part of a version control system), it is beneficial to filter out redundant conflicts for easier practical use. This step, however, is omitted from the description of the algorithm in order to keep it focused and concise.
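The Move Conflict branch of the main loop and the DIFF_ORDER operation can be sketched on small trees as follows. This is an illustrative approximation, not the dissertation's implementation: the `Tree` class and the function names are hypothetical, and the text differencing and NTC parts are left out. One detail is an explicit assumption of the sketch: since O₁ holds subtrees of the first AST and O₂ subtrees of the second, the siblings collected in the second AST are mapped back to their matched partners in the first AST before the set operation is performed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based equality and hashing
class Tree:
    name: str
    level: int = 0
    parent: Optional["Tree"] = field(default=None, repr=False)
    children: list["Tree"] = field(default_factory=list)

    def add(self, child: "Tree") -> "Tree":
        child.parent, child.level = self, self.level + 1
        self.children.append(child)
        return child

    @property
    def index(self) -> int:
        return self.parent.children.index(self) if self.parent else 0


def diff_order(t1: Tree, t2: Tree, matched: list[tuple[Tree, Tree]]) -> set[Tree]:
    """Matched siblings that follow the pair in exactly one of the two ASTs."""
    to_first = {second: first for first, second in matched}
    in_first = {first for first, _ in matched}
    o1 = {s for s in t1.parent.children
          if s in in_first and s.index > t1.index}
    # Assumption: express O2 through the matched partners in the first AST.
    o2 = {to_first[s] for s in t2.parent.children
          if s in to_first and s.index > t2.index}
    return o1 ^ o2


def detect_move_conflicts(matched: list[tuple[Tree, Tree]]) -> list[tuple[Tree, Tree]]:
    mc = []
    for t1, t2 in matched:
        if t1.level != t2.level:
            mc.append((t1, t2))
        elif t1.parent is not None and t2.parent is not None:
            mc.extend((t1, o) for o in diff_order(t1, t2, matched))
    return mc


def build(order: str) -> tuple[Tree, dict[str, Tree]]:
    root = Tree("root")
    return root, {name: root.add(Tree(name)) for name in order}


root1, first = build("ABC")
root2, second = build("CBA")
matched = [(root1, root2)] + [(first[n], second[n]) for n in "ABC"]
mc = detect_move_conflicts(matched)
unique = {frozenset((x.name, y.name)) for x, y in mc}
```

The sketch reports each reordered pair from both sides (for example, A−B and B−A), which illustrates why filtering redundant conflicts is useful in practice; `unique` collapses them into unordered pairs.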

In document: Ph.D. Dissertation, Ferenc Attila Somogyi. Advisor: Gergely Mezei, Ph.D., Associate Professor. Investigating Text-Based Domain-Specific Modeling Techniques (Szövegalapú szakterületi modellezési módszerek vizsgálata), pages 38-42