Eliminating clustering in the propagation graph: the Disjoint Rings

Selecting the semantic connections that are likely to answer the queries with high probability is not enough in itself to increase the answer ratio. The protocol should also contain parts that ensure the construction of an appropriate network topology. With random connections the clustering coefficient is negligible. However, when the network is transformed according to the fields of interest, this coefficient increases considerably [Barabási, 2002]. For the SemPeer protocol we have three expectations of the topology:

1. The advantages offered by the semantic connections should be utilized.

2. Clustering in the query propagation path should be eliminated.

3. No nodes with special roles should be needed to construct the topology. Also, the TopologyCheck algorithm should not use considerable resources.

The first condition is straightforward. The second objective specifies that a query should not arrive at a node more than once along different paths because of high clustering. The third consideration follows from the fact that the protocol is intended to be used in a rather transient environment.

We found that clustering in the propagation graph can be decreased by defining smaller rings (loops) in the topology. As the nodes with similar profiles connect to each other, more "disjoint" rings exist in the network. Therefore, we call the new topology "Disjoint Rings". To constrain the network to take shape according to this topology, we use the following algorithm as the topology check function in Algorithm 6.1.

Algorithm 6.2

procedure TOPOLOGY_CHECK(IP PeerIP)
    if PeerIP[4] % MinLoopSize + 1 = self.ip[4] % MinLoopSize then return true;
    return false;

Chapter 6. The SemPeer Protocol Extension

The explanation of this condition check is as follows. We extract the last byte from the IP address of each node. This value is available for every node and can be regarded as a random value between 0 and 255. MinLoopSize defines the minimum number of nodes that participate in a cycle in the graph. With modulo division, we can assign a number between 0 and MinLoopSize − 1 (called the loop order) to each node. Based on this loop order we can guarantee the minimum loop size without any need for super-peers. This algorithm straightforwardly satisfies our expectations 1 and 3. Regarding the clustering, we prove a proposition that shows how the topology eliminates most kinds of counterproductive links when some conditions are satisfied.
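The check described above can be sketched in Python as follows. This is a minimal illustration rather than the SemPeer implementation: the names `loop_order` and `topology_check` and the `wrap` flag are assumptions introduced here. The pseudocode does not state whether the "+ 1" wraps around at MinLoopSize, so `wrap=False` follows it literally, while `wrap=True` lets the highest loop order connect back to order 0.

```python
import ipaddress


def loop_order(ip: str, min_loop_size: int) -> int:
    """Loop order of a node: last byte of its IPv4 address modulo min_loop_size."""
    last_byte = ipaddress.IPv4Address(ip).packed[3]  # integer in 0..255
    return last_byte % min_loop_size


def topology_check(peer_ip: str, self_ip: str, min_loop_size: int,
                   wrap: bool = False) -> bool:
    """Return True if the peer's loop order is the direct predecessor of ours.

    wrap=False mirrors the pseudocode literally.
    wrap=True is an assumed variant that closes the ring of classes.
    """
    successor = loop_order(peer_ip, min_loop_size) + 1
    if wrap:
        successor %= min_loop_size
    return successor == loop_order(self_ip, min_loop_size)


if __name__ == "__main__":
    # Hypothetical addresses, MinLoopSize = 4.
    print(topology_check("10.0.0.4", "10.0.0.5", 4))             # orders 0 and 1
    print(topology_check("10.0.0.7", "10.0.0.8", 4))             # orders 3 and 0, literal
    print(topology_check("10.0.0.7", "10.0.0.8", 4, wrap=True))  # orders 3 and 0, wrapped
```

The check needs only the two addresses and one modulo operation per node, which is in line with the requirement that it should not use considerable resources.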

Proposition 6.2. If TTL < MinLoopSize < 256, the use of Algorithm 6.2 ensures that, in the query propagation graph,

i. the number of backward links is zero,

ii. the number of sibling links is zero,

iii. and the maximum number of skew links equals $k \cdot (k^{TTL-1} - 1)$.

Proof. The modulo division in Algorithm 6.2 classifies the nodes into MinLoopSize classes, and the destination of a connection from one class is constrained to be in the consecutive class. Therefore, a series of connections starting from a specific class will reach the same class again only after at least MinLoopSize hops. If MinLoopSize is greater than the TTL parameter, a message is deleted before it can reach a node on the same level of the query propagation graph. It follows that no sibling or backward links can exist in the query propagation graph.
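The argument can be checked empirically with a small simulation. The sketch below is only an illustration under stated assumptions: the graph generator, the function names, and the parameter values are hypothetical, the last address byte is replaced by a random value in 0..255, and connections are assumed to wrap around from the highest class to class 0. It propagates a query breadth-first up to TTL hops and classifies every link that is used.

```python
import random
from collections import deque


def build_ring_constrained_graph(n, min_loop_size, out_degree, seed=0):
    """Random directed graph whose links only lead to the consecutive class."""
    rng = random.Random(seed)
    order = [rng.randrange(256) % min_loop_size for _ in range(n)]
    by_class = {c: [] for c in range(min_loop_size)}
    for node, c in enumerate(order):
        by_class[c].append(node)
    edges = {}
    for node in range(n):
        candidates = by_class[(order[node] + 1) % min_loop_size]
        edges[node] = rng.sample(candidates, min(out_degree, len(candidates)))
    return edges


def classify_links(edges, start, ttl):
    """Propagate a query from start for ttl hops and count the link types."""
    level = {start: 0}
    queue = deque([start])
    counts = {"tree": 0, "skew": 0, "sibling": 0, "backward": 0}
    while queue:
        u = queue.popleft()
        if level[u] == ttl:          # TTL exhausted, the query is not forwarded
            continue
        for v in edges[u]:
            if v not in level:       # first delivery to v
                level[v] = level[u] + 1
                queue.append(v)
                counts["tree"] += 1
            elif level[v] == level[u] + 1:
                counts["skew"] += 1  # duplicate delivery to the next level
            elif level[v] == level[u]:
                counts["sibling"] += 1
            else:
                counts["backward"] += 1
    return counts


if __name__ == "__main__":
    graph = build_ring_constrained_graph(n=300, min_loop_size=5, out_degree=3)
    print(classify_links(graph, start=0, ttl=4))  # TTL < MinLoopSize
```

With TTL smaller than MinLoopSize the sibling and backward counters stay at zero for any seed, while skew links can still occur.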

The maximum number of skew links can be calculated as follows. Because of the classification, skew links can only point from two (or more) nodes on one level of the query propagation graph to the same node on the next level. Regarding the last two levels, the worst case is when all the $k^{TTL-1}$ nodes on the last-but-one level connect to the same $k$ nodes, which means $k \cdot (k^{TTL-1} - 1)$ skew links. Changing a connection on an earlier level to a skew link increases this number by one; however, the number of skew links between the last two levels then decreases by at least $k$, so the total cannot exceed this bound.
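The worst-case count can be written out step by step. The following is a worked sketch under the assumption that the initiator is on level 0 and that every node forwards the query over exactly $k$ connections, so that $k^{TTL}$ links enter the last level; the numeric values are chosen only for illustration.

```latex
\begin{align*}
\text{links entering the last level} &= k^{TTL-1} \cdot k = k^{TTL} \\
\text{first deliveries (one per target node)} &= k \\
\text{skew links} &= k^{TTL} - k = k \cdot \left(k^{TTL-1} - 1\right) \\
\text{e.g. } k = 3,\ TTL = 3: \quad 3^{3} - 3 &= 3 \cdot (3^{2} - 1) = 24
\end{align*}
```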

From this proposition it follows that Algorithm 6.2 eliminates the bulk of the counterproductive links (backward and sibling links) with minimal computing resources. Selecting TTL + 1 for the MinLoopSize parameter is a good choice.

The maximum value of skew links calculated in (iii) can be achieved only under very special circumstances. Moreover, the clustering caused by skew links is due to the loops that are smaller than MinLoopSize, which, in turn, can be recognized easily as duplicate queries at runtime and can be eliminated by dropping the appropriate incoming connection. This can be stated as follows.

Proposition 6.3. If the nodes use Algorithm 6.2 and they recognize and drop incoming connections that send duplicate queries, then the clustering in the propagation path will be eliminated.

Proof. Suppose that there is a cycle in the query propagation graph, and $c^k$ is the node of the cycle on the highest, $k$-th level. Proposition 6.2 states that the number of backward and sibling links is zero; therefore only one node of the cycle can be placed on the $k$-th level, and exactly two nodes on each of the preceding levels ($c^{k-i}$, $1 < i < k-1$), and $c^1$ is the query initiator. In this case $c^k$ receives the issued query twice, through the two nodes on level $k-1$. Therefore, $c^k$ recognizes this as a duplicate query and, according to the assumption in the proposition, drops one of the two incoming connections, eliminating the cycle. It follows from Definition 4.3.c that in this case $C_{mod} = 0$.
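The duplicate handling assumed in Proposition 6.3 can be sketched as follows. This is a hedged illustration, not the protocol's actual message handling: the `Node` class, the `receive` method, and the use of a query identifier to recognize duplicates are assumptions introduced for this example.

```python
class Node:
    """Minimal node that drops an incoming connection delivering a duplicate query."""

    def __init__(self, name):
        self.name = name
        self.incoming = set()   # names of peers connected to this node
        self.seen = {}          # query id -> peer that delivered it first

    def receive(self, query_id, sender):
        """Return True if the query is new and should be processed."""
        if sender not in self.incoming:
            return False                    # not a current connection
        if query_id in self.seen:
            self.incoming.discard(sender)   # duplicate: cut this connection
            return False
        self.seen[query_id] = sender
        return True


if __name__ == "__main__":
    # c_k is reached through two nodes of the previous level (a skew link).
    c_k = Node("c_k")
    c_k.incoming = {"a", "b"}
    print(c_k.receive("q1", "a"))   # first delivery is processed
    print(c_k.receive("q1", "b"))   # second delivery is a duplicate
    print(sorted(c_k.incoming))     # only the first connection remains
```

After the second delivery only one path to the node remains, so the cycle through the two senders no longer exists.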

It is important to see that recognizing and refusing incoming connections that deliver duplicate messages is, in itself and without the topology check, a bad solution, as it will continuously cut off large parts of almost independent query propagation paths, preventing the transformation of the network. With the elimination of the backward and sibling links, the probability of a duplicate query (that is, a skew link) decreases dramatically, as the simulation results in the next section illustrate. Figure 6.1 sketches a part of a network with Disjoint Rings topology.

The next corollary follows directly from the proposition above:

Corollary 6.4. The Clustering Coefficient of Watts and Strogatz equals zero in a network with Disjoint Rings topology.

Figure 6.1. Part of a network with Disjoint Rings topology

Proof. Recall the formula of Watts and Strogatz (Definition 4.8):

$$C_i = \frac{|\{e_{jh}\}|}{2k_i(k_i-1)} : v_j, v_h \in N_i,\ e_{jh} \in E \qquad (6.3)$$

Because the Disjoint Rings topology eliminates the sibling links, the value of $|\{e_{jh}\}|$ in this formula equals zero.
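The numerator of (6.3) can be computed directly for a small example. The sketch below is an illustration under assumptions: the helper names are hypothetical, the neighbourhood $N_i$ is taken to contain both the incoming and the outgoing neighbours of a node, the loop order is simulated by the node index modulo MinLoopSize, and the classes are assumed to wrap around. A graph containing a triangle is included for contrast.

```python
def neighbour_edge_count(edges, node):
    """Count directed edges e_jh whose endpoints both lie in the neighbourhood of node."""
    neighbours = set(edges.get(node, ()))
    neighbours |= {u for u, targets in edges.items() if node in targets}
    neighbours.discard(node)
    return sum(1 for j in neighbours for h in edges.get(j, ()) if h in neighbours)


def ring_class_graph(n, min_loop_size):
    """Every node links to all nodes of the consecutive class (index modulo size)."""
    return {
        i: [j for j in range(n) if j % min_loop_size == (i + 1) % min_loop_size]
        for i in range(n)
    }


if __name__ == "__main__":
    rings = ring_class_graph(n=20, min_loop_size=5)
    print(max(neighbour_edge_count(rings, i) for i in rings))  # no edge among neighbours

    triangle = {0: [1, 2], 1: [2], 2: []}
    print(neighbour_edge_count(triangle, 0))                   # the edge 1 -> 2 is counted
```

Since the numerator is zero for every node of the class-constrained graph, the coefficient is zero regardless of how the denominator is normalized.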

In document: Budapest University of Technology and Economics, Department of Automation and Applied Informatics, SEMANTIC INFORMATION RETRIEVAL IN MOBILE PEER-TO-PEER NETWORKS (SZEMANTIKUS INFORMÁCIÓ-VISSZAKERESÉS MOBIL PEER-TO-PEER HÁLÓZATOKBAN), pages 89-92