6.3 Simulation results

6.3.1 Examining the Performance of the Extension

Running measurements with a range of typical parameter settings showed us in which cases applying the protocol improves performance.

We expect the highest performance increase when the clients have a single, very concentrated field of interest and can store only a very limited set of files. This is the case for smartphones with built-in memory or a small-capacity memory card, which can barely store a full music album. We restrict the searches and stored documents of each node to a single music genre or a single topic, as in Bibster.

We allowed each node to store up to 5 files belonging to its field of interest. The topics and the documents were both set to follow a Zipf distribution.
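The setup above can be illustrated with a minimal sketch, which is not the simulator used for these measurements. It assumes a Zipf exponent of 1 and 100 documents per topic (neither value is given in the text), and the names `zipf_weights` and `build_nodes` are hypothetical.

```python
import random
from collections import Counter


def zipf_weights(n_items, exponent=1.0):
    """Normalized Zipf weights: the item of rank r gets weight ~ 1 / r**exponent."""
    raw = [1.0 / rank ** exponent for rank in range(1, n_items + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def build_nodes(n_nodes, n_topics=10, docs_per_topic=100, max_files=5,
                exponent=1.0, seed=0):
    """Give each node one Zipf-distributed topic and up to max_files files in it.

    docs_per_topic and exponent are assumptions made for this illustration.
    """
    rng = random.Random(seed)
    topic_ids = list(range(n_topics))
    doc_ids = list(range(docs_per_topic))
    topic_w = zipf_weights(n_topics, exponent)
    doc_w = zipf_weights(docs_per_topic, exponent)
    nodes = []
    for _ in range(n_nodes):
        topic = rng.choices(topic_ids, weights=topic_w, k=1)[0]
        # Drawing with replacement and de-duplicating yields 1..max_files files.
        docs = set(rng.choices(doc_ids, weights=doc_w, k=max_files))
        nodes.append({"topic": topic, "files": {(topic, d) for d in docs}})
    return nodes


nodes = build_nodes(n_nodes=1000)
topic_counts = Counter(node["topic"] for node in nodes)
print(topic_counts.most_common(3))  # the most popular topics and their node counts
```

With such a distribution, a few topics are held by many nodes while the tail topics are stored by only a handful, which is the effect on the average hit rate discussed later in this section.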

Chapter 6. The SemPeer Protocol Extension

The simulation parameters reflect the data collected by our crawling Gnutella client. The simulated network contained 16,000 nodes with 4-7 connections each, and we distinguished 10 major music genres. We obtained surprisingly good results when a node sent out queries for music in its field of interest: after receiving replies for an average of 5-15 queries, the protocol could construct a layer over the standard random network that achieved a sevenfold hit rate compared to the basic protocol.

The query hit probability increased from 0.11 to as high as 0.69 (Figure 6.2).

Figure 6.2. Increase in hit rate in a very specialized network.

Of course, if the topics of the queries or the stored documents are less homogeneous, the increase in the average hit rate is smaller; however, a performance gain can still be observed. The next series of simulations shows the average performance of the protocol in a Gnutella network whose parameters match those set by most users of our mobile client. There are 5 connections per node on average, and each device stores around 30-50 files. We consider only results within a distance of at most 4 hops (TTL=4). The taxonomy used is WordNet, and each document is annotated with 4-5 keywords.

We set up two different sets of simulations. In the first, the distribution of the users' fields of interest, as well as that of the documents and the queries, strictly follows the Zipf distribution. This means that there are some topics whose documents are stored by only very few nodes, which decreases the average hit rate of the overall network. However, our crawler client showed that there is no exponential decrease in the case of the less popular topics; therefore, the performance of the protocol in the general Gnutella network is better than the simulations with the Zipf distribution indicate. In the second set of simulations, we used a uniform distribution for the popularity of topics and documents. These results predict the hit rate that an average user without very specialized interests can expect.

In the standard Gnutella network, the average hit rate is 50.4% in the case of the Zipf distribution and 54.5% in the case of the uniform distribution. The results are shown in Figure 6.3. For the sake of clarity, we labeled the measurements in the form "SxOffy", where x is the number of standard (random) Gnutella connections maintained by a node and y is the percentage of off-topic queries, that is, queries aimed at files that do not match any of the fields of interest of the issuing node.
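As a small illustration of the "SxOffy" labeling scheme, the following sketch decodes a label into its two parameters. The helper `parse_label` and the example label "S2Off20" are hypothetical and only follow the naming rule stated above; they are not taken from the figures.

```python
import re
from typing import NamedTuple


class RunLabel(NamedTuple):
    random_links: int   # x: standard (random) Gnutella connections per node
    off_topic_pct: int  # y: percentage of off-topic queries


_LABEL_PATTERN = re.compile(r"S(\d+)Off(\d+)")


def parse_label(label: str) -> RunLabel:
    """Decode an 'SxOffy' measurement label into (x, y)."""
    match = _LABEL_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not an SxOffy label: {label!r}")
    return RunLabel(int(match.group(1)), int(match.group(2)))


print(parse_label("S2Off20"))  # RunLabel(random_links=2, off_topic_pct=20)
```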

In Figure 6.3, we compare the performance of the semantic layer (without any random links) for the Zipf and uniform distributions. When no off-topic queries are sent in the network, the overlay can deliver an almost 100% hit rate in the case of the uniform distribution. As the ratio of off-topic queries increases, the performance of the overlay decreases. In terms of the average performance of the network, the Zipf distribution yields a lower performance gain from the semantic layer. The hit rate in topics of different popularity varies from 70% to almost 100%.

To illustrate the role of the standard (off-topic) connections, we show an interesting result. It is known that the role of "weak ties", that is, the random connections in social networks, is to locate resources (documents, people) that belong to groups (topics) other than those a node is interested in. If we eliminate almost all of the random links in the SemPeer network, the chances of locating off-topic files in the overlay network become smaller than in a standard Gnutella network. Figure 6.4 shows a set of nodes that adapted their connections to a specific topic in the previous simulation environment, with 0% off-topic queries. The ratio of semantic connections was 90%. After ten thousand simulation steps, we changed their behavior so that they started issuing queries for random documents. We compared their performance to that of standard Gnutella nodes with random links and found that the absence of random links resulted in worse performance. These results can also be validated with the analytical model.
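The effect of missing random links can be reproduced on a toy example. The sketch below is an illustration only, not the simulation environment used here: it floods a query with a hop limit (TTL=4, as in the measurements) over two fully connected topic clusters, once without and once with a single random link between them. The function names, the eight-node graph, and the topic names are assumptions made for this example.

```python
from collections import deque


def flood_hit(graph, topics, start, wanted_topic, ttl=4):
    """Return True if some other node within ttl hops of start stores wanted_topic."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if node != start and topics[node] == wanted_topic:
            return True
        if hops == ttl:
            continue  # the query is not forwarded beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return False


def clique(members):
    """Fully connect the given nodes (a purely semantic cluster)."""
    return {n: set(members) - {n} for n in members}


# Toy network: nodes 0-3 store "rock", nodes 4-7 store "jazz" (hypothetical topics).
topics = {n: "rock" if n < 4 else "jazz" for n in range(8)}
semantic_only = {**clique([0, 1, 2, 3]), **clique([4, 5, 6, 7])}

# Same overlay plus one random ("weak tie") link between the clusters.
with_random = {n: set(neighbors) for n, neighbors in semantic_only.items()}
with_random[3].add(4)
with_random[4].add(3)

print(flood_hit(semantic_only, topics, 0, "rock"))  # True: on-topic query succeeds
print(flood_hit(semantic_only, topics, 0, "jazz"))  # False: no path to the other topic
print(flood_hit(with_random, topics, 0, "jazz"))    # True: the random link bridges the clusters
```

In this toy case the on-topic query succeeds in both overlays, while the off-topic query only succeeds once a random link exists, which mirrors the qualitative behavior described for Figure 6.4.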

Figure 6.3. The performance of the semantic layer in the case of a) uniform and b) Zipf distribution of the topics and documents

As we know from the analytical model, the number of random and semantic links should be set according to the ratio of interest-related and off-topic queries issued. The SemPeer protocol extension learns this information from the Query Profile in order to maximize the overall answer rate. In the following series of simulations, we predefined the number of semantic links and varied the ratio of off-topic queries. In the case of the uniform distribution, as expected, as the ratio of off-topic queries rises, the simulation sets with 1 or 2 random links show better performance than the semantic overlay by itself (Figure 6.5 a). As the ratio of off-topic queries increases to 20% and 40%, more and more random links are required to ensure the highest hit rate (Figure 6.5 b and c). In Section 6.3.3, we show how the analytical model can be used to find the proper ratio of links.
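The trade-off can be made concrete with a simple weighted average. The sketch below is not the analytical model referred to in this chapter; it only applies the law of total probability, so that the overall answer rate is the on-topic hit probability weighted by 1 - P_off plus the off-topic hit probability weighted by P_off. The per-configuration hit probabilities are hypothetical values chosen for illustration and are not measurement results.

```python
def overall_hit_rate(p_off, hit_on_topic, hit_off_topic):
    """Overall answer rate as a mixture of on-topic and off-topic hit probabilities."""
    if not 0.0 <= p_off <= 1.0:
        raise ValueError("p_off must be a probability")
    return (1.0 - p_off) * hit_on_topic + p_off * hit_off_topic


def best_config(p_off, configs):
    """Return the configuration name with the highest overall answer rate."""
    return max(configs, key=lambda name: overall_hit_rate(p_off, *configs[name]))


# Hypothetical (on-topic, off-topic) hit probabilities per number of random links.
# These numbers are NOT taken from the simulations; they only illustrate the trend.
configs = {
    "S0": (0.95, 0.10),  # semantic links only
    "S2": (0.92, 0.50),  # two random links
    "S4": (0.85, 0.70),  # four random links
}

for p_off in (0.0, 0.1, 0.2, 0.4):
    print(p_off, best_config(p_off, configs))
```

Under these assumed values, the purely semantic configuration is best only when there are no off-topic queries, and configurations with more random links take over as P_off grows, which is the qualitative trend reported for Figure 6.5.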

Figure 6.4. Simulation results showing the decrease in performance when there are no random links. Nodes start issuing off-topic queries at step 10,000

Figure 6.5. Answer rate with different ratios of off-topic queries. a) P_{off} = 0.1, b) P_{off} = 0.2, c) P_{off} = 0.4

Source: Semantic Information Retrieval in Mobile Peer-to-Peer Networks (Szemantikus információ-visszakeresés mobil peer-to-peer hálózatokban), Budapest University of Technology and Economics, Department of Automation and Applied Informatics, pages 94-100.