
In this section, two representative scenarios (see Table 3.3) are analyzed in detail. In particular, the efficiency of the different attacker functions presented in Section 3.4.3 and the efficiency of the defense mechanism presented in Section 3.5 are investigated. Beyond the analysis of the two emphasized scenarios, I show the differences compared to the other simulated scenarios. In the two considered scenarios, I investigate the effect of the Hide-and-Lie Strategy on the reached gain, on the number of downloaded primary and secondary messages, and on the maximum memory required to follow the strategy.

As my experience showed that the chosen mobility model does not affect the results considerably, I have selected the random walk mobility model in the analyzed scenarios. Because of space limits, I emphasize the effect of the probability of being interested in a category rather than the effect of the number of categories. Therefore, in the presented simulation results, the number of categories is fixed to 30, which can be a realistic value for many applications. In the two investigated scenarios, the probability of being interested in a category takes the values 0.05 and 0.4. The former value refers to scenarios where the nodes are interested in a small subset of messages, while the latter refers to scenarios where the nodes are interested in a large subset of messages. In Table 3.3, I summarize the parameter values of the scenarios beyond the already fixed parameters introduced in Table 3.1.

Table 3.3: Parameter values of investigated scenarios

             Mobility model   C    ε
Scenario 1   RW               30   0.05
Scenario 2   RW               30   0.4

The success probability of the attacker functions is plotted against different Hide-and-Lie strategy values (λ) and different attacker delay values (τ) for Scenario 1 and Scenario 2 in Figures 3.2(a) and 3.2(b), respectively. For the sake of better understanding, the plots are separated by attacker delay value.

The Prefiltered ID Based attacker function assumes that the nodes do not apply any privacy enhancing technique. Accordingly, it is the most efficient attacker function when λ = 0, but in all other cases it cannot distinguish the target node from the others, because even a single changed entry in the EIP misleads the attacker.

A more robust solution can be obtained by omitting the prefiltering, which results in the Unfiltered ID Based attacker function. Compared to the prefiltered function, the success probability of this function decreases when λ = 0 but considerably increases in the other cases. The reason is that the number of possible message combinations gives the attacker enough variety to identify the nodes with higher probability, even if they hide a small subset of the messages when they meet other nodes. As the nodes increase the λ value, they collect messages from larger sets and can hide more messages. Hence, the nodes are able to deceive the attacker with high probability. Therefore, the success probability of the attacker function decreases with increasing λ. If λ = 0.5, the attacker function is as inefficient as a naïve attacker.

The Unfiltered ID Based attacker function is very sensitive to the attacker delay. As τ increases, the nodes delete more and more messages, making the attack less and less efficient. Eventually, the nodes have deleted all the messages that could match IDLuT(t0), and the attack becomes inefficient for τ = 500 and τ = 1000. Recall that messages are deleted after 500 time steps in the considered scenarios.

The Category Histogram Based attacker function is less sensitive to the value of τ, but when τ is low it is less efficient than the ID Based attacker function. The inefficiency of this attacker function comes from the fact that the Hide-and-Lie Strategy causes intolerable differences for the χ²-test when all the messages belonging to a category appear or disappear as the EIP changes.

The attack that is least sensitive to τ is the Significant Category Based attacker function. The advantageous characteristic comes from the fact that this function tries to reveal the real interest profile. However, it still does not work when the nodes hide their identity with the λ = 0.5 strategy, because there are no over- and underrepresented categories in that case.

Figure 3.2: Success probability of A as a function of the Hide-and-Lie strategy values (λ). (a) Scenario 1, (b) Scenario 2. Each panel contains one plot per attacker delay (τ = 1, 50, 250, 500, 1000) with curves for the Prefiltered ID, Unfiltered ID, Category Histogram and Significant Category attacker functions.

The Significant Category Based attacker function is the most efficient attacker function in Scenario 2, but it is less efficient in Scenario 1.

Taking all the considered attacker functions into consideration, I can conclude that the efficiency of the attacker functions changes according to the parameters of the model. However, a common tendency is that if the nodes apply the Hide-and-Lie Strategy with a high value of λ, none of the attackers is able to distinguish them, independently of the value of τ, better than a naïve attacker that picks one of the nodes at random.

Even if an attacker can distinguish two nodes whenever their IPs are different (I call this attacker the ideal IP based attacker, A^IP_ideal), the probability that two nodes have the same IP is not negligible. The success probability of an ideal IP based attacker can be viewed as an upper bound for any other IP based attacker, such as the Significant Category Based attacker function. This value can be determined analytically. Through this analysis, I show how different C and ε values affect the success probability of the attackers.

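The success probability of the ideal IP based attacker is determined by the number of equal IPs. To compute the success probability, first the probability p of two IPs being equal is computed as follows:

\[
p = \frac{\sum_{w=1}^{C} \binom{C}{w} \left(\varepsilon^{2}\right)^{w} \left((1-\varepsilon)^{2}\right)^{C-w}}{\left(1-(1-\varepsilon)^{C}\right)^{2}}
  = \frac{\left(\varepsilon^{2} + (1-\varepsilon)^{2}\right)^{C} - (1-\varepsilon)^{2C}}{\left(1-(1-\varepsilon)^{C}\right)^{2}}
\tag{3.5}
\]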

where w is the weight of the IP, varying between 1 and C (recall that every node is interested in at least one category).
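The step between the two forms of Eq. (3.5) is the binomial theorem. The lines below spell it out; the shorthand a and b is introduced here only for readability and is not part of the original notation:

```latex
\begin{align*}
\sum_{w=0}^{C} \binom{C}{w} a^{w} b^{C-w} &= (a+b)^{C},
  \qquad a = \varepsilon^{2},\quad b = (1-\varepsilon)^{2},\\
\sum_{w=1}^{C} \binom{C}{w} a^{w} b^{C-w} &= (a+b)^{C} - b^{C}
  = \left(\varepsilon^{2} + (1-\varepsilon)^{2}\right)^{C} - (1-\varepsilon)^{2C}.
\end{align*}
```

The omitted w = 0 term corresponds to two empty IPs. The denominator rescales the result for the condition that both nodes are interested in at least one category, since under the form of Eq. (3.5) the quantity (1−ε)^C is the probability that a single IP is empty.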

The success probability of A^IP_ideal is the reciprocal of the average number of nodes with the same IP:

\[
\Pr\left(A^{IP}_{ideal}\left(UP_{u_T}(t_0), UP_{u_1}(t_1), \ldots, UP_{u_N}(t_1)\right) = u_T\right) = \frac{1}{1 + p(N-1)}
\tag{3.6}
\]

The ideal values according to Eq. (3.6) are 0.341 and 1 for Scenario 1 and 2, respectively. These values are valid only for λ = 0, and are confirmed by Figure 3.2. They are also shown in Figure 3.3, where Eq. (3.6) is plotted against different C and ε values.
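The two ideal values can be reproduced numerically. The following is a minimal sketch, not the code used for the thesis: the function names are hypothetical, and N = 300 is taken from the caption of Figure 3.3 under the assumption that the two scenarios use the same number of nodes.

```python
from math import comb


def prob_equal_ips(C, eps):
    """Closed form of Eq. (3.5): probability that two IPs are equal."""
    numerator = (eps**2 + (1 - eps) ** 2) ** C - (1 - eps) ** (2 * C)
    return numerator / (1 - (1 - eps) ** C) ** 2


def prob_equal_ips_sum(C, eps):
    """Sum form of Eq. (3.5), used only to cross-check the closed form."""
    total = sum(
        comb(C, w) * (eps**2) ** w * ((1 - eps) ** 2) ** (C - w)
        for w in range(1, C + 1)
    )
    return total / (1 - (1 - eps) ** C) ** 2


def ideal_success_probability(C, eps, N):
    """Eq. (3.6): reciprocal of the average number of nodes sharing an IP."""
    return 1.0 / (1.0 + prob_equal_ips(C, eps) * (N - 1))


if __name__ == "__main__":
    N = 300  # assumption: number of nodes as in the caption of Figure 3.3
    for name, eps in (("Scenario 1", 0.05), ("Scenario 2", 0.4)):
        print(name, round(ideal_success_probability(30, eps, N), 3))
    # prints: Scenario 1 0.341
    #         Scenario 2 1.0
```

Evaluating `ideal_success_probability` over a grid of C and ε values gives the kind of surface described for Figure 3.3.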

The characteristics of the success probability of the attacker in the two emphasized scenarios are similar to each other, as Figures 3.2(a) and 3.2(b) show, and they are similar to those of the other scenarios, which were simulated but are not presented here. However, as Figure 3.3 shows, the success probability of the ideal IP based attacker depends on the number of categories and on the probability of a node being interested in a category. As one can read from the figure, when there is a large number of categories in the system, the success probability of an ideal attacker is high. On the other hand, when the number of categories is low, the success probability depends strongly on the value of ε. As ε gets closer to 0.5, the success probability increases. The reason is that an attacker can distinguish nodes when the probability that the IPs of two nodes are equal is low. All these statements are confirmed by simulation results that are not presented here, and these effects can be observed even in cases when λ > 0.

In Figure 3.4, I show the average gain of all the nodes and its empirical standard deviation as a function of the Hide-and-Lie strategy in the two scenarios. I have to stress that these two figures do not cover all the characteristics observed across the simulated scenarios; however, Figure 3.4(a) shows an interesting property of the Hide-and-Lie Strategy. Namely, increasing λ does not degrade but increases the data delivery ratio in some scenarios.

The Hide-and-Lie Strategy has two contradictory effects. On the one hand, when the nodes happen to hide what they are interested in, they may miss some primary messages to download. On the other hand, when the nodes happen to lie about being interested in some category, they store-carry-and-forward secondary messages, which increases the data delivery ratio in general [Buttyán et al., 2010a]. The cumulative effect depends on the system parameters. For example, when nodes are interested only in a small subset of categories and do not carry secondary messages, they can exchange messages only with small probability. Therefore, the Hide-and-Lie Strategy can in some cases be viewed as a motivation to store-carry-and-forward secondary messages, as can be seen in Figure 3.4(a). On the other hand, when the nodes have many possibilities to get primary messages, the latter effect has no considerable benefit while the former effect degrades the gain. Surprisingly, the two effects are balanced in Scenario 2, as one can see in Figure 3.4(b).

Figure 3.3: Analytically determined upper bound for success probability of ideal IP based attacker functions when 300 nodes are present in the network. Axes: ε, C, success probability.

Figure 3.4: Average gain with the empirical standard deviation. (a) Scenario 1, (b) Scenario 2. x-axis: Hide-and-Lie strategy value; y-axis: goodput.

Even though I did not take the energy consumption and the memory costs of the communication into consideration when I calculated the gain, I collected related information during the simulation. In Figure 3.5, I plot the average number of primary and secondary messages downloaded by one node and the maximum memory usage as a function of the Hide-and-Lie strategy in the two considered scenarios.

As one can expect, the number of downloaded secondary messages increases with increasing λ. The number of downloaded primary messages changes as the gain changes, because the gain is a normalized value of the number of obtained primary messages.

Figure 3.5: Costs: Average number of primary and secondary messages downloaded by a node and the maximum memory usage. (a) Scenario 1, (b) Scenario 2. x-axis: Hide-and-Lie strategy value; y-axis: quantity; curves: Primary, Secondary, Memory size.

Even though the gains are comparable in the two scenarios, as Figure 3.4 shows, there is almost one order of magnitude difference in the number of downloaded primary messages. The reason is that the nodes are interested in 5% of the messages in Scenario 1 and in 40% of the messages in Scenario 2, while the number of generated messages does not change considerably between the two scenarios. For the same reason, a node downloads fewer secondary messages in Scenario 2 than in Scenario 1. The ratio of the number of downloaded primary messages to the number of downloaded secondary messages is ε(1−λ) : (1−ε)λ.
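As a small worked example of the ratio ε(1−λ) : (1−ε)λ, the sketch below evaluates it for the two scenarios. The function name is hypothetical, and the λ values are picked only for illustration; the results are plain arithmetic on the stated ratio, not simulation output.

```python
def primary_to_secondary_ratio(eps, lam):
    """Return eps*(1-lam) / ((1-eps)*lam), the primary:secondary ratio as a float.

    For lam = 0 no secondary messages are downloaded, so the ratio is infinite.
    """
    secondary = (1 - eps) * lam
    if secondary == 0:
        return float("inf")
    return eps * (1 - lam) / secondary


if __name__ == "__main__":
    for name, eps in (("Scenario 1", 0.05), ("Scenario 2", 0.4)):
        for lam in (0.1, 0.5):
            ratio = primary_to_secondary_ratio(eps, lam)
            print(f"{name}, lambda={lam}: primary/secondary = {ratio:.3f}")
```

For λ = 0.5 the ratio reduces to ε : (1−ε), that is 1 : 19 in Scenario 1 and 2 : 3 in Scenario 2.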

Note that even though the nodes download more and more secondary messages as λ increases, the maximum memory usage does not increase at the same rate. Thus, the nodes do not need to maintain much larger memories when they want to protect their privacy.