
4.3 Structure and Analysis of Similarities

4.3.4 Queuing Model

Table 4.2. Similar terms table

Similar term 1    Similar term 2    Weight
Joe               Joseph            9
Sam               Samantha          3
Katharine         Kate              6

Proposition 4.14. The similar term handling method of Section 4.3.3 increases the number of detected relevant similarities, while it increases the person comparison time only by a constant.

Proof. Let $M = \{M_1, M_2, \ldots, M_k\}$. Without loss of generality, assume that $M_1$ relates to the name match event. Let $S_{M_1}$ denote the number of similarities detected because of condition $M_1$. Formally, the similar term handling extension replaces $M_1$ with $M_1'$, where $M_1'$ means that the algorithm checks whether the two profile structures contain the same name or names that appear as a pair in the similar terms table. Let $S_{M_1'}$ denote the number of similarities detected because of condition $M_1'$. Since $M_1$ implies $M_1'$, we have $P[M_1] \le P[M_1']$, and therefore $S_{M_1} \le S_{M_1'}$.

The execution time of the algorithm increases only by one table select step, performed when the similar terms table is checked. Only those rows are relevant whose weight is greater than a specific threshold. If similar term handling is applied to further profile details, the comparison time increases linearly with the number of such details.
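Below is a minimal Python sketch of the extended name condition $M_1'$. It rests on two assumptions. First, the similar terms table is loaded into an in-memory dictionary, which stands in for the table select step. Second, the weight threshold is passed in as a parameter. The names `build_table`, `names_match` and `min_weight` are hypothetical and do not come from the original system. The only work added per comparison is a single dictionary lookup, which is consistent with the constant overhead claimed in Proposition 4.14.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical in-memory form of the similar terms table (cf. Table 4.2):
# (term, term) -> weight, stored in both orders so the lookup is symmetric.
SimilarTerms = Dict[Tuple[str, str], int]


def build_table(rows: Iterable[Tuple[str, str, int]]) -> SimilarTerms:
    table: SimilarTerms = {}
    for a, b, weight in rows:
        table[(a, b)] = weight
        table[(b, a)] = weight
    return table


def names_match(name1: str, name2: str, table: SimilarTerms, min_weight: int) -> bool:
    """Extended condition M1': equal names, or a similar-term row whose
    weight is greater than the (assumed) threshold min_weight."""
    if name1 == name2:
        return True  # the original M1 condition
    # One extra lookup: the only cost added by similar term handling.
    return table.get((name1, name2), 0) > min_weight


TABLE = build_table([("Joe", "Joseph", 9), ("Sam", "Samantha", 3), ("Katharine", "Kate", 6)])
print(names_match("Joe", "Joseph", TABLE, min_weight=5))
print(names_match("Sam", "Samantha", TABLE, min_weight=5))
```

With a threshold of 5, Joe/Joseph (weight 9) counts as a name match, while Sam/Samantha (weight 3) does not.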

Similarity handling is a key issue in mobile related social networks. Since we proposed a semi-automatic similarity handling solution, users have to decide whether to accept or ignore the detected similarities. For this solution to operate efficiently, the algorithm should be as precise as possible, which makes the decision easier for users.

Thus, the objective of the similarity detecting algorithm is to detect all relevant similarities with as high a precision as possible.

This behavior is similar to a queuing system in which the processing unit is the algorithm and the entities in the queue are the person pairs waiting for comparison.

In the following model we consider only the registration operation, as it gives a proper upper bound for the other operations. This operation can be divided into two main tasks. First, when a member registers, she or he should be compared to every private contact in the network, which means $|U_{Pc}|$ comparisons. If we consider the number of private contacts in a phonebook as a random variable $X_{Pc}$, this means $E[X_{Pc}] \cdot N_M$ comparisons on average, where $E[X_{Pc}]$ is the expected phonebook size and $N_M$ is the number of members in the network before the registration. In the following we refer to $E[X_{Pc}]$ as $C$.

The second task during member registration is to check which members of the network are in the phonebook of the new member. This task requires $N_M \cdot X_{Pc}$ comparisons, where the size of the new phonebook is also modeled with an exponential distribution.

Thus, the number of comparisons required by a member registration can be modeled with the random variable $X'_{Pc}$:

$$X'_{Pc} = C \cdot N_M + X_{Pc} \cdot N_M = N_M (C + X_{Pc}) \tag{4.5}$$

In order to design an efficient mobile related social network, which is usually close to the fresh state, we need a model that shows how fast the algorithm should work to detect newly appearing similarities. In the following we model member registrations as a Poisson process with parameter $\lambda$, and we take one person pair comparison as the time unit.
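The small Python function below illustrates equation (4.5) by splitting the count into the two registration tasks described above. The function name and the example numbers are illustrative assumptions. Only the formula comes from the text.

```python
def registration_comparisons(n_members: int, mean_phonebook: float, new_phonebook: int) -> float:
    """Comparisons triggered by one registration, following Eq. (4.5).

    n_members:      N_M, number of members before the registration
    mean_phonebook: C = E[X_Pc], expected phonebook size
    new_phonebook:  size of the new member's phonebook (one draw of X_Pc)
    """
    # Task 1: the new member against every private contact, C * N_M on average.
    against_existing = mean_phonebook * n_members
    # Task 2: every existing member against the new phonebook, N_M * X_Pc.
    against_new = n_members * new_phonebook
    return against_existing + against_new  # equals N_M * (C + X_Pc)


print(registration_comparisons(1000, 212, 150))
```

Take $N_M = 1000$, $C = 212$ and a new phonebook of 150 contacts. The function then gives $1000 \cdot (212 + 150) = 362\,000$ comparisons.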

Proposition 4.15. Assume Poisson arrivals with parameter $\lambda$ for member registrations and a stable queue, and take one person pair comparison step as the time unit. Then the average queue length $\bar{N}$ of the similarity detecting algorithm can be calculated as follows:

$$\bar{N} = \frac{\lambda}{\frac{1}{2CN_M} - \lambda} \tag{4.6}$$

For the proof we use two lemmas.

Lemma 4.16. Let $X_{Pc}$ be the random variable describing the phonebook sizes of the members. Based on measurements, $X_{Pc}$ can be approximated with an exponential distribution.

Proof. During the operational period of Phonebookmark we collected anonymous statistics. Figure 4.6 shows the aggregated distribution of the phonebook sizes.

The X-axis represents phonebook sizes, while the Y-axis represents the number of members who have at least this number of private contacts in their phonebooks.

Figure 4.6. Size of phonebooks in Phonebookmark

Figure 4.7 shows the tail distribution of the phonebook sizes with a linear X-axis and a logarithmic Y-axis. The points in this figure fit a line very well, which indicates that the tail of the phonebook size distribution decreases exponentially. This provides a simple empirical test of whether a random variable has an exponential distribution. The absolute value of the slope of the fitted line (using natural logarithms) gives the parameter of the exponential distribution. In this measurement the parameter is 0.0047.

The expected value of an exponential distribution is the reciprocal of its parameter. According to this measurement, the expected phonebook size is therefore $1/0.0047 \approx 212.8$ contacts, which is a reasonable number. Hence, as stated in Lemma 4.16, phonebook sizes can be modeled well with an exponential distribution.

Figure 4.7. Size of phonebooks with logarithmic scale
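The empirical test used in the proof of Lemma 4.16 can be sketched in three steps. First, count the members with at least $x$ contacts. Second, take natural logarithms. Third, fit a least-squares line; minus its slope estimates the exponential parameter. The data below is synthetic, drawn from an exponential distribution with parameter 0.0047, and is not the Phonebookmark measurement. The grid of phonebook sizes and the helper names are arbitrary choices made for illustration.

```python
import math
import random


def tail_counts(sizes, grid):
    """Number of members with at least x contacts, for each x in grid (cf. Figure 4.6)."""
    return [sum(1 for s in sizes if s >= x) for x in grid]


def fit_exponential_rate(grid, counts):
    """Least-squares line through (x, ln count); minus the slope estimates the rate."""
    pts = [(x, math.log(c)) for x, c in zip(grid, counts) if c > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    return -sxy / sxx


# Synthetic phonebook sizes (NOT the Phonebookmark data).
rng = random.Random(42)
sizes = [rng.expovariate(0.0047) for _ in range(20000)]
grid = list(range(0, 800, 50))
rate = fit_exponential_rate(grid, tail_counts(sizes, grid))
print(f"estimated rate {rate:.4f}, implied mean phonebook size {1 / rate:.0f}")
```

The least-squares fit is unweighted. The sparse points far in the tail are therefore noisier than the others, which is one reason to stop the grid before the counts become very small.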

Lemma 4.17. The random variable $X'_{Pc}$ follows an exponential distribution.

Proof. Since $X'_{Pc}$ is a linear transformation of $X_{Pc}$, its distribution function is

$$F_{X'_{Pc}}(x) = F_{X_{Pc}}\left(\frac{x - N_M C}{N_M}\right), \tag{4.7}$$

provided that $N_M > 0$, which always holds in mobile related social networks. Since the distributions of $X'_{Pc}$ and $X_{Pc}$ are thus of the same type, $X'_{Pc}$ also has an exponential distribution.

Proof. (Proposition 4.15)

We say that the similarity handling queue is stable when the processing rate of similarities is higher than the arrival rate. According to Kleinrock's model for queuing systems [Kleinrock, 1975], suppose arrivals form a Poisson process with parameter $\lambda$ and service times are exponentially distributed with parameter $\mu$. The requirement for stability is then:

$$\frac{\lambda}{\mu} < 1 \tag{4.8}$$

This means that the expected service time ($1/\mu$) is smaller than the expected time between arrivals ($1/\lambda$). In our case the expected service time is $E[X'_{Pc}]$, since a person pair comparison is the time unit. By Lemma 4.17, $X'_{Pc}$ has an exponential distribution, and its expected value is:

$$
\begin{aligned}
E[X'_{Pc}] &= E[C N_M + X_{Pc} N_M] && (4.9)\\
&= C N_M + N_M E[X_{Pc}] && (4.10)\\
&= 2 C N_M && (4.11)
\end{aligned}
$$
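Equation (4.11) can be checked quickly by Monte Carlo simulation. The sketch assumes that $X_{Pc}$ is exponential with mean $C$, as in Lemma 4.16. The values $C = 212$ and $N_M = 1000$ and the helper names are illustrative only.

```python
import random


def sample_registration_cost(rng: random.Random, mean_phonebook: float, n_members: int) -> float:
    """One draw of X'_Pc = N_M * (C + X_Pc), with X_Pc exponential with mean C."""
    x_pc = rng.expovariate(1.0 / mean_phonebook)  # expovariate takes the rate 1/mean
    return n_members * (mean_phonebook + x_pc)


def estimate_mean_cost(mean_phonebook: float, n_members: int, trials: int = 100_000, seed: int = 1) -> float:
    rng = random.Random(seed)
    total = sum(sample_registration_cost(rng, mean_phonebook, n_members) for _ in range(trials))
    return total / trials


est = estimate_mean_cost(212, 1000)
print(f"simulated E[X'_Pc] = {est:.0f}, formula 2*C*N_M = {2 * 212 * 1000}")
```

With these inputs the simulated mean should agree with $2CN_M = 424\,000$ to within a fraction of a percent.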

For an exponential distribution, the parameter $\mu$ is the reciprocal of the expected value, so $\mu = 1/(2CN_M)$. Thus, the stability requirement is:

$$\frac{\lambda}{\frac{1}{2CN_M}} < 1, \qquad \lambda < \frac{1}{2CN_M} \tag{4.12}$$

With these results, following [Kleinrock, 1975], we can calculate the average queue length $\bar{N}$ of the similarity detecting queue:

$$\bar{N} = \frac{\lambda}{\frac{1}{2CN_M} - \lambda} \tag{4.13}$$
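Equation (4.13) translates directly into code. The sketch below uses hypothetical names and checks the stability condition (4.12) before evaluating the formula. Time is measured in person pair comparison steps, so $\lambda$ is given in registrations per comparison step.

```python
def mean_queue_length(lam: float, mean_phonebook: float, n_members: int) -> float:
    """Average queue length N_bar from Eq. (4.13).

    lam:            registration rate, in registrations per comparison step
    mean_phonebook: C, the expected phonebook size
    n_members:      N_M, the number of members before the registration
    """
    mu = 1.0 / (2.0 * mean_phonebook * n_members)  # service rate, mu = 1 / E[X'_Pc]
    if not lam < mu:
        # Stability condition (4.12) violated: the queue grows without bound.
        raise ValueError("unstable queue: need lam < 1 / (2 * C * N_M)")
    return lam / (mu - lam)


n_bar = mean_queue_length(1e-6, 212, 1000)
print(f"average queue length: {n_bar:.3f}")
```

For example, take $C = 212$, $N_M = 1000$ and $\lambda = 10^{-6}$. The utilization is then $\lambda \cdot 2CN_M = 0.424$, and the formula gives $\bar{N} \approx 0.74$.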

Based on this model, the resource requirements of similarity detection can be calculated for a real environment, taking into account the speed of the processing unit(s).
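The sketch below illustrates how the model yields resource requirements. It converts a registration rate given in registrations per second into the comparison-step time unit. The throughput of the processing unit, in comparisons per second, is an assumed input and is not reported in the text. The function names are hypothetical.

```python
def min_comparisons_per_second(registrations_per_sec: float, mean_phonebook: float, n_members: int) -> float:
    """Throughput needed for stability: each registration costs 2*C*N_M comparisons on average."""
    return registrations_per_sec * 2 * mean_phonebook * n_members


def utilization(registrations_per_sec: float, mean_phonebook: float, n_members: int,
                comparisons_per_sec: float) -> float:
    """rho = lambda / mu, with lambda converted to registrations per comparison step."""
    lam_per_step = registrations_per_sec / comparisons_per_sec
    return lam_per_step * 2 * mean_phonebook * n_members


print(min_comparisons_per_second(0.01, 212, 10_000))
print(utilization(0.01, 212, 10_000, 100_000))
```

Consider 0.01 registrations per second (36 per hour) with $C = 212$ and $N_M = 10\,000$. At least 42 400 comparisons per second are then needed. A unit performing 100 000 comparisons per second would run at a utilization of 0.424.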

To demonstrate the behavior of this queue, we made measurements on member registrations in Phonebookmark. Figure 4.8 illustrates the queue length for $2CN_M$ and $2.5CN_M$ processing rates.

The X-axis shows the number of members in the system, while the Y-axis represents the number of comparison steps when a new member registers (the sum of the remaining comparisons and the new ones). It can be seen that the average queue length decreases significantly when the processing rate increases.

Figure 4.9 illustrates the queue length normalized by the number of members.

Figure 4.8. Queue length for similarity calculation with $2CN_M$ and $2.5CN_M$ processing rates

Figure 4.9. Normalized queue length for similarity calculation with $2CN_M$ and $2.5CN_M$ processing rates