
4.3 Structure and Analysis of Similarities

4.3.1 Structure of Mobile Related Social Networks

Considering general social networks as graphs, nodes represent registered members and the edges between them represent social relationships (e.g. friendship).

In the case of mobile related social networks, each member has a private mobile phone with a phonebook (Figure 4.1).

Figure 4.1 shows that phonebook contacts introduce a new type of node in the graph representation; the edges between these private phonebook contacts and members indicate which member "owns" each private contact.

Figure 4.1. Basic structure of mobile related social networks

Definition 4.1 (Member). A member is a registered user of the social network.

Basically, members are similar to users of general social networks: they can log into the system, find and add acquaintances, upload and share information about themselves, write forum or blog entries, etc. The key difference is that members can synchronize their phonebooks with the social network. We denote the set of registered members by $U_M$.

$|U_M|$ indicates the number of members in the network.

Definition 4.2 (Private contact). Private contacts correspond to the phonebook entries of a member’s mobile phone. These private contacts are not shared between members; others cannot see them in the network. A private contact is saved in the system when a member synchronizes his or her phonebook with the social network. We denote the set of all private contacts by $U_{Pc}$.

$U_M$ and $U_{Pc}$ are disjoint sets. Relationships between members are represented by the edge set $E_{MM}$, and the relationships stating that a private contact belongs to a member are represented by the edge set $E_{MPc}$.

Mobile related social networks allow synchronization between private phonebook contacts and the social network. This way the network is also able to discover relationships between members, if a similarity detecting algorithm finds that a member in the network is similar to another member’s phonebook entry. This algorithm compares two person entries (members as well as private contacts) and determines whether they are likely to represent the same person. If so, it assigns a probability value to the detected similarity. The details of the algorithm are discussed in the next subsection.
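The comparison of two person entries can be illustrated with a minimal Python sketch. It is not the algorithm of this work (that is discussed in the next subsection); the `Person` record, the compared fields, the normalization and the weights 0.9 and 0.6 are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical person entry: a member profile or a phonebook contact."""
    name: str
    phone: Optional[str] = None
    email: Optional[str] = None


def normalize_phone(phone: Optional[str]) -> str:
    """Keep digits only, so '+36 30 123-4567' and '36301234567' compare equal."""
    return "".join(ch for ch in (phone or "") if ch.isdigit())


def similarity(a: Person, b: Person) -> float:
    """Return a score in [0, 1]; the weights are arbitrary assumptions."""
    phone_a, phone_b = normalize_phone(a.phone), normalize_phone(b.phone)
    if phone_a and phone_a == phone_b:
        return 1.0  # identical phone number: treated as a certain match here
    email_a, email_b = (a.email or "").lower(), (b.email or "").lower()
    if email_a and email_a == email_b:
        return 0.9
    # Fall back to fuzzy name matching, scaled down because names can collide.
    name_ratio = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    return 0.6 * name_ratio
```

A detected similarity would then be any member and private contact pair whose score exceeds a chosen threshold; the threshold itself is again an assumption of this sketch.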

Figure 4.2 shows the graph structure after the similarity detecting algorithm has detected matches.

Figure 4.2. Mobile related social networks with detected similarities

In Figure 4.2 the dotted edges between members and private contacts represent detected similarities, and the broken lines between two private contacts illustrate possible duplications in the phonebooks. Duplications are detected as a positive side effect of the similarity detecting algorithm. Formally:

Definition 4.3 (Similarity). A similarity indicates that the similarity detecting algorithm found a member and a private contact similar, but the related member has not resolved it yet (accepted or ignored). The edge set $E_S$ represents detected similarities between private contacts and members.

Definition 4.4 (Duplication). A duplication indicates that the similarity detecting algorithm found two private contact entries similar in a phonebook. The edge set $E_D$ represents detected duplications in phonebooks.

After similarities and duplications are detected there is a semi-automatic step: the members who have private contacts in their phonebook that were detected as similar to other members have to decide whether the detected similarities are relevant. In addition, they can also resolve detected duplications in their phonebooks. Figure 4.3 shows the complete graph structure with all types of edges after some of the members have resolved detected similarities and duplications.

Figure 4.3. Final structure of mobile related social networks

In the final structure, one of the private contacts of the leftmost member has been deleted, because it was detected as a duplication in the phonebook and the owner member found the detection relevant. The other duplication, on the right side, remains because that member has not decided about it yet. Figure 4.3 also shows that four of the five similarities were resolved (members found them relevant) and one is still in the system (the member has not checked it yet).

Resolving a similarity means that an identity edge is formed between the private contact(s) in one’s phonebook and the relevant member who represents the same person in the system. The private contacts that are linked to members via this type of link are called customized contacts.

Definition 4.5 (Identity). Identities indicate resolved similarities. An identity link shows that a member is in somebody’s phonebook and that this was confirmed by the owner member. The edge set $E_{CcM}$ represents identities.

Definition 4.6 (Customized contact). A customized contact is created from a private contact when a member was found similar to this private contact and the owner member of the private contact accepted this similarity. The set of customized contacts is denoted by $U_{Cc}$.

This way there are two types of links between members and customized contacts. Next we define the term owner member.

Definition 4.7 (Owner member). A member is the owner member of a private contact or customized contact if that contact is present in the member’s phonebook. In the following we refer to this relationship with the index $Mo$.

One of the key advantages of mobile related social networks is the use of identities: if a member changes a personal detail on the web user interface (adds a new phone number, uploads a new image, changes the website address, etc.), the change is automatically propagated to those phonebooks that contain a customized contact related to this member.
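The propagation along identity edges can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the class names `Contact`, `Member` and `Network`, the dictionary-based profiles and the method names are hypothetical and do not come from the system described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Contact:
    """Phonebook entry; `identity` is set once the owner accepts a similarity."""
    details: Dict[str, str]
    identity: Optional[str] = None  # id of the member this contact is linked to

    @property
    def is_customized(self) -> bool:
        return self.identity is not None


@dataclass
class Member:
    member_id: str
    profile: Dict[str, str] = field(default_factory=dict)
    phonebook: List[Contact] = field(default_factory=list)


class Network:
    def __init__(self) -> None:
        self.members: Dict[str, Member] = {}

    def add_member(self, member: Member) -> None:
        self.members[member.member_id] = member

    def accept_similarity(self, contact: Contact, member_id: str) -> None:
        """Owner accepts: the private contact becomes a customized contact."""
        contact.identity = member_id

    def update_profile(self, member_id: str, changes: Dict[str, str]) -> int:
        """Apply profile changes and push them along every identity edge.

        Returns the number of phonebook entries that were refreshed.
        """
        self.members[member_id].profile.update(changes)
        refreshed = 0
        for owner in self.members.values():
            for contact in owner.phonebook:
                if contact.identity == member_id:
                    contact.details.update(changes)
                    refreshed += 1
        return refreshed
```

The return value of `update_profile` equals the number of identity edges pointing to the member, which is why the number of identities determines the update load of the network. The linear scan over all phonebooks is chosen for brevity; an index from member id to linked contacts would avoid it.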

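Proposition 4.8. The graph $G_{MSNS}$ with the following edge rules is able to describe mobile related social networks without limitations.

$$
\begin{aligned}
G_{MSNS} &= (U, E), \quad \text{where } U = U_M \cup U_{Pc} \cup U_{Cc} \\
E &= E_{MM} \cup E_{MPc} \cup E_D \cup E_S \cup E_{MoCc} \cup E_{CcM} \\
E_{MM} &\subseteq \{(u_M, u'_M) : u_M, u'_M \in U_M,\ u_M \neq u'_M\} \\
E_{MPc} &\subseteq \{(u_M, u_{Pc}) : u_M \in U_M,\ u_{Pc} \in U_{Pc},\ u'_M \in U_M,\ \neg\exists (u'_M, u_{Pc}) \in E_{MPc}\} \\
E_D &\subseteq \{(u_{Pc}, u'_{Pc}) : u_{Pc}, u'_{Pc} \in U_{Pc},\ u_{Pc} \neq u'_{Pc},\ u_M \in U_M, \\
&\qquad \exists ((u_{Pc}, u_M), (u'_{Pc}, u_M)) \in E_{MPc}\} \\
E_S &\subseteq \{(u_{Pc}, u_M) : u_M \in U_M,\ u_{Pc} \in U_{Pc},\ (u_{Pc}, u_M) \notin E_{MPc}\} \\
E_{CcM} &\subseteq \{(u_{Cc}, u_M) : u_{Cc} \in U_{Cc},\ u_M \in U_M,\ \exists u'_M \in U_M,\ u'_M \neq u_M, \\
&\qquad (u_M, u'_M) \in E_{MM},\ (u_M, u_{Cc}) \in E_{MoCc}\} \\
E_{MoCc} &\subseteq \{(u_{Mo}, u_{Cc}) : u_{Mo} \in U_M,\ u_{Cc} \in U_{Cc},\ \exists u'_M \in U_M,\ u'_M \neq u_{Mo}, \\
&\qquad (u_{Mo}, u'_M) \in E_{MM},\ (u_{Mo}, u_{Cc}) \in E_{CcM}\}
\end{aligned}
\tag{4.1}
$$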

The number of standard relationships between members can be increased via a similarity detecting algorithm in this graph structure.

Proof. The three defined node types can describe the different person roles in the network, but we have to go through the edge rules to examine whether they are able to describe the different types of relationships completely.

$E_{MM}$ edges describe standard relationships between members, and the constraint ensures that there are no loop edges. $E_{MPc}$ edges describe that a member has a private contact entry in the phonebook, and the constraint ensures that a private contact belongs to only one member.

$E_D$ edges describe possible duplications in phonebooks, and the edge rule states that duplication links are allowed between private contacts only if they are in the same phonebook (belong to the same member). $E_S$ edges are allowed between members and private contacts, but the private contact must not be in the phonebook of that member.

When a member resolves a similarity, the related private contact turns into a customized contact and the $E_S$ similarity edge is upgraded to an $E_{CcM}$ edge (identity). The resolution step also creates a relationship edge ($E_{MM}$) between the two related members. Consequently, the number of relationships between members can be increased via an accurate similarity detecting algorithm.

Finally, the edge set $E_{MoCc}$ represents edges between owner members and their customized contacts; these are less relevant and have only a technical role.
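The edge rules walked through in this proof can be sketched as a small Python class that rejects edges violating them. This is only an illustration of the constraints as the proof describes them: the class and method names are hypothetical, node identifiers are plain strings, $E_{MM}$ edges are stored as unordered pairs, and the ownership map stands in for both $E_{MPc}$ and $E_{MoCc}$. All of these are assumptions of the sketch.

```python
from typing import Dict, FrozenSet, Set, Tuple


class MobileSocialGraph:
    """Sketch of G_MSNS with the edge rules enforced on insertion."""

    def __init__(self) -> None:
        self.members: Set[str] = set()             # U_M
        self.owner_of: Dict[str, str] = {}         # contact -> owner (E_MPc / E_MoCc)
        self.customized: Set[str] = set()          # U_Cc; other contacts are U_Pc
        self.e_mm: Set[FrozenSet[str]] = set()     # E_MM
        self.e_d: Set[FrozenSet[str]] = set()      # E_D
        self.e_s: Set[Tuple[str, str]] = set()     # E_S as (contact, member)
        self.e_ccm: Set[Tuple[str, str]] = set()   # E_CcM as (contact, member)

    def add_member(self, member: str) -> None:
        self.members.add(member)

    def add_relationship(self, a: str, b: str) -> None:
        if a == b:
            raise ValueError("E_MM: loop edges are not allowed")
        self.e_mm.add(frozenset((a, b)))

    def add_private_contact(self, owner: str, contact: str) -> None:
        if contact in self.owner_of:
            raise ValueError("E_MPc: a private contact belongs to only one member")
        self.owner_of[contact] = owner

    def add_duplication(self, c1: str, c2: str) -> None:
        if c1 == c2 or self.owner_of[c1] != self.owner_of[c2]:
            raise ValueError("E_D: only distinct contacts of the same phonebook")
        self.e_d.add(frozenset((c1, c2)))

    def add_similarity(self, contact: str, member: str) -> None:
        if self.owner_of[contact] == member:
            raise ValueError("E_S: contact must not be in the member's own phonebook")
        self.e_s.add((contact, member))

    def resolve_similarity(self, contact: str, member: str) -> None:
        """Owner accepts: the E_S edge becomes E_CcM and an E_MM edge appears."""
        self.e_s.remove((contact, member))
        self.customized.add(contact)
        self.e_ccm.add((contact, member))
        self.add_relationship(self.owner_of[contact], member)
```

Calling `resolve_similarity` on a detected similarity adds an entry to `e_mm`, which mirrors the statement that resolving similarities increases the number of relationships between members.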

In the following we use this model for mobile related social networks and the related propositions.

Definition 4.9 (Fresh state). We consider the network to be in an ideal fresh state when all of the relevant similarities and duplications are detected.

The definition of freshness describes how up-to-date the network is with respect to $E_S$ and $E_D$ edges. An ideal state for the network is when there are no such edges, because this indicates that members have resolved every similarity and duplication. In the following we assume that there are no unresolved duplications in the system. Phonebook entry duplications are usually resolved by the members shortly, so this is not a real restriction.

Customized contacts and identities ($E_{CcM}$) make it possible to keep phonebooks always up-to-date with the information provided by the members. Therefore the number of $E_{CcM}$ edges has a great effect on the performance of the network. The operations that influence the number of identities are the following.

• Member registration (1): a member registers in the network and uploads her or his phonebook.

• Adding/modifying a private contact (2): a new private contact is added or modified by a member in her or his phonebook.

• Editing member profile (3): a member changes her or his profile, e.g. sets the e-mail field.

Member registration brings a new member and an entirely new phonebook into the network. The private contact edit operation occurs quite often during the lifetime of the social network: if the phonebook can be edited easily from a web browser, users are more likely to fill in the fields of their contacts. Editing the personal profile is a simpler operation; it means that a member adds or modifies some of her or his personal details.

Proposition 4.10. The following formulas give two values for each operation: the maximum number of identities that can appear after the operation, and the number of person-to-person comparison steps required to detect all new similarities, provided that there are no duplications in the system. Member registration is the most resource-intensive operation, therefore it can be used as an upper estimate for the other operations.

• Member registration (1)

Maximum brought identities (A): $|U_M|_n + \min\left(|U_{Pc}|_{n+1} - |U_{Pc}|_n,\ |U_M|_n\right)$, where $n$ denotes the $n$th time step and $|U_{Pc}|_{n+1} - |U_{Pc}|_n$ is the number of private contacts in the phonebook of the new member.

Required comparison steps (B): $|U_{Pc}|_n + \left(|U_{Pc}|_{n+1} - |U_{Pc}|_n\right) \cdot |U_M|_n$

• Adding/modifying a private contact (2)

Maximum brought identities (A): $1$

Required comparison steps (B): $|U_M|$

• Editing member profile (3)

Maximum brought identities (A): $|U_M| - 1$

Required comparison steps (B): $|U_{Pc}| - N_{MPc}$, where $N_{MPc}$ is the phonebook size of the member who updated the profile.

The following symbols are used:

$|U_{Pc}|_n$: number of private contacts before the member registration

$|U_{Pc}|_{n+1}$: number of private contacts after the member registration

$|U_M|_n$: number of members before the registration

$|U_{Pc}|_{n+1} - |U_{Pc}|_n$: number of private contacts in the new member’s phonebook.
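The formulas of Proposition 4.10 can be evaluated directly. The following Python sketch transcribes them one to one; the function and parameter names are hypothetical, and the numbers in the usage example are made up purely for illustration.

```python
from typing import Tuple


def registration_bounds(members_before: int,
                        contacts_before: int,
                        contacts_after: int) -> Tuple[int, int]:
    """Operation (1): (maximum new identities, required comparison steps)."""
    new_phonebook = contacts_after - contacts_before  # |U_Pc|_{n+1} - |U_Pc|_n
    max_identities = members_before + min(new_phonebook, members_before)
    comparisons = contacts_before + new_phonebook * members_before
    return max_identities, comparisons


def contact_edit_bounds(members: int) -> Tuple[int, int]:
    """Operation (2): at most one identity, |U_M| comparison steps."""
    return 1, members


def profile_edit_bounds(members: int,
                        contacts: int,
                        own_phonebook: int) -> Tuple[int, int]:
    """Operation (3): |U_M| - 1 identities, |U_Pc| - N_MPc comparison steps."""
    return members - 1, contacts - own_phonebook


# Illustrative numbers: 1000 members, 50000 private contacts,
# and a new member who uploads a phonebook with 120 entries.
print(registration_bounds(1000, 50000, 50120))  # (1120, 170000)
print(contact_edit_bounds(1000))                # (1, 1000)
print(profile_edit_bounds(1000, 50000, 80))     # (999, 49920)
```

With these example values, registration needs 170000 comparison steps while the other two operations stay below 50000, which illustrates why registration serves as the upper estimate.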

Proof. (1 A) Firstly, when a member registers in a mobile related social network, it is possible that she or he is already present in the phonebook of every member (e.g. if the registration is based on invitation). Secondly, it is also possible that every private contact in the new member’s phonebook is already a member of the network. Therefore the maximum number of $E_{CcM}$ edges caused by a registration is:

$|U_M|_n + \min\left(|U_{Pc}|_{n+1} - |U_{Pc}|_n,\ |U_M|_n\right)$.

(1 B) Based on the previous train of thought, the number of required comparison steps is as follows. Firstly, every previously existing private contact has to be examined to see whether it is likely similar to the new member; secondly, the private contacts in the new member’s phonebook have to be compared to the members in the system. Formally: $|U_{Pc}|_n + \left(|U_{Pc}|_{n+1} - |U_{Pc}|_n\right) \cdot |U_M|_n$.

(2 A,B) If a phonebook contact is added or modified, it has to be compared to all of the members of the network, and it is possible that one will match. Thus the maximum number of identities caused by this operation is $1$, and the number of required comparison steps depends on the current number of members ($|U_M|$).

(3 A,B) When a member updates her or his profile and a relevant detail is added or changed, it is possible that she or he is in the phonebook of all the other members. Therefore the maximum number of identities caused by this operation is $|U_M| - 1$. The number of required comparison steps is $|U_{Pc}| - N_{MPc}$, where $N_{MPc}$ is the phonebook size of the member who updated her or his profile.
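The counting argument of part (1 B) can be checked with a naive detection loop that performs exactly the two passes described in the proof. The sketch below is an illustration only: the function name, the representation of phonebooks as lists of strings and the `is_similar` callback are assumptions, and the equality test used in the example stands in for a real similarity detecting algorithm.

```python
from typing import Callable, Dict, List, Tuple


def detect_on_registration(
    phonebooks: Dict[str, List[str]],
    new_member: str,
    new_phonebook: List[str],
    is_similar: Callable[[str, str], bool],
) -> Tuple[List[Tuple[str, str]], int]:
    """Naive detection on registration; returns (similarities, comparisons)."""
    comparisons = 0
    similarities: List[Tuple[str, str]] = []
    # Pass 1: every existing private contact is compared to the new member.
    for contacts in phonebooks.values():
        for contact in contacts:
            comparisons += 1
            if is_similar(contact, new_member):
                similarities.append((contact, new_member))
    # Pass 2: every contact of the new phonebook is compared to every
    # member that existed before the registration.
    for contact in new_phonebook:
        for member in phonebooks:
            comparisons += 1
            if is_similar(contact, member):
                similarities.append((contact, member))
    return similarities, comparisons


# Three members with 3 private contacts in total; the new member brings 2 more.
existing = {"ann": ["bob", "dave"], "bob": ["ann"], "carl": []}
found, steps = detect_on_registration(
    existing, "dave", ["ann", "zed"], lambda contact, member: contact == member
)
print(steps)  # 9, i.e. 3 + 2 * 3
print(found)  # [('dave', 'dave'), ('ann', 'ann')]
```

The step count matches $|U_{Pc}|_n + (|U_{Pc}|_{n+1} - |U_{Pc}|_n) \cdot |U_M|_n$ for $|U_{Pc}|_n = 3$, a new phonebook of size 2 and $|U_M|_n = 3$.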

The upper bound on the total number of identities is relevant for the scalability of a mobile related social network, since the number of identity edges represents the number of phonebook updates that are required to keep the network and the phonebooks up-to-date. The required comparison steps for each operation are relevant when designing a similarity detecting algorithm. The algorithm has to find the similarities quickly and efficiently, since this is a key requirement of the fresh state. Next we describe a real environment in which the model of mobile related social networks was applied, and we propose a similarity detecting and handling algorithm and evaluate its efficiency via measurements.