Global Cooperation for Identity Separation

7 Characterizing Robustness of Identity Separation for Grasshopper

7.2 Global Cooperation for Identity Separation

There are several ways of organizing cooperative identity separation. In our evaluation, we select nodes for cooperation that are globally important, where importance is measured from an adversarial point of view: nodes with a higher chance of re-identification should be selected first. We therefore measure importance with the anonymity measures proposed in Section 6, namely LTA_A and LTA_deg. This additionally allows us to compare our results to [15], which analyzes LTA_A-based global cooperation against Nar09.

In two separate series of experiments we ranked nodes according to their LTA scores and created perturbed datasets in which the nodes with the lowest anonymity rankings were selected first for identity separation. The results of each set of experiments on the LJ66k dataset are shown in Fig. 8. Compared to the non-cooperative case we can observe some progress: even in the worst case (LTA_deg, basic model, Y = 2), at most |V_ids| = 0.26 proved to be enough to stop the attack. The differences between the results of LTA_A and LTA_deg are not outstanding, thus we can conclude that either measure is feasible for tackling the Grasshopper attack with this strategy (the results of LTA_A are only subtly better).

Figure 8: Global cooperation can significantly decrease the minimum number of adopting users required for tackling Grasshopper. Panels: (a) global cooperation based on LTA_A; (b) global cooperation based on LTA_deg. Horizontal axis in both panels: ratio of nodes with identity separation (V_ids), ranging from 0.00 to 0.28.
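The selection step used in these experiments (rank nodes by anonymity, then pick the least anonymous ones first) can be illustrated with a minimal sketch. It assumes the LTA_A or LTA_deg scores have already been computed as defined in Section 6 and are passed in as a plain dictionary; the function name `select_cooperating_nodes`, the rounding rule and the tie-breaking rule are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, Hashable, List


def select_cooperating_nodes(anonymity: Dict[Hashable, float],
                             ratio: float) -> List[Hashable]:
    """Return the `ratio` fraction of nodes with the lowest anonymity scores.

    `anonymity` maps each node to a precomputed score (e.g. LTA_A or LTA_deg);
    how the score is computed is outside the scope of this sketch.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    # Ascending order: the least anonymous nodes come first.
    # Ties are broken by the node's string form (an assumption, for determinism).
    ranked = sorted(anonymity, key=lambda node: (anonymity[node], str(node)))
    # Rounding to the nearest node count is an assumption of this sketch.
    count = round(ratio * len(ranked))
    return ranked[:count]


# Toy scores, not measurements from the paper.
scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.3}
selected = select_cooperating_nodes(scores, 0.5)
print(selected)
```

Sweeping `ratio` upward and re-running the attack on each perturbed dataset would then give the kind of curve summarized in Fig. 8.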

Compared to the measurements provided in [15] for Nar09, Grasshopper proved to be more robust in the case of the basic model. On the LJ66k dataset, tackling Nar09 required a minimum participation of |V_ids| ∼ 0.15 for nodes using the basic model with Y = 2, and |V_ids| ∼ 0.10 for the best model with Y = 5. This also means that protecting network privacy by targeting nodes with a low anonymity level can still be a working strategy, if cooperation can be organized.

8 Conclusion

In this paper, we presented a novel structural re-identification attack called Grasshopper and experimentally compared it to the state-of-the-art attack, Nar09. We have shown that in a number of cases where the attacker's knowledge is rather noisy, Grasshopper achieves significantly higher yield levels than were possible with Nar09. It also turned out that while Grasshopper also has a parameter Θ for controlling the trade-off between yield and accuracy, our algorithm produces negligible error rates, typically around 1% and below. This is only a fraction of what Nar09 produced under the same circumstances, meaning that the error in the matches proposed by Grasshopper is so low that an attacker could accept all mappings without further filtering. Finally, we have shown that, to reach maximal re-identification, our algorithm can be initialized with a fraction of the seed nodes that Nar09 needs. In all test networks 50 nodes selected from the top 25% by degree, or only 15 of the top nodes, proved to be sufficient, while these numbers were around 750 and 85, respectively, for Nar09 (these measurements are provided in [14]). All these results show that Grasshopper is a more suitable alternative when the goal is a low error in the results, and also when the overlap between the sanitized and auxiliary datasets is low.
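As an illustration of the seeding setup mentioned above (a small number of seeds drawn from the top 25% of nodes by degree), the following sketch picks seeds from an adjacency dictionary. Drawing uniformly at random inside the high-degree pool, the function name `pick_seeds` and the graph representation are assumptions made for this example; the actual seeding methods are described in [14].

```python
import random
from typing import Dict, Hashable, List, Optional, Set


def pick_seeds(adjacency: Dict[Hashable, Set[Hashable]],
               count: int,
               top_fraction: float = 0.25,
               rng: Optional[random.Random] = None) -> List[Hashable]:
    """Pick `count` seed nodes at random from the highest-degree nodes."""
    rng = rng or random.Random()
    # Order nodes by degree, highest first.
    by_degree = sorted(adjacency, key=lambda n: len(adjacency[n]), reverse=True)
    # Keep only the top fraction of nodes as the candidate pool.
    pool_size = max(1, int(len(by_degree) * top_fraction))
    pool = by_degree[:pool_size]
    if count > len(pool):
        raise ValueError("not enough high-degree nodes for the requested seeds")
    return rng.sample(pool, count)


# Toy graph with 8 nodes: node 0 is a hub, node 1 has three neighbours.
edges = [(0, i) for i in range(1, 8)] + [(1, 2), (1, 3)]
graph: Dict[int, Set[int]] = {n: set() for n in range(8)}
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)

seeds = pick_seeds(graph, count=2, rng=random.Random(0))
print(sorted(seeds))
```

In the toy graph the top 25% consists of nodes 0 and 1, so both are returned regardless of the random order.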

Achieving higher recall rates for noisy background knowledge is a sign of robustness. We also tested this by measuring how Grasshopper can defeat identity separation. It turned out that our algorithm is quite resistant to features of identity separation. When we simulated users creating two new identities (basic model), the algorithm was still capable of re-identifying a large fraction of users even when 90% of the users adopted the technique!

This is quite high compared to the Nar09 algorithm, for which we measured around 60–70% on the Epinions and Slashdot networks and 80–90% on the LJ66k dataset in our previous work [15]. Even for a model with a higher number of identities and with edge deletion (best model, Y = 5), the proportion of participants required to defeat Grasshopper was around 80–90%. Fortunately, in this case the attacker could learn only a little about nodes adopting the technique, highlighting the applicability of this identity separation strategy for protecting user privacy.
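To make the perturbation concrete, here is a rough sketch of identity separation applied to a single node: the node is replaced by Y new identities and each of its edges is handed to one of them. The exact model definitions appear earlier in the paper and are not reproduced in this section, so the details here are assumptions: edges are assigned uniformly at random, and the hypothetical parameter `delete_prob` stands in for the edge deletion of the best model (with `delete_prob = 0` approximating the basic model).

```python
import random
from typing import Dict, Hashable, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]


def separate_identity(graph: Graph, node: Hashable, y: int,
                      delete_prob: float = 0.0,
                      rng: Optional[random.Random] = None) -> Graph:
    """Return a copy of `graph` where `node` is split into `y` identities.

    Assumes an undirected graph without self-loops. The new identities are
    named (node, 0), ..., (node, y - 1), a naming chosen for this sketch.
    """
    rng = rng or random.Random()
    # Copy the graph without the original node and without edges pointing to it.
    result: Graph = {n: set(nbrs) - {node}
                     for n, nbrs in graph.items() if n != node}
    identities = [(node, i) for i in range(y)]
    for ident in identities:
        result[ident] = set()
    for neighbour in graph[node]:
        if neighbour == node:
            continue  # ignore self-loops
        if rng.random() < delete_prob:
            continue  # edge is hidden from the published graph
        # Each surviving edge goes to exactly one identity (no duplication).
        ident = rng.choice(identities)
        result[ident].add(neighbour)
        result[neighbour].add(ident)
    return result


# Toy star graph: node 0 splits its three edges between two identities.
toy = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
split = separate_identity(toy, 0, y=2, rng=random.Random(1))
print(split)
```

Applying this function to every node selected for participation would produce a perturbed dataset of the kind the attack is evaluated against.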

Finally, we have evaluated two anonymity measures that were originally proposed and evaluated for the Nar09 attack [15], and showed that they are also useful for Grasshopper, as a high correlation was present in our experiments between the estimated level of anonymity and re-identification rates.
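The kind of correlation check mentioned here can be sketched with a rank correlation between per-node anonymity scores and re-identification rates. This section does not state which coefficient was used (the reference list names both Pearson [1] and Spearman [2]), so the choice of Spearman below is an assumption, and the input numbers are toy values rather than measurements from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: anonymity score per node and how often that node was re-identified.
anonymity_scores = [0.1, 0.4, 0.7, 0.9]
reidentification_rates = [0.95, 0.60, 0.30, 0.05]
rho = spearman(anonymity_scores, reidentification_rates)
print(rho)
```

In the toy data the two orderings are exact opposites, so the coefficient is at its negative extreme; real measurements would fall somewhere between the extremes.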

Based on these measures, we have tested two global cooperation schemes, where nodes having a lower anonymity level were selected for adopting identity separation. While Grasshopper turned out to be more robust than Nar09 in this case (compared to the results in [15]), it could not tackle users adopting the best model with Y = 5 at significantly lower participation rates. We can conclude that using the best model (Y = 5) is a feasible privacy-enhancing strategy that minimizes information disclosure against the Grasshopper algorithm as well and, if network-level cooperation can be organized, also protects network privacy.

We have also pointed out several interesting research issues for future work. First of all, it would be important to refine the convergence criteria for stopping the algorithm. At the time of writing this paper, using a memory of past mappings seems to be the best candidate for counting truly new mappings for convergence. Furthermore, it should be investigated why we observed differences in recall between Epinions and the other networks. This could also help us improve Grasshopper to achieve even higher recall rates when the overlap is higher.
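The idea of a memory of past mappings can be illustrated as follows: a round only counts as progress if it proposes a (source, target) pair that has never been proposed before, so an algorithm that merely oscillates between earlier mappings is stopped. This is a sketch of that idea under stated assumptions, not the criterion used in Grasshopper; `run_until_converged`, `step` and `max_rounds` are hypothetical names.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

Mapping = Dict[Hashable, Hashable]


def run_until_converged(step: Callable[[Mapping], Mapping],
                        initial: Mapping,
                        max_rounds: int = 100) -> Tuple[Mapping, int]:
    """Iterate `step` until a round yields no never-seen-before pair."""
    # Memory of every (source, target) pair proposed so far.
    seen: Set[Tuple[Hashable, Hashable]] = set(initial.items())
    mapping = dict(initial)
    for round_no in range(1, max_rounds + 1):
        mapping = step(mapping)
        new_pairs = set(mapping.items()) - seen
        if not new_pairs:
            # Nothing truly new: treat the propagation as converged.
            return mapping, round_no
        seen |= new_pairs
    return mapping, max_rounds


def toy_step(mapping: Mapping) -> Mapping:
    """Toy propagation step that flips node 'b' between two candidates."""
    out = dict(mapping)
    out["b"] = 3 if mapping.get("b") == 2 else 2
    return out


final_mapping, rounds = run_until_converged(toy_step, {"a": 1})
print(final_mapping, rounds)
```

A criterion that only compares consecutive rounds would see a change in every round of this toy example and would run until `max_rounds`; with the memory, the loop stops as soon as the oscillation revisits an earlier pair.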

References

[1] Pearson correlation. http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient. Accessed: 2014-04-22.

[2] Spearman’s rank correlation. http://en.wikipedia.org/wiki/Spearman’s_rank_correlation_coefficient. Accessed: 2014-04-22.

[3] Stanford network analysis platform (snap). http://snap.stanford.edu/. Accessed: 2014-04-22.

[4] What nsa’s prism means for social media users. http://www.techrepublic.com/blog/tech-decision-maker/what-nsas-prism-means-for-social-media-users/. Accessed: 2014-05-26.

[5] F. Beato, M. Conti, and B. Preneel. Friend in the middle (fim): Tackling de-anonymization in social networks. In Pervasive Computing and Communications Workshops (PERCOM Workshops), 2013 IEEE International Conference on, pages 279–284, 2013.

[6] F. Beato, M. Kohlweiss, and K. Wouters. Scramble! your social network data. In S. Fischer-Hübner and N. Hopper, editors, Privacy Enhancing Technologies, volume 6794 of Lecture Notes in Computer Science, pages 211–225. Springer Berlin Heidelberg, 2011.

[7] S. Clauß, D. Kesdogan, and T. Kölsch. Privacy enhancing identity management: protection against re-identification and profiling. In Proceedings of the 2005 workshop on Digital identity management, DIM ’05, pages 84–93, New York, NY, USA, 2005. ACM.

[8] L. A. Cutillo, R. Molva, and T. Strufe. Safebook: A privacy-preserving online social network leveraging on real-life trust. Communications Magazine, IEEE, 47(12):94–101, 2009.

[9] G. Danezis and C. Troncoso. You cannot hide for long: De-anonymization of real-world dynamic behaviour. In Proceedings of the 12th ACM Workshop on Workshop on Privacy in the Electronic Society, WPES ’13, pages 49–60, New York, NY, USA, 2013. ACM.

[10] P. Govindan, S. Soundarajan, and T. Eliassi-Rad. Finding the most appropriate auxiliary data for social graph deanonymization. 2014.

[11] G. G. Gulyás and S. Imre. Analysis of identity separation against a passive clique-based de-anonymization attack. Infocommunications Journal, 4(3):11–20, December 2011.

[12] G. G. Gulyás and S. Imre. Measuring local topological anonymity in social networks. In Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on, pages 563–570, 2012.

[13] G. G. Gulyás and S. Imre. Hiding information in social networks from de-anonymization attacks by using identity separation. In B. Decker, J. Dittmann, C. Kraetzer, and C. Vielhauer, editors, Communications and Multimedia Security, volume 8099 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2013.

[14] G. G. Gulyás and S. Imre. Measuring importance of seeding for structural de-anonymization attacks in social networks. In Pervasive Computing and Communications Workshops (PERCOM Workshops), 2014 IEEE International Conference on, 2014.

[15] G. G. Gulyás and S. Imre. Using identity separation against de-anonymization of social networks. Submitted to the Journal of Transactions on Data Privacy. Available at: http://gulyas.info/upload/Gulyas_TDP2014.pdf, May 2014.

[16] G. G. Gulyás, R. Schulcz, and S. Imre. Modeling role-based privacy in social networking services. In Emerging Security Information, Systems and Technologies, 2009. SECURWARE ’09. Third International Conference on, pages 173–178, June 2009.

[17] M. Hansen, P. Berlich, J. Camenisch, S. Clauß, A. Pfitzmann, and M. Waidner. Privacy-enhancing identity management. Information Security Technical Report, 9(1):35–44, 2004.

[18] S. Ji, W. Li, J. He, M. Srivatsa, and R. Beyah. Poster: Optimization based data de-anonymization, 2014. Poster presented at the 35th IEEE Symposium on Security and Privacy, May 18–21, San Jose, USA.

[19] C. Y. Ma, D. K. Yau, N. K. Yip, and N. S. Rao. Privacy vulnerability of published anonymous mobility traces. In Proceedings of the Sixteenth Annual International Conference on Mobile Computing and Networking, MobiCom ’10, pages 185–196, New York, NY, USA, 2010. ACM.

[20] A. Narayanan, E. Shi, and B. I. P. Rubinstein. Link prediction by de-anonymization: How we won the kaggle social network challenge. In The 2011 International Joint Conference on Neural Networks, pages 1825–1834, 2011.

[21] A. Narayanan and V. Shmatikov. De-anonymizing social networks. In Security and Privacy, 2009 30th IEEE Symposium on, pages 173–187, 2009.

[22] P. Pedarsani, D. R. Figueiredo, and M. Grossglauser. A bayesian method for matching two similar graphs without seeds. In Communication, Control, and Computing (Allerton), 2013 51st Annual Allerton Conference on, pages 1598–1607, Oct 2013.

[23] W. Peng, F. Li, X. Zou, and J. Wu. Seed and grow: An attack against anonymized social networks. In Sensor, Mesh and Ad Hoc Communications and Networks (SECON), 2012 9th Annual IEEE Communications Society Conference on, pages 587–595, 2012.

[24] H. Pham, C. Shahabi, and Y. Liu. Ebm: an entropy-based model to infer social strength from spatiotemporal data. In Proceedings of the 2013 international conference on Management of data, pages 265–276. ACM, 2013.

[25] K. Sharad and G. Danezis. An automated social graph de-anonymization technique. In Proceedings of the 13th ACM Workshop on Workshop on Privacy in the Electronic Society, WPES ’14, New York, NY, USA, 2014. ACM.

[26] E. Spertus, M. Sahami, and O. Buyukkokten. Evaluating similarity measures: a large-scale study in the orkut social network. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 678–684. ACM, 2005.

[27] M. Srivatsa and M. Hicks. Deanonymizing mobility traces: using social network as a side-channel. In Proceedings of the 2012 ACM conference on Computer and communications security, CCS ’12, pages 628–637, New York, NY, USA, 2012. ACM.

[28] B. van den Berg and R. Leenes. Audience segregation in social network sites. In Social Computing (SocialCom), 2010 IEEE Second International Conference on, pages 1111–1116. IEEE, 2010.

[29] B. van den Berg and R. Leenes. Keeping up appearances: Audience segregation in social network sites. In S. Gutwirth, Y. Poullet, P. De Hert, and R. Leenes, editors, Computers, Privacy and Data Protection: an Element of Choice, pages 211–231. Springer Netherlands, 2011.
