Large-scale Internet measurement
Laki, Sándor
Created by XMLmind XSL-FO Converter.
Written by Sándor Laki. Publication date: 2015
Copyright © 2015 Sándor Laki
Contents
Large-scale Internet measurement ... 1
1. 1 Introduction to Internet measurements ... 1
1.1. Course Information ... 1
1.2. Grading ... 1
1.3. Term project ... 2
1.4. What is this course about? ... 2
1.5. Reading ... 3
1.6. INTRODUCTION ... 4
1.7. Once upon a time... ... 4
1.8. And now... ... 5
1.9. Another aspect of Internet evolution ... 6
1.10. Today’s Internet ... 8
1.11. Why do we need Internet measurements? ... 8
1.12. Why do we need Internet measurements? ... 9
1.13. What to measure? ... 9
1.14. Why is it challenging to measure the Internet? ... 9
1.15. Core simplicity ... 9
1.16. Layered architecture and hidden network elements ... 10
1.17. IP centric ... 10
1.18. Middleboxes in the carriers’ networks ... 10
1.19. Administrative boundaries ... 11
1.20. Applications ... 11
1.21. Network measurements ... 11
1.22. Infrastructure measurements ... 11
1.23. Traffic measurements ... 12
1.24. Application measurements ... 12
1.25. Active and passive measurements ... 12
1.26. Internet Measurements ... 13
1.27. Related Conferences and Journals ... 13
2. 2 Analytical background ... 13
2.1. Analytical background ... 13
2.2. LINEAR ALGEBRA ... 14
2.3. Notations ... 14
2.4. Norms and orthogonality ... 14
2.5. Matrices ... 14
2.6. Eigenvectors and eigenvalues ... 15
2.7. Alternate algebras ... 15
2.8. PROBABILITY AND STATISTICS ... 15
2.9. Why do we need statistics and probability theory? ... 15
2.10. Notations ... 16
2.11. Definitions ... 16
2.12. Definitions - II ... 16
2.13. Expected values and moments ... 16
2.14. Variance and standard deviation ... 16
2.15. Joint probability ... 16
2.16. Conditional probability ... 16
2.17. Central limit theorem ... 16
2.18. Distributions for Internet measurements ... 16
2.19. Stochastic processes ... 17
2.20. Stochastic processes ... 18
2.21. Stochastic processes ... 18
2.22. Characterization of a stochastic process ... 18
2.23. Simpler stationary conditions ... 18
2.24. Measures of dependence ... 18
2.25. Measures of dependence ... 18
2.26. Measures of dependence ... 19
2.27. Modeling network traffic and user activity ... 19
2.28. Modeling network traffic and user activity ... 19
2.29. Short and long tailed distributions ... 19
2.30. Short and long tailed distributions ... 19
2.31. Short and long tailed distributions ... 19
2.32. Heavy tailed/power-law distribution ... 19
2.33. Heavy tailed distribution ... 19
2.34. Measured data ... 20
2.35. Describing data ... 21
2.36. More detailed descriptions ... 21
2.37. Histogram ... 22
2.38. Empirical cumulative distribution function (CDF) ... 22
2.39. Categorical data description ... 23
2.40. Describing memory and stability ... 24
2.41. High variability in Internet data ... 24
2.42. Zipf’s law ... 25
2.43. GRAPH THEORY ... 25
2.44. Graph theory ... 25
2.45. Graphs ... 26
2.46. Subgraphs ... 26
2.47. Connected graphs ... 26
2.48. Metrics for characterization ... 26
2.49. Metrics for characterization ... 26
2.50. Matrix representation ... 26
2.51. Applications of Routing Matrix ... 27
2.52. Applications of routing matrix ... 27
2.53. Artificial graph constructions ... 28
2.54. Erdős-Rényi random graph ... 28
2.55. Erdős-Rényi random graph ... 28
2.56. Generalized random graph ... 29
2.57. Preferential attachment model ... 29
2.58. Preferential attachment model ... 29
2.59. Regular vs Random graphs ... 29
2.60. AS level topology ... 29
2.61. AS level topology ... 30
2.62. AS level topology ... 30
2.63. MODELING ... 31
2.64. Measurement and modeling ... 31
2.65. Descriptive data model ... 31
2.66. Constructive data model ... 32
2.67. Data model ... 32
2.68. Why build models ... 32
2.69. Probability models ... 32
3. 3 Network measurement infrastructures ETOMIC and SONoMA ... 33
3.1. Why are Internet experimental facilities needed? ... 33
3.2. Existing TestBeds and Network Measurement Infrastructures ... 33
3.3. Lifecycle of network measurements ... 34
3.4. ETOMIC ... 34
3.5. The ETOMIC system ... 34
3.6. System architecture ... 35
3.7. Evolution of measurement nodes ... 36
3.8. ETOMs ... 36
3.9. APE boxes ... 37
3.10. Measurement boxes ... 37
3.11. Central Management System ... 37
3.12. Slices VS Unique timeslots ... 38
3.13. The ETOMIC system ... 39
3.14. One day on the Internet ... 43
3.15. Experimental use cases in ETOMIC ... 43
3.16. HOW TO USE ETOMIC? ... 45
3.17. Performing an experiment from the system’s perspective ... 46
3.18. Measurement types ... 46
3.19. Necessary steps for submitting an experiment ... 48
3.20. Creating a bundle ... 49
3.21. Creating an experiment and querying its status ... 49
3.22. Downloading the results ... 49
3.23. Programming DAG cards ... 49
3.24. PUBLISHING DATA ... 50
3.25. Experimental facilities ... 50
3.26. Traditional approach ... 50
3.27. Sharing science ... 51
3.28. Related work: CAIDA/DatCat ... 52
3.29. Related work: MoMe database ... 52
3.30. Related work: MAWI repository ... 53
3.31. Data publication efforts ... 53
3.32. Key ideas in data handling ... 54
3.33. VO approach ... 54
3.34. Unified interface ... 54
3.35. Casjobs User Interface for accessing data ... 54
3.36. SONOMA ... 55
3.37. SONoMA v1.0 ... 55
3.38. Why do we need another network measurement platform? ... 55
3.39. SONoMA ... 56
3.40. System components ... 56
3.41. Management Layer ... 57
3.42. Measurement methods ... 57
3.43. Web client ... 57
3.44. Case study: A full mesh topology measurement ... 58
3.45. Case study: A full mesh topology measurement ... 59
3.46. What happens in the background? A full mesh topology measurement ... 59
3.47. Another use case: Spotter ... 60
3.48. SONoMA 2.0 ... 60
3.49. Literature ... 61
4. 4 Network measurement infrastructures PlanetLab ... 61
4.1. PlanetLab ... 61
4.2. The main goal ... 61
4.3. What is PlanetLab? ... 62
4.4. PlanetLab architecture ... 62
4.5. Slices ... 62
4.6. Slices ... 62
4.7. Slices ... 62
4.8. User Opt-in ... 62
4.9. Services running in your slice ... 62
4.10. Services running in your slice ... 63
4.11. Services running in your slice ... 64
4.12. Services running in your slice ... 65
4.13. Services running in your slice ... 66
4.14. Virtualization solutions ... 67
4.15. VServers in a PlanetLab node ... 68
4.16. VServers in a PlanetLab node ... 68
4.17. Low-level network access ... 69
4.18. Getting started ... 69
4.19. Create your SSH Key ... 69
4.20. Create your slice ... 70
4.21. Login to your slice ... 70
4.22. Install additional packages ... 70
4.23. Deploying your app ... 71
4.24. Configuring a server for automatic startup ... 71
4.25. Other useful tools ... 71
4.26. PSSH ... 71
4.27. PSSH Demo ... 72
4.28. PlanetLab Slice Deploy Toolkit ... 72
4.29. vxargs ... 72
4.30. Nixes Tool Set ... 72
4.31. Long-Running Services In PlanetLab ... 73
4.32. Services (cont) ... 73
4.33. Services (cont) ... 74
4.34. Further available testbeds with PlanetLab Europe account ... 74
4.35. NITOS Wireless Testbed ... 74
4.36. w-iLab.t ... 75
5. 5 Network measurement infrastructures FEDERICA, SFA, OpenFlow ... 75
5.1. Federica ... 76
5.2. Federica ... 76
5.3. The physical topology ... 76
5.4. Core elements ... 77
5.5. SFA – SLICE-BASED FACILITY ARCHITECTURE ... 77
5.6. Slice-based Facility Architecture SFA ... 77
5.7. Slice-based Facility Architecture SFA ... 78
5.8. Experiment lifetime in general ... 78
5.9. What can SFA help with? ... 78
5.10. SFA for federated testbeds ... 79
5.11. SFA for federated testbeds ... 81
5.12. SFA – Available resources ... 82
5.13. SFA functionalities ... 83
5.14. Hierarchical naming ... 83
5.15. Authentication ... 85
5.16. SFA API ... 86
5.17. SFA Components ... 86
5.18. Resource Specification (RSpec) Documents ... 87
5.19. SFI and SFA client ... 88
5.20. Installation and configuration ... 88
5.21. List records from the registry ... 89
5.22. Detailed record information ... 89
5.23. Get resources ... 90
5.24. Get resources ... 90
5.25. Allocate resources for a given slice ... 91
5.26. Allocate resources for a given slice ... 91
5.27. Deallocate resources ... 92
5.28. OPENFLOW CAPABILITIES IN PLANETLAB EUROPE ... 92
5.29. What is the problem with existing networks? ... 92
5.30. What is the problem with existing networks? ... 92
5.31. Software Defined Networking ... 93
5.32. OpenFlow ... 93
5.33. OpenFlow ... 94
5.34. Plumbing primitives ... 95
5.35. Network OSes ... 95
5.36. OpenFlow support in PlanetLab ... 95
5.37. How to use it in PlanetLab? ... 96
5.38. How to use it in PlanetLab? ... 96
5.39. Create the topology ... 97
5.40. Create the topology ... 97
5.41. Modify the topology ... 98
5.42. Literature ... 99
6. 6 Bandwidth measurement methods Network path characterization ... 99
6.1. Methods to measure path characteristics ... 99
6.2. Capacity ... 101
6.3. Available bandwidth ... 101
6.4. Capacity and Available Bandwidth ... 101
6.5. Passive Techniques ... 102
6.6. Active probing methods ... 102
6.7. Basic ideas ... 102
6.8. State of the art Bandwidth estimation methods ... 103
6.9. SLoPS Self-Loading Periodic Streams ... 104
6.10. SLoPS Self-Loading Periodic Streams ... 105
6.11. SLoPS ... 105
6.12. SLoPS ... 105
6.13. OWD variations ... 106
6.14. How does it work? ... 107
6.15. How to determine parameters K,L and T? ... 107
6.16. Fleets of streams ... 107
6.17. How to detect the increasing trend of OWDs? ... 107
6.18. Pathload uses two metrics to recognize an increasing trend ... 107
6.19. PDT and PCT examples ... 108
6.20. PCT variations examples ... 108
6.21. PDT variations example ... 109
6.22. Rate adjustment ... 110
6.23. Performance ... 110
6.24. Packet Pair-based methods ... 111
6.25. PathChirp Chirp Packet Trains ... 111
6.26. PathChirp ... 112
6.27. PathChirp Methodology ... 112
6.28. Self-Induced Congestion ... 113
6.29. Excursions ... 114
6.30. pathChirp Tool ... 115
6.31. Comparison with Pathload ... 115
6.32. PathSensor: Granular model-based bandwidth estimation ... 116
6.33. Estimating output spacing with fluid traffic for a single-hop scenario ... 117
6.34. Fluid curves for single-hop ... 117
6.35. How to simulate cross traffic? ... 118
6.36. Output spacing ... 119
6.37. Output spacing ... 119
6.38. Explicit solution for M/D/1 queues ... 119
6.39. Explicit solution for M/D/1 queues ... 119
6.40. Literature ... 121
7. 7 Topology discovery in large-scale networks ... 122
7.1. Topology discovery ... 122
7.2. Challenges ... 122
7.3. Naive approaches ... 123
7.4. CAIDA’s Skitter ... 123
7.5. NetDimes ... 124
7.6. Expectations ... 125
7.7. Different methods ... 125
7.8. ROUTE DISCOVERY ... 126
7.9. Traceroute ... 126
7.10. How does traceroute work? ... 126
7.11. How does traceroute work? ... 127
7.12. How does traceroute work? ... 127
7.13. How does traceroute work? ... 127
7.14. Problems ... 128
7.15. Problems with load balancers ... 128
7.16. Problems with load balancers ... 129
7.17. What causes this anomaly? ... 129
7.18. A more complex example ... 130
7.19. What can we do? ... 130
7.20. Paris Traceroute Algorithm ... 130
7.21. Finding the NEXTHOP ... 131
7.22. The key ideas behind NEXTHOP ... 132
7.23. Number of probes and the expected number of interfaces at 95 percent confidence level ... 132
7.24. SELECTFLOW: Selecting a flow ... 133
7.25. SELECTFLOW: discovering new flows crossing router r ... 133
7.26. PERPACKET ... 133
7.27. Discovering nexthop interfaces in presence of a load balancer ... 133
7.28. Discovering nexthop interfaces in presence of a load balancer ... 134
7.29. Performance of Paris traceroute ... 135
7.30. Load balancers ... 136
7.31. TOPOLOGY DISCOVERY ... 137
7.32. Topology discovery ... 137
7.33. DoubleTree ... 137
7.34. The actual topology ... 138
7.35. Intra-monitor redundancy ... 138
7.36. Inter-monitor redundancy ... 138
7.37. Tree like structures ... 138
7.38. Monitor rooted tree ... 139
7.39. Destination rooted tree ... 139
7.40. DoubleTree ... 139
7.41. Maintaining trees ... 140
7.42. DoubleTree results ... 140
7.43. Literature ... 140
8. 8 Network tomography ... 140
8.1. What does tomography mean? ... 140
8.2. Network tomography? ... 141
8.3. Network tomography? ... 142
8.4. How does it work? ... 142
8.5. How does it work? ... 143
8.6. Network Tomography ... 144
8.7. Network Tomography ... 145
8.8. Network Tomography ... 146
8.9. Network Tomography ... 147
8.10. What else is needed? ... 148
8.11. Estimating Source-destination traffic intensities ... 148
8.12. Estimating Source-Destination traffic intensities ... 148
8.13. A toy example ... 148
8.14. EM algorithm ... 149
8.15. MLE and Normal Approximations ... 149
8.16. MultiCast-based loss inference ... 149
8.17. Loss model ... 150
8.18. Loss inference ... 151
8.19. Solution with EM ... 151
8.20. Convergence ... 151
8.21. Convergence ... 152
8.22. Unicast network tomography ... 153
8.23. Sandwich probing ... 154
8.24. Sandwich probing ... 154
8.25. Measurement framework ... 155
8.26. Topology Identification ... 155
8.27. Simplifying the problem ... 155
8.28. Find the tree ... 155
8.29. Illustration ... 156
8.30. Literature ... 157
9. 9 Network coordinates systems ... 157
9.1. Introduction ... 157
9.2. The key idea of an NCS ... 158
9.3. Localization Techniques ... 158
9.4. Localization Techniques ... 158
9.5. Localization Techniques ... 158
9.6. Network Coordinates System Basics ... 158
9.7. Network Coordinates System Basics ... 159
9.8. Network Coordinates Systems Advantages ... 160
9.9. LANDMARK BASED NCS ... 160
9.10. IDMaps ... 160
9.11. Landmark based NCSs ... 161
9.12. Global Network Positioning ... 161
9.13. Lighthouses ... 162
9.14. Lighthouses ... 163
9.15. Network Positioning System ... 164
9.16. Internet Coordinate System ... 165
9.17. Internet Coordinate System ... 165
9.18. Virtual landmarks ... 165
9.19. Internet Distance Estimation Service ... 166
9.20. DISTRIBUTED NCS ... 167
9.21. Distributed NCSs ... 167
9.22. Practical Internet Coordinates ... 167
9.23. Big-Bang Simulation ... 167
9.24. Big-Bang Simulation ... 168
9.25. Vivaldi ... 169
9.26. Vivaldi ... 170
9.27. Vivaldi ... 170
9.28. Vivaldi – Centralized algorithm ... 170
9.29. Vivaldi – Centralized algorithm ... 171
9.30. Distributed Vivaldi with constant timesteps ... 171
9.31. Vivaldi – Adaptive timesteps ... 171
9.32. Decentralized Vivaldi with adaptive timestep ... 171
9.33. Latency data for performance analysis ... 171
9.34. Timestep choice ... 171
9.35. Convergence and robustness against high-error nodes ... 171
9.36. Communication patterns ... 171
9.37. Triangle Inequality Violations ... 172
9.38. Euclidean spaces ... 173
9.39. Spherical coordinates ... 173
9.40. Height model ... 174
9.41. Height model ... 174
9.42. Pharos - Hierarchical Vivaldi ... 174
9.43. Pharos – The algorithm ... 175
9.44. Hierarchical distance prediction ... 175
9.45. A two-tier ICS ... 175
9.46. Triangular inequality violation ... 176
9.47. Triangular inequality violation ... 176
9.48. Two-tier Vivaldi ... 176
9.49. Two-tier Vivaldi ... 177
9.50. Limitations ... 177
9.51. Benefits ... 177
9.52. Comparison of different techniques ... 178
9.53. Security in NCS ... 178
9.54. Security in NCS ... 178
9.55. Security in NCS ... 179
9.56. Future directions ... 179
9.57. Literature ... 180
10. 10 IP geolocation ... 180
10.1. Motivation ... 180
10.2. IP Geolocation in general ... 181
10.3. Whois based location estimation example for passive geolocation ... 183
10.4. IP Geolocation in general ... 184
10.5. IP Geolocation in general ... 184
10.6. THE FIRST STEPS ... 188
10.7. IP2Geo – Single point localization ... 188
10.8. GeoTrack – main idea ... 188
10.9. GeoTrack ... 189
10.10. GeoPing - Delay based localization ... 189
10.11. GeoPing - details ... 189
10.12. GeoCluster ... 190
10.13. GeoCluster ... 190
10.14. GeoCluster – Clustering IP addresses ... 190
10.15. Performance of GeoCluster ... 190
10.16. ADVANCED TECHNIQUES ... 191
10.17. Constraint Based Geolocation ... 191
10.18. Constraint Based Geolocation ... 192
10.19. Octant IP geolocation framework ... 192
10.20. It is more than a simple method, it is a framework ... 193
10.21. Notations ... 193
10.22. Octant – Landmarks and constraints ... 194
10.23. Estimated location ... 194
10.24. Mapping latencies to distances ... 194
10.25. Mapping latencies to distances ... 194
10.26. Mapping latencies to distances ... 195
10.27. Last hop delays ... 195
10.28. Eliminating last hop delays in Octant ... 195
10.29. Last hop delays in Octant ... 196
10.30. Last hop delays ... 196
10.31. Results ... 196
10.32. Results ... 196
10.33. Spotter – a probabilistic approach ... 197
10.34. Travel time – distance relation ... 197
10.35. Travel time – distance relation ... 198
10.36. Statistical delay-distance model ... 199
10.37. Statistical delay-distance model ... 199
10.38. Statistical delay-distance model ... 200
10.39. Evaluation – "Probabilistic triangulation" ... 200
10.40. Performance analysis ... 201
10.41. Topology-based Geolocation ... 202
10.42. Topology based geolocation ... 202
10.43. Summary of techniques ... 203
10.44. Estimate hop latencies ... 203
10.45. Estimate hop latencies ... 203
10.46. Clustering interfaces ... 204
10.47. Clustering interfaces ... 204
10.48. Validating location hints ... 205
10.49. Constraint optimization ... 205
10.50. Constraint optimization ... 205
10.51. Results ... 205
10.52. Results ... 206
10.53. Other issues to be handled Indirect routes ... 206
10.54. Other issues to be handled Indirect routes discovery ... 206
10.55. Other issues to be handled Handling uncertainty ... 207
10.56. Other issues to be handled Iterative refinement ... 207
10.57. Literature ... 207
11. 11 Geography of the Internet On the spatial properties of network topology ... 208
11.1. Network research ... 208
11.2. The distance is what really counts. ... 210
11.3. Data collection ... 210
11.4. Data collection ... 212
11.5. Covered areas ... 214
11.6. Histogram maps ... 214
11.7. Transforming spatial distributions ... 216
11.8. Transforming spatial distributions ... 217
11.9. Transforming spatial distributions ... 217
11.10. Transforming spatial distributions ... 218
11.11. A router-likelihood map ... 219
11.12. Likelihood of router positions - US ... 219
11.13. Likelihood of router positions - US ... 220
11.14. Characterizing the link length ... 221
11.15. Characterizing the network links ... 222
11.16. Frequency of link lengths ... 222
11.17. Frequency of link lengths ... 222
11.18. Frequency of link lengths ... 223
11.19. Frequency of link lengths ... 223
11.20. Frequency of link lengths ... 224
11.21. Frequency of link lengths ... 225
11.22. Frequency of link lengths ... 226
11.23. Frequency of link lengths ... 227
11.24. Distribution of link lengths ... 229
11.25. Distribution of link lengths ... 229
11.26. Distribution of link lengths ... 229
11.27. Distribution of link lengths ... 230
11.28. Distribution of link lengths ... 230
11.29. The embedded topology ... 231
11.30. Characterizing network paths ... 232
11.31. Aggregated path length ... 232
11.32. Circuitousness ... 233
11.33. Symmetry ... 234
11.34. Symmetry ... 234
11.35. Direction dependence of lateral deviations ... 235
11.36. Unfamiliar routing phenomenon? ... 236
11.37. Literature ... 236
12. 12 Network traffic analysis, clustering and classification ... 237
12.1. Traffic ... 237
12.2. Traffic classification ... 237
12.3. Traffic classification ... 238
12.4. Quality of Service (QoS) ... 239
12.5. Traffic Classification ... 239
12.6. Traffic Classification ... 239
12.7. Different approaches ... 240
12.8. Deep Packet Inspection ... 240
12.9. Deep Packet Inspection Basics ... 240
12.10. Multi-byte pattern matching ... 241
12.11. Deploying multiple multi-byte DFAs ... 242
12.12. True positive VS False positive etc. ... 242
12.13. Performance of different DPI tools ... 243
12.14. Classical recipe for flow statistic-based traffic classification ... 243
12.15. Statistical payload analysis ... 244
12.16. KISS: Stochastic Packet Inspection ... 244
12.17. Chi square statistics ... 244
12.18. Decision process ... 245
12.19. Validation on a real traffic trace ... 245
12.20. Early Identification of Peer-To-Peer Traffic ... 245
12.21. Modeling a flow ... 245
12.22. Classification via probabilistic models ... 246
12.23. Data Collection for ground truth ... 247
12.24. Experiments ... 248
12.25. Feasibility test ... 248
12.26. Feasibility test ... 249
12.27. How much data is needed? ... 250
12.28. How much data is needed? ... 251
12.29. How much data is needed? ... 251
12.30. Robustness ... 252
12.31. Training set sizes ... 252
12.32. Is it protocol independent? ... 253
12.33. Robustness Asymmetric routing ... 254
12.34. Robustness Unknown traffic ... 254
12.35. Real traffic traces ... 255
12.36. Confusion matrix ... 255
12.37. Literature ... 256
13. 13 Measurements in peer-to-peer networks ... 256
13.1. Centralized VS P2P ... 256
13.2. Peer-to-peer networks ... 257
13.3. Peer-to-peer networks ... 258
13.4. Some P2P protocols ... 258
13.5. What do we want to measure? ... 259
13.6. How can we do that? ... 259
13.7. Gnutella ... 260
13.8. Gnutella vs Napster ... 260
13.9. Gnutella vs Napster Lifetime of the peers ... 260
13.10. Gnutella vs Napster Shared files vs Shared data ... 261
13.11. Gnutella Latencies and downstream bandwidth ... 262
13.12. Kademlia ... 263
13.13. Kademlia ... 263
13.14. Kademlia ... 263
13.15. Kademlia ... 264
13.16. Kademlia Collected data ... 264
13.17. Kademlia Peers with dynamic IPs ... 264
13.18. Kademlia Peer availability ... 265
13.19. IP-based availability is similar to what we have seen for Gnutella ... 266
13.20. How can duration affect the availability? ... 267
13.21. Time of day effects ... 267
13.22. BitTorrent ... 268
13.23. File Sharing ... 269
13.24. *.torrent ... 269
13.25. The Tracker ... 270
13.26. BitTorrent ... 270
13.27. An example ... 271
13.28. File sharing ... 271
13.29. Lifetime of a torrent Seeders and leechers ... 271
13.30. Pieces and sub-pieces ... 271
13.31. Piece Selection ... 272
13.32. Piece Selection ... 272
13.33. Choking ... 272
13.34. Choking algorithm ... 273
13.35. Optimistic unchoke ... 273
13.36. Upload only mode ... 274
13.37. Lifetime of a torrent ... 274
13.38. Peer behaviour ... 275
13.39. Peer behaviour ... 276
13.40. Literature ... 277
14. 14 Analysis of online social networks ... 277
14.1. Be socialized ... 277
14.2. Increasing interest ... 278
14.3. Twitter users ... 279
14.4. Social Flow ... 279
14.5. Twitter ... 280
14.6. Why do we analyze it? ... 280
14.7. Is the data available? ... 280
14.8. Quantifying influence ... 281
14.9. How to measure influence? ... 281
14.10. Cascades ... 282
14.11. Cascade sizes and depths ... 283
14.12. How to predict influence? ... 284
14.13. Regression tree for influence prediction ... 284
14.14. Past influences vs followers ... 284
14.15. Information flow on twitter ... 286
14.16. How to identify Elite users? ... 286
14.17. Snowball sample of Twitter lists ... 286
14.18. Activity sample of Twitter users ... 287
14.19. Who listens to whom? ... 288
14.20. Who listens to whom? ... 288
14.21. Two step information flow ... 289
14.22. Who are the intermediaries? ... 290
14.23. Who are the intermediaries? ... 291
14.24. Network Dynamics ... 293
14.25. Memetracking ... 293
14.26. Memetracking ... 293
14.27. Collective attention on Twitter ... 294
14.28. Collective attention on Twitter ... 295
14.29. Collective attention on Twitter ... 295
14.30. Collective attention on Twitter ... 296
14.31. Literature ... 297
15. 15 Measurements in mobile and cellular networks ... 297
15.1. Internet and cellular networks ... 298
15.2. What can measurements reveal? ... 298
15.3. A widely heterogeneous environment ... 299
15.4. How do the different access technologies affect the performance? ... 299
15.5. HSDPA downlink ... 299
15.6. HSDPA Uplink ... 300
15.7. LTE Downlink ... 301
15.8. LTE Uplink ... 302
15.9. What is the problem with the first 10 seconds? ... 302
15.10. Large scale measurements ... 303
15.11. Performance of different access technologies ... 304
15.12. Performance of different access technologies ... 304
15.13. Performance of different access technologies ... 305
15.14. Performance of different access technologies ... 306
15.15. Performance of different access technologies ... 307
15.16. Performance of different access technologies ... 308
15.17. Performance of different access technologies ... 309
15.18. Long-term trend and daily patterns ... 310
15.19. Long-term trend and daily patterns ... 311
15.20. Long-term trend and daily patterns ... 311
15.21. Measuring DNS lookup time in 3G networks ... 312
15.22. Measuring DNS lookup time in 3G networks ... 313
15.23. Downlink throughput of major carriers in the U.S. ... 313
15.24. Cellular network policies ... 314
15.25. Port scans for large carriers in the U.S. ... 314
15.26. Port scans for large carriers in the U.S. ... 315
15.27. Port scans for large carriers in the U.S. ... 316
15.28. FTP blocking in T-Mobile’s network ... 317
15.29. HTTP proxy port blocking in T-Mobile’s network ... 317
15.30. How can these middleboxes affect user experience? ... 318
15.31. IP spoofing ... 318
15.32. IP spoofing measurement ... 319
15.33. Short TCP connection timeout ... 320
15.34. How do short TCP connection timeouts affect energy consumption? ... 320
15.35. Packet reordering in the middleboxes ... 321
15.36. Packet reordering in the middleboxes ... 322
15.37. Packet reordering in the middleboxes ... 322
15.38. NAT traversal ... 323
15.39. NAT mapping in cellular networks ... 324
15.40. Mobile network measurement projects ... 324
15.41. Literature ... 325
1. 1 Introduction to Internet measurements
1.1. Course Information
• Instructor: Sándor Laki
• E-mail: lakis@inf.elte.hu
• Office hours: Thursday, 10:00-12:00, DT. 2.506
• Lecture: T/Th, 10:00 – 12:00
• Location: DT. 2.516
• Web site: http://lakis.web.elte.hu/EIT/lsim-2012autumn
• Mailing list: lsim-2012autumn@googlegroups.com
1.2. Grading
• Prerequisites
• Graduate level Computer Networking courses
• http://lakis.web.elte.hu/comnet-eng-bsc/
• http://people.inf.elte.hu/lukovszki/Courses/1314BSC/
• Credits: 6 ECTS
• Grading
• Midterm: 20%
• Good participation 10%
• Term project: 50%
• Specification: 10%
• Work-in-progress report and midterm presentation: 15%
• Final report and presentation: 25%
• Final exam: 20%
• Photo by Sage Ross
1.3. Term project
• Work in a team of 2+ students
• Decide what to measure and specify how to do that
• Build measurement tools or use existing platforms
• Perform measurements
• Collect and analyze measurement data
• Identify potential applications or further research directions
1.4. What is this course about?
• Not an introduction to the Internet
• Focus on Internet Measurements
• What to measure?
• traffic, infrastructure
• applications, performance
• Why is it important?
• Traffic engineering, capacity planning, topology mapping
• What does the Internet look like?
• Application and traffic characteristics, Topology/route choice
• How to measure and how to interpret measurement results?
• Measurement methodologies and challenges
• Design trade-offs
• Design of measurement/monitoring systems
• Tools: data collection, modeling, statistical inference, etc.
• Image courtesy of Michal Marcol / FreeDigitalPhotos.net
1.5. Reading
• Mark Crovella, Balachander Krishnamurthy:
• Internet Measurement: Infrastructure, Traffic and Applications.
• Wiley, 2006.
• Raj Jain:
• The Art of Computer Systems Performance Analysis.
• Wiley and Sons, New York, 1991
• Kurose and Ross:
• Computer Networking: A Top-Down Approach Featuring the Internet.
• Fifth edition, Addison-Wesley, 2009
1.6. INTRODUCTION
1.7. Once upon a time...
1.8. And now...
• Source: wikipedia.org
1.9. Another aspect of Internet evolution
• Once
• and now…
1.10. Today’s Internet
• Tier I (Core network)
• Tier II providers
• Customer IP networks
• ISPs interconnected at Internet Exchange Points (IXPs)
• Hyper Giants (Google, Akamai, other CDNs)
• National and Global Transit Backbones
1.11. Why do we need Internet measurements?
• The Internet seems to work well
• Despite the exponential growth in its size
• Despite the high variety of applications
• Email, Web, Instant messaging, File sharing, Social networks, Games, etc.
• Why do we bother measuring various aspects of it then?
1.12. Why do we need Internet measurements?
• The Internet is far from ideal
• Internet measurements help us
• To better understand how and why the Internet works
• To design new features that may lead to the next-generation Internet
• To identify the weaknesses of network protocols
1.13. What to measure?
• Physical Properties
• Network devices
• routers, NAT boxes, firewalls, switches
• Links
• wired, wireless
• Topology Properties
• Various levels
• Autonomous Systems (AS), Points of Presence (PoP), Routers, Interfaces
• Traffic Properties
• Delays
• Transmission, Propagation, Queuing, Processing etc.
• Losses, Throughput, Jitter, etc.
1.14. Why is it challenging to measure the Internet?
• Poor observability of network characteristics
• The reasons behind
• Core simplicity
• Layered architecture and hidden elements
• Administrative boundaries
1.15. Core simplicity
• The network is built up from very simple elements
• Keep it simple design concept
• Stateless nature
• Generally end-to-end arguments
• Packets are not tracked
• The interaction with the network is hard to observe
1.16. Layered architecture and hidden network elements
• Hourglass model hides the details of the lower layers
• IP everywhere, few transport protocols
• It is almost impossible to measure the layers below IP
• HTTP, Email, FTP, DNS, RTP, SMTP, WWW, VoIP, BitTorrent,…
• TCP, UDP,...
• IP
• Ethernet, CSMA
• MPLS, PPP, SONET, ...
• WiFi, WiMAX, LTE, UMTS, copper, fiber, ...
1.17. IP centric
• You must be familiar with IPv4 and IPv6
• IPv4 header fields: version, IHL, type of service, total length, identification, flags, fragment offset, TTL, protocol, header checksum, source address, destination address
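The header diagram itself did not survive conversion. As an illustrative sketch (not part of the original slides), the fixed 20-byte IPv4 header can be unpacked with Python's struct module; the example bytes below are hand-crafted for demonstration, not captured traffic:

```python
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Parse the fixed 20-byte part of an IPv4 header (RFC 791 layout)."""
    ver_ihl, tos, total_len, ident, flags_frag, ttl, proto, checksum, src, dst = \
        struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl": ver_ihl & 0x0F,                 # header length in 32-bit words
        "tos": tos,
        "total_length": total_len,
        "identification": ident,
        "flags": flags_frag >> 13,
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": proto,                     # 1=ICMP, 6=TCP, 17=UDP
        "checksum": checksum,
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }

# Hand-crafted example: version 4, IHL 5, TTL 64, protocol TCP,
# 10.0.0.1 -> 10.0.0.2 (checksum left as zero purely for illustration)
example = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                      bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
hdr = parse_ipv4_header(example)
print(hdr["version"], hdr["ttl"], hdr["src"], hdr["dst"])
```

The same field layout is what measurement tools read when interpreting captured packets.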
1.18. Middleboxes in the carriers’ networks
• Hidden elements
• Firewalls
• Filter out traffic, block ports, etc.
• Proxies and IP sniffers
• Improve performance
• Traffic shapers
• Improve traffic management
• NAT boxes
• Utilize IP space more efficiently
• Active network measurements have to take into consideration the presence of hidden middleboxes
• Probe traffic may be blocked
• Traffic shapers may affect probe traffic
• NATs hide the internal structure and size of the network
1.19. Administrative boundaries
• System of systems
• Interconnected networks operated by different organizations
• ISPs hide the details of their network
• E.g., instead of router-level topologies, only PoP-level ones are available
• Inter-AS routing is based on business decisions
• Economical and political aspects
1.20. Applications
• Traffic engineering
• Troubleshooting
• Anomaly detection
• Security forensics
• Feasibility check of new ideas
1.21. Network measurements
• Infrastructure
• Traffic
• Application
1.22. Infrastructure measurements
• Basic path characteristics
• Loss, delay, jitter, bandwidth, etc.
• Topology measurements
• Network tomography
• Network coordinate systems
• IP geolocation
• Wireless mesh networks
1.23. Traffic measurements
• Packet traces
• Sampling
• Flow characteristics
• Inter-arrival times, packet size distribution, etc.
• Traffic Matrix Estimations
• Deep Packet Inspection
• Statistical Traffic Classification
• Statistical Payload Analysis
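As a minimal illustration of the flow characteristics listed above, the sketch below computes inter-arrival times and basic packet-size statistics from a small hypothetical trace; the timestamps and sizes are invented for the example, not real measurement data:

```python
from statistics import mean, pstdev

# Hypothetical packet trace: (arrival time in seconds, packet size in bytes)
trace = [(0.000, 60), (0.012, 1500), (0.019, 1500), (0.047, 60), (0.051, 1500)]

timestamps = [t for t, _ in trace]
sizes = [s for _, s in trace]

# Inter-arrival times between consecutive packets
inter_arrivals = [b - a for a, b in zip(timestamps, timestamps[1:])]

print(f"mean inter-arrival: {mean(inter_arrivals):.5f} s")
print(f"inter-arrival stddev: {pstdev(inter_arrivals):.5f} s")
print(f"mean packet size: {mean(sizes):.1f} B")
```

Real traffic studies apply the same computations to millions of packets per flow, which is where the sampling techniques mentioned above become necessary.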
1.24. Application measurements
• Content Delivery Networks
• Web content clustering
• Skype and other VOIP measurements
• File sharing
• Video On Demand, IPTV
• Malware
• Social networks
1.25. Active and passive measurements
• Active measurements
• Methods that inject probe traffic into the network for the purpose of the measurement, and examine how the network affects the properties of that probe traffic
• Typically end-to-end
• Some tools: ping, owamp, traceroute, iperf, etc.
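As a minimal sketch of an active probe (an illustration, not one of the tools listed above), the round-trip time can be estimated from the duration of the TCP three-way handshake:

```python
import socket
import time

def tcp_connect_rtt(host, port, timeout=2.0):
    """Estimate RTT as the time the TCP handshake (SYN -> SYN/ACK) takes.

    The connection attempt itself is the probe traffic injected into
    the network.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        return time.perf_counter() - start
```

Unlike ICMP-based ping, this needs no raw-socket privileges, but middleboxes (firewalls, proxies) on the path can distort or block the probe, as discussed earlier.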
• Passive measurements
• Methods that capture traffic generated by other users and applications to calculate network related metrics
• Examples
• The Routeviews repository stores BGP tables from a large set of ASes
• Traffic trace captured by pcap at a given point of the network
• Flow (byte) counters in routers
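A passive flow (byte) counter of the kind kept by routers can be sketched as follows; the packet records here are hypothetical, standing in for a captured trace:

```python
from collections import defaultdict

# Hypothetical packet records: (src IP, dst IP, size in bytes)
packets = [
    ("10.0.0.1", "10.0.0.2", 1500),
    ("10.0.0.1", "10.0.0.2", 40),
    ("10.0.0.3", "10.0.0.2", 576),
]

# Aggregate per-flow byte and packet counts (a flow is a src/dst pair here;
# real flow records also key on ports and protocol)
flows = defaultdict(lambda: {"bytes": 0, "packets": 0})
for src, dst, size in packets:
    flows[(src, dst)]["bytes"] += size
    flows[(src, dst)]["packets"] += 1
```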
1.26. Internet Measurements
• Internet Measurement is key to designing the next generation communication network
• Fundamental design principles of the current Internet make it hard to measure various aspects of it
• Preliminary research has resulted in a set of basic tools and methods to measure aspects like topology, traffic etc.
• Accuracy of such methods is still an open question
• There is still a lot of ground to cover in this direction and this is where researchers like you come into the equation
1.27. Related Conferences and Journals
• Conferences
• Internet Measurement Conference
• Passive and Active Measurement Workshop
• ACM SIGMETRICS
• Network and Distributed System Security Symposium
• ACM SIGCOMM
• IEEE INFOCOM
• Journals
• Computer Networks (ComNet)
• IEEE Transactions on Networking (ToN)
• IEEE Journal on Selected Areas in Communication (JSAC)
2. 2 Analytical background
2.1. Analytical background
We need tools to study the Internet in a quantitative fashion:
• Linear algebra
• Probability and statistics
• Graph theory
Further readings in these topics:
• 1. Linear algebra wikibook: http://en.wikibooks.org/wiki/Linear_Algebra
• 2. Mario F. Triola: Elementary Statistics
• 3. Reinhard Diestel: Graph Theory http://diestel-graph-theory.com/index.html
2.2. LINEAR ALGEBRA
2.3. Notations
2.4. Norms and orthogonality
2.5. Matrices
2.6. Eigenvectors and eigenvalues
2.7. Alternate algebras
2.8. PROBABILITY AND STATISTICS
2.9. Why do we need statistics and probability theory?
• Most of the mechanisms in networks are not deterministic
• Randomized algorithms
• Improved robustness, load balancing, etc.
• Stochastic behavior of incoming traffic
• Without probability theory and statistics it would be hard to analyze them
2.10. Notations
2.11. Definitions
2.12. Definitions - II
2.13. Expected values and moments
2.14. Variance and standard deviation
2.15. Joint probability
2.16. Conditional probability
2.17. Central limit theorem
2.18. Distributions for Internet measurements
2.19. Stochastic processes
• Typically, Internet measurements arrive over time, in some order
• To use the tools of probability in this setting we need to define a sequence of random variables, which is called a stochastic process.
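For instance, packet arrivals are often modeled as a Poisson process, a stochastic process in which the inter-arrival times are independent and exponentially distributed; a minimal simulation sketch:

```python
import random

def poisson_arrivals(rate, n, seed=42):
    """Simulate n arrival instants of a Poisson process with the given rate.

    Inter-arrival times are i.i.d. exponential with mean 1/rate, so the
    arrival instants form a strictly increasing random sequence.
    """
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)
        arrivals.append(t)
    return arrivals
```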
2.20. Stochastic processes
2.21. Stochastic processes
2.22. Characterization of a stochastic process
2.23. Simpler stationary conditions
2.24. Measures of dependence
2.25. Measures of dependence
2.26. Measures of dependence
2.27. Modeling network traffic and user activity
2.28. Modeling network traffic and user activity
2.29. Short and long tailed distributions
2.30. Short and long tailed distributions
2.31. Short and long tailed distributions
2.32. Heavy tailed/power-law distribution
2.33. Heavy tailed distribution
• Example: the New York City area road map, with link lengths in km
2.34. Measured data
• Describing data
• For example: "mean of a dataset"
• An objectively measurable quantity which is the average of a set of known values
• Describing probability models
• For example: "mean of a random variable"
• A property of an abstract mathematical construct
• To emphasize the distinction, we add the adjective "empirical" to describe data
• Empirical mean vs. mean
• Classification of measured data
• Numerical: i.e. numbers
• Categorical: i.e. symbols, names, tokens, etc.
2.35. Describing data
2.36. More detailed descriptions
• Quantiles
• The pth quantile is the value below which the fraction p of the values lies.
• Median is the 0.5-quantile
• Percentile
• Quantiles can also be expressed as percentiles.
• E.g. the 90th percentile is the value that is larger than 90 percent of the data
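The p-quantile can be computed directly from the sorted data; a sketch using one common linear-interpolation convention (several conventions exist):

```python
def quantile(data, p):
    """Return the p-quantile (0 <= p <= 1) of data, with linear interpolation."""
    xs = sorted(data)
    if len(xs) == 1:
        return xs[0]
    pos = p * (len(xs) - 1)        # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    # Interpolate between the two neighboring order statistics
    return xs[lo] * (1 - frac) + xs[hi] * frac
```

With this, the median is `quantile(data, 0.5)` and the 90th percentile is `quantile(data, 0.9)`.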
2.37. Histogram
• Defined in terms of bins, which form a partition of the range of observed values
• Counts how many values fall in each bin
• A natural empirical analog of a random variable’s probability density function (PDF) or distribution function
• Practical problem:
• How to determine the bin boundaries
2.38. Empirical cumulative distribution function (CDF)
• Involves no binning or averaging of data values
• Provides more information about the dataset than the histogram.
• For each unique value in the dataset, it gives the fraction of data items smaller than or equal to that value (the quantile of that value).
• Empirical CCDF can be used similarly
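The empirical CDF requires only a sort of the data; a minimal sketch:

```python
import bisect

def ecdf(data):
    """Return the empirical CDF as a function x -> fraction of items <= x."""
    xs = sorted(data)
    n = len(xs)
    # bisect_right counts how many sorted items are <= x
    return lambda x: bisect.bisect_right(xs, x) / n
```

The empirical CCDF is then simply `1 - F(x)`.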
2.39. Categorical data description
• Probability distribution
• An analog of the histogram for categorical data
• Measure the empirical probability of each symbol in the dataset
• Use histogram in decreasing order
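The empirical probability of each symbol, ordered by decreasing probability as described above, can be computed with a counter; the symbols here are hypothetical:

```python
from collections import Counter

symbols = ["GET", "GET", "POST", "GET", "HEAD"]   # hypothetical categorical data
counts = Counter(symbols)
n = len(symbols)

# Empirical probability of each symbol, in decreasing order of frequency
dist = [(sym, c / n) for sym, c in counts.most_common()]
```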
2.40. Describing memory and stability
Time series data
• Question: Do successive measurements tend to have any relation to each other?
Memory
• When the value of a measurement tends to give some information about the likely values of future measurements
• Empirical autocorrelation function (ACF): r(k) = Σ_t (x_t − x̄)(x_{t+k} − x̄) / Σ_t (x_t − x̄)²
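A sketch of the empirical autocorrelation at lag k, normalized by the overall sample variance so that the lag-0 value is 1:

```python
def acf(x, k):
    """Empirical autocorrelation of the series x at lag k.

    Computes sum_t (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return cov / var
```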
Stability
• If its empirical statistics do not seem to be changing over time.
• Subjective
• Objective measures
• A typical approach is to break the dataset into windows
• E.g. a set of 1000 observations can be divided into 10 windows consisting of the 1st 100 observations, the 2nd 100 observations, and so on.
• Empirical statistics are calculated for each window, and then one looks for consistency, trends, predictable variation, etc.
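The windowing step can be sketched as follows, here computing a per-window mean:

```python
def window_means(data, window):
    """Split data into consecutive non-overlapping windows of the given size
    and return the mean of each window."""
    return [
        sum(data[i:i + window]) / window
        for i in range(0, len(data) - window + 1, window)
    ]
```

If the per-window means stay roughly constant, the series looks stable; a trend across windows suggests otherwise.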
2.41. High variability in Internet data
• Traditional statistical methods focus on data with low or moderate variability, e.g. Normally distributed data
• However, Internet data shows high variability
• It consists of many small values mixed with a small number of large values
• A significant fraction of the data may fall many standard deviations from the mean
• Empirical distribution is highly skewed, and empirical mean and variance are strongly affected by the rare, large observations
• It may be modeled with a subexponential or heavy tailed distribution
• Mean and variance are not good metrics for high-variability data; quantiles and the empirical distribution are better,
• e.g. empirical CCDF on log-log axes for long-tailed distribution
2.42. Zipf’s law
• Categorical distributions can also be highly skewed
• A model for the shape of a categorical distribution when data values are ordered by decreasing empirical probability,
• e.g. URLs of Web pages
• Zipf’s law refers to the situation where the empirical probability of the item of rank r follows p(r) ≈ c · r^(−B) for some positive constants c and B
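The exponent B can be estimated by a least-squares fit of log p(r) against log r (a straight line on log-log axes); a sketch, verified here on exactly Zipfian probabilities with B = 1:

```python
import math

def zipf_exponent(probs):
    """Estimate B in p(r) ~ c * r**(-B) from rank-ordered probabilities.

    Fits a least-squares line to (log r, log p(r)); B is minus the slope.
    """
    xs = [math.log(r) for r in range(1, len(probs) + 1)]
    ys = [math.log(p) for p in probs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```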
2.43. GRAPH THEORY
2.44. Graph theory
• Generally, networks can be handled as directed or undirected graphs
• However, different phenomena could also be analysed by graph theory
• E.g. retweet graph in social network analysis
• Graph theory could help us to characterize networks and other phenomena and analyze their properties
2.45. Graphs
• A graph is a pair G = (V, E) of a set of vertices V and a set of edges E
• Undirected and directed
• Unweighted and weighted
• [Figure: examples of unweighted and weighted graphs]
2.46. Subgraphs
2.47. Connected graphs
2.48. Metrics for characterization
2.49. Metrics for characterization
2.50. Matrix representation
2.51. Applications of Routing Matrix
• Origin-destination flow
2.52. Applications of routing matrix
• Delay of paths
• The routing and end-to-end delays are known
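In matrix form, the vector of end-to-end path delays is the routing matrix times the vector of per-link delays; a tiny sketch with a hypothetical 2-path, 3-link network:

```python
# Routing matrix R: R[i][j] = 1 if path i traverses link j, else 0
R = [
    [1, 1, 0],   # path 0 uses links 0 and 1
    [0, 1, 1],   # path 1 uses links 1 and 2
]
link_delays = [2.0, 5.0, 1.0]    # hypothetical per-link delays (ms)

# Path delays: d_path = R * d_link (matrix-vector product)
path_delays = [
    sum(r * d for r, d in zip(row, link_delays))
    for row in R
]
```

Network tomography works in the reverse direction: given R and measured path delays, it infers the (unobservable) link delays.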
2.53. Artificial graph constructions
• To model real networks by generating random graphs
• Erdős-Rényi model
• Random graphs
• Theoretical relevance
• Binomial degree distribution
• Barabási-Albert model
• Random scale-free networks
• Modeling natural and human-made systems
• Power-law degree distribution
• Other models like Watts and Strogatz model
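An Erdős-Rényi G(n, p) graph can be generated directly from its definition: each of the n(n−1)/2 possible edges appears independently with probability p.

```python
import itertools
import random

def erdos_renyi(n, p, seed=0):
    """Generate an undirected Erdős-Rényi G(n, p) random graph.

    Returns the edge list; each possible edge (u, v), u < v, is included
    independently with probability p.
    """
    rng = random.Random(seed)
    return [
        (u, v)
        for u, v in itertools.combinations(range(n), 2)
        if rng.random() < p
    ]
```

The resulting degree distribution is Binomial(n−1, p), in contrast to the power-law degrees produced by the Barabási-Albert model.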
2.54. Erdős-Rényi random graph
2.55. Erdős-Rényi random graph
2.56. Generalized random graph
2.57. Preferential attachment model
2.58. Preferential attachment model
2.59. Regular vs Random graphs
• Regular graph
• Long characteristic path length
• High degree of clustering
• Random Graph
• Short paths
• Low degree of clustering
• Small world graph
• Short characteristic path length
• High degree of clustering
2.60. AS level topology
2.61. AS level topology
• High variability in degree distribution
• Some ASes are very highly connected
• Different ASes have dramatically different roles in the network
• Node degree seems to be highly correlated with AS size
• Generative models of AS graph
• "Rich get richer" model
• Newly added nodes connect to existing nodes in a way that tends to simultaneously minimize the physical length of the new connection, as well as the average number of hops to other nodes
• New ASes appear at an exponentially increasing rate, and each AS grows exponentially as well
2.62. AS level topology
2.63. MODELING
2.64. Measurement and modeling
• Model
• Simplified version of something else
• Classification
• A system model: simplified descriptions of computer systems
• Data models: simplified descriptions of measurements
• Data models
• Descriptive data models
• Constructive data models
2.65. Descriptive data model
• Compact summary of a set of measurements
• E.g. summarize the variation of traffic on a particular network as “a sinusoid with period 24 hours"
• An underlying idealized representation
• Contains parameters whose values are based on the measured data
• Drawback
• Cannot use all available information
• Hard to answer "why is the data like this?" and "what will happen if the system changes?"
2.66. Constructive data model
• Succinct description of a process that gives rise to an output of interest
• E.g. model network traffic as "the superposition of a set of flows arriving independently, each consisting of a random number of packets"
• The main purpose is to concisely characterize a dataset, instead of representing or simulating the real system
• Drawback
• Model is hard to generalize - such models may have many parameters
• The nature of the output is not obvious without simulation or analysis
• It is impossible to match the data in every aspect
2.67. Data model
• "All models are wrong, but some models are useful"
• Model is approximate
• Model omits details of the data by their very nature
• Modeling introduces the tension between the simplicity and utility of a model
• Under which model is the observed data more likely?
• Models involves a random process or component
• Three key steps in building a data model:
• Model selection
• Parsimonious: prefer models with fewer parameters over those with a larger number of parameters
• Parameter estimation
• Validating the model
2.68. Why build models
• Provides a compact summary of a set of measurements
• Exposes properties of measurements that are important for particular engineering problems, when parameters are interpretable
• Be a starting point to generate random but "realistic" data as input in simulation
2.69. Probability models
• Why use random models in the Internet?
• Fundamentally, the processes involved are random
• Randomness stands in for an immense number of particular system properties that are far too tedious to specify
• Random models and real systems are very different things
• It is important to distinguish between the properties of a probabilistic model and the properties of real data.
3. 3 Network measurement infrastructures ETOMIC and SONoMA
3.1. Why are Internet experimental facilities needed?
• The Internet has become a large-scale and complex network
• inefficient protocols
• in the Internet it is often not possible to measure traffic flows and other aspects of usage directly
• active probes can be injected to discover these hidden properties
• Understanding the details of network and traffic dynamics
• Topology changes
• Queueing delay variations
• Available bandwidth
• One-way delay variations
• etc.
• Models and analysis of measurement data and traffic dynamics could lead to a better design of Future Internet protocols
3.2. Existing TestBeds and Network Measurement Infrastructures
• DIMES
3.3. Lifecycle of network measurements
3.4. ETOMIC
3.5. The ETOMIC system
• The European Traffic Observatory Measurement InfrastruCture (etomic) was created in 2004 within the EU FP6 Evergrow Integrated Project
• ETOMIC also takes part in EIT ICTLabs FITTING and EU FP7 OpenLab
• Its goals:
• open access, public testbed for researchers
• high precision timestamping
• GPS synchronized
• Visit: www.etomic.org
• Best Testbed Award
3.6. System architecture
• Measurement nodes/agents
• Geographically dispersed machines are ready to be used by the users
• Advanced probing nodes called ETOMs
• Lightweight APE boxes
• Central Management System
• Schedule experiments, authenticate users, etc.
• Data repository
• Network Measurement Virtual Observatory
3.7. Evolution of measurement nodes
3.8. ETOMs
• ETOMs with DAG
• Intel S875WP1E server
• Debian Linux
• Endace DAG 3.6GE
• 60 ns precision
• GPS synchronized
• Special C API for programming the DAG card
• User space applications
• Packet sender, capturer
• ETOMs with ARGOS
• HP ProLiant ML370 server
• Ubuntu Linux
• Quad core processor
• ARGOS card (dev. at UAM)
• 10 ns precision
• based on netFPGA
• GPS synchronized
• No special API needed
• Standard pcap library can be used
3.9. APE boxes
• Active Probing Equipment
• low-cost network measurement device
• ca. 300 Euro
• based on a Blackfin programmable board
• developed at Eötvös Loránd University
• 100 ns precision
• GPS synchronized
• uClinux - Linux operating system for embedded systems
• Low energy consumption
• web service interface for performing predefined measurements (ping, traceroute, packet train sender, capturer, etc.)
3.10. Measurement boxes
3.11. Central Management System
• IBM Blade server
• Key tasks
• User management
• Node maintenance
• Experiment scheduling
• Storing experimental results (temporarily)
• Web GUI
3.12. Slices vs. unique timeslots
• In PlanetLab
• Virtualization
• Slices
• Sharing the resources
• Introducing too much unpredictability in timing measurements
• Low precision timestamping
• In etomic
• No virtualization
• No slices
• Unique timeslots
• You own all the resources you need during the experimentation
• High precision timestamping
3.13. The ETOMIC system
• ETOMIC
• www.etomic.org
• ANME
• www.onelab.eu
• www.etomic.org
• Onelab-2:
• 26 partners
• in 13 countries
• ETOMIC:
• ca. 40 ETOMs and
• 20 APE boxes
• on more than 20 different sites
3.14. One day on the Internet
3.15. Experimental use cases in ETOMIC
• one-way delay (60 ns resolution)
• tracking topology changes
• available bandwidth meter
• transport protocol testing
• queuing delay tomography
• geolocation experiments
• …
3.16. HOW TO USE ETOMIC?
3.17. Performing an experiment from the system’s perspective
• Setting up an experiment
• Uploading scripts, data files
• Selecting measurement agents
• Reserving one or more timeslots
• Initializing phase
• Reserving the selected measurement agents
• Uploading measurement scripts and other files needed for the experiment
• Execution phase
• Running the uploaded scripts with the preconfigured settings on the etomic nodes
• Data collection phase
• Downloading and storing the resulting data files in the CMS database
3.18. Measurement types
• User specific measurements
• Customized experiments defined by the end-users
• Almost full control over the measurement agents
• User specific periodic measurements
• Customized periodic experiments defined by the users
• Repeating an existing experiment multiple times
• inter-experiment times
• repetition count
• Kernel level periodic measurements
• Carrying out experiments by the CMS itself
• Low priority task
• executed only if the nodes are idle
• it is canceled if a user-level experiment arrives
• Invisible for end-users
3.19. Necessary steps for submitting an experiment
• Create a bundle
• Choose the agents
• Configure them to run your experiment
• Create a new experiment
• Choose a bundle to be executed
• Schedule your experiment by selecting a start instant for it
• Running the experiment
• The current state of your experiment can be seen
• Downloading and analyzing the results
• When the status of your experiment is finished the resulting files can be downloaded
3.20. Creating a bundle
• Online demo at www.etomic.org
3.21. Creating an experiment and querying its status
• Online demo at www.etomic.org
3.22. Downloading the results
• When the status of the experiment is finished, click on the results icon to go to the Result files tab in the Edit/View files section where you can download all the files.
3.23. Programming DAG cards
• The libeverdag C library is provided in the Evergrow project to ease the use of the DAG 3.6GE cards.
• The unique features provided by these cards:
• Synchronized send with high precision timestamps inserted in the payload.
• High performance capture with precision timestamps of receive time
• More details in the API documentation
3.24. PUBLISHING DATA
3.25. Experimental facilities
• most network measurement projects:
• use a single dedicated infrastructure
• scan only narrow subsegments
• analyze a limited set of network characteristics
• centralized and separated from each other
• key idea: try to interconnect separate measurement data!
• large-scale behaviour
• long-term evolution
3.26. Traditional approach
• Traditionally, measurements are designed to collect only specific data that is important from the point of view of the researcher’s agenda.
3.27. Sharing science
• Genome databases
• Astronomy
3.28. Related work: CAIDA/DatCat
3.29. Related work: MoMe database
3.30. Related work: MAWI repository
3.31. Data publication efforts
• DatCat (USA):
• searchable catalog of metadata about measurements
• passive traffic traces, traceroutes, BGP tables, virus propagation studies
• MOME (EU):
• database for meta-measurement data
• packet and flow traces, routing data, HTTP traces
• standardization efforts
• sharing of analysis tools is possible (e.g. jitter calculation)
• MAWI (Japan):
• repository of passive traces from the WIDE backbone (collected since 1999)
• raw data is not stored
• raw data from a single infrastructure
3.32. Key ideas in data handling
• store and share raw data
• joint analysis of different types of measurement data
• reanalysis (with new evaluation methods)
• reference data (historical comparison)
• share analysis tools
• server side processing simplifies client applications
• no need to transfer bulk data packages: online processing
• standardization, network XML
• Network Measurement Virtual Observatory (nmVO)
3.33. VO approach
• The modern approach is to collect and store all measurable data and make it available for "virtual observation". Virtual measurements can have a set of goals different from the original ones
3.34. Unified interface
• A VO can be realized by collecting measurement data from different infrastructures. Data structures should be standardized (netXML)
3.35. Casjobs User Interface for accessing data
• The Network Measurement Virtual Observatory is available at
• http://nm.vo.elte.hu
3.36. SONOMA
3.37. SONoMA v1.0
• Service Oriented NetwOrk Measurement Architecture
• It was originally developed to instrument APE measurement boxes
• Provide a web service interface for the users to perform predefined network measurements
• It can easily be extended with new measurement agents
• Standardized communication via web services
• Visit: sonoma.etomic.org
3.38. Why do we need another network measurement platform?
• The state of the art measurement systems:
• ETOMIC: know-how needed to program the DAG card
• PlanetLab: scheduling, data collection and storage is up to the user
• Dimes: few tools; Penny; cannot predict startup
• Scriptroute: lacks cooperation and synchronization between nodes; no repository
• perfSONAR: focuses on monitoring in contrast to complex measurements
• Looking glass like: few metrics, no common iface
3.39. SONoMA
• "Its key objective is to integrate the complexity of large testbeds and the popularity of the lightweight services."
• Federates heterogeneous measurement agents
• (19 APE, 230 PlanetLab); dynamic accounting
• Distributes tasks in a robust and efficient way
• Provides an easy-to-extend framework
• A collection of off the shelf measurement and evaluation methods
• Decreases client side development effort (SOA)
• A tool for novel Internet applications
• that require real-time large-scale measurement data
• Archives data in an automated fashion
• In a transparent way
• Supporting both short and long term experiments
3.40. System components
• Actors
• A user or a user application
• Management Layer
• Account users (allow PLE) and Measurement Agents
• Authenticate users, authorize calls
• Handle sessions, decompose and schedule processes
• Hybrid resource management:
• time sharing / reserving
• Measurement Agents
• currently two kinds of agents are deployed:
• APE boxes, PlanetLab nodes
3.41. Management Layer
3.42. Measurement methods
MA methods defined:
• ping, traceroute, chirp, train, capture, dnslookup
• Synchronous / Asynchronous
• Cannot be accessed by the actors directly
ML method composition:
• Synchronous/Asynchronous measurements = short/long ones
• Atomic measurements:
short*, long*, parallel*, ensemble*, ensembleNSlookup
* = Ping or Traceroute
• Complex measurements /require synchronization/:
short*, long*
* = Chirp, Train
• Statistical evaluations /measurement(s) + model/: getAvailableBandwidth, topology, tomography
3.43. Web client
• http://sonoma.etomic.org
3.44. Case study: A full mesh topology measurement
A python-like example for obtaining the full mesh topology spanned by the MAs:
# Service object:
ws = ServiceServerSOAP( url="https://sonoma.etomic.org:8888" )
# Open a session and authenticate
sessionID = ws.requestSession( user="User", zip=True, format="CSV" )
# Submit a topology measurement with the list of MAs to be involved in the experiment
# The returned procID is a unique id which refers to the given experiment
procID = ws.topology( sessionID, nodeList = [ "157.181.175.247", "132.65.240.38", ... ] )
# Wait until the measurement ends