Technical details - In partial fulfillment of the requirements for the title of Doctor of the H

The real network data. The Internet dataset representing the global Internet structure at the Autonomous System (AS) level is from [27]. The metabolic network is the post-processed network of metabolic reactions in E. coli from [141], snapshot S₁ there; the post-processing details can be found in [141]. The word network is the largest connected component of the network of adjacent words in Charles Darwin’s “The Origin of Species” from [120]. The airport network was downloaded from the Bureau of Transportation Statistics (http://transtats.bts.gov/) on November 5, 2011. The structural human brain network and the physical coordinates of its nodes (regions of interest, ROIs) are the diffusion spectrum imaging (DSI) data from [80].

The hyperbolic maps of real networks. The hyperbolic coordinates of ASes and metabolites are from [27] and [141]. The hyperbolic coordinates of words and airports are inferred using the HyperMap algorithm [140]. This algorithm is deterministic and is based on the growing network model in [141], which was used to show that the latent geometry of scale-free, strongly clustered real networks is hyperbolic. Given the adjacency matrix of a real network, the algorithm infers the hyperbolic coordinates of its nodes by replaying its growth as the model in [141] prescribes. Specifically, the nodes are first sorted in order of decreasing degree, and then, starting with the highest-degree node, nodes and their edges are added, one node at a time, to a growing network. The probability (likelihood) with which the model in [141] generates this growing network depends on the node coordinates. The HyperMap algorithm sets the coordinate of each added node to the coordinate corresponding to the global maximum of this probability.
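The replay procedure described above can be sketched roughly as follows. This is only an illustration of the control flow (degree-ordered insertion, then a likelihood maximization per added node) and not the HyperMap implementation of [140]: the function names, the one-dimensional angular coordinate, the finite candidate grid, and the toy connection probability with its `scale` and `temperature` parameters are all assumptions made for the example, and the likelihood of the model in [141] is not reproduced here.

```python
import math


def growth_order(adjacency):
    """Nodes sorted by decreasing degree; ties are broken by node label."""
    return sorted(adjacency, key=lambda v: (-len(adjacency[v]), v))


def angular_distance(a, b):
    """Shortest distance between two angles on the circle."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def toy_log_likelihood(theta, node, coords, adjacency, scale=0.5, temperature=0.2):
    """Hypothetical stand-in for the model likelihood: nodes that are close
    on the circle are assumed more likely to be connected."""
    total = 0.0
    for other, other_theta in coords.items():
        gap = angular_distance(theta, other_theta)
        p = 1.0 / (1.0 + math.exp((gap - scale) / temperature))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logarithms finite
        total += math.log(p) if other in adjacency[node] else math.log(1.0 - p)
    return total


def replay_growth(adjacency, candidates, log_likelihood=toy_log_likelihood):
    """Add nodes one at a time in degree order; each new node takes the
    candidate coordinate that maximizes the likelihood given the nodes
    placed so far (maximum over the candidate grid only)."""
    coords = {}
    for node in growth_order(adjacency):
        coords[node] = max(
            candidates,
            key=lambda th: log_likelihood(th, node, coords, adjacency),
        )
    return coords
```

In this sketch the maximum is taken over a finite grid of candidate angles, so it is a global maximum only with respect to that grid.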

The Nash equilibrium networks of NNGs. The hyperbolic coordinates (or, in the airport and brain cases, the physical coordinates) are then supplied to the GNU Linear Programming Kit (GLPK, http://www.gnu.org/software/glpk/), which is used to find a solution to the corresponding minimum set cover problem. To yield acceptable running times of the solver, the Internet and word networks are reduced in size by extracting their high-degree cores of about 4500 nodes. The Hungarian road data is processed slightly differently. First, the cities in Hungary are mapped to their geographic coordinates using the database at http://www.kemitenpet.hu/letoltes/tables.helyseg_hu.xls. Then these coordinates are used in GLPK to find the NNG equilibrium. Each edge in this equilibrium network is then checked for existence in the real road network. To do so, the GoogleMaps API (https://pypi.python.org/pypi/googlemaps/) is used to find the shortest path between the two cities connected by the edge. The edge is defined to also exist in the real road network if this shortest path does not pass through any other city.

dc_1742_20

Chapter 6 Hierarchical systems

We have seen so far that greedy navigation, supported by the hidden metric space of the network, can account for the excellent navigability of networks. Although the framework of greedy navigation is very compelling, the embedding of real networked systems into metric spaces ensuring reliable navigation can be very cumbersome and non-intuitive in many cases (see [25]). In such cases, the function→structure approach clearly inherits the non-trivialities of greedy navigation and metric spaces.

In this chapter, we show that the characterization of navigation paths used in networks can be achieved to a sufficient extent. This enables the function→structure analysis without assuming the mechanism of greedy navigation. Our approach here focuses on the high-level structure of the paths used in the network. There are numerous examples of real networks exhibiting a hierarchical structure. Organizational (e.g., military) networks, for example, are well known to have a clearly defined hierarchy. The Internet is another example, in which the connections between Internet domains are hierarchical, pointing from customer to provider. We show that these underlying hierarchies have a significant impact on the operative paths in the network. At this point, we turn back to the networks and paths investigated in Chapter 3: the Internet AS topology, the air transportation network, the word morph network, and the human brain. Recall that for these networks, we have collected large datasets about both the structure of the networks and the empirical paths. A deeper analysis of these empirical paths uncovers two additional features (on top of the stretch introduced in Chapter 3) in connection with the underlying hierarchy.

One such feature our measurements support is “conform hierarchy” (CH), meaning that the used paths follow the topological hierarchy of the network. To show this, we have computed the closeness centrality of the nodes comprising the empirical paths, which indicates which (inner or outer) parts of the network the information flows through. The closeness centrality of a node x is computed as $C(x) = \frac{N}{\sum_{y} d(y, x)}$, where $d(y, x)$ is the distance between vertices x and y, while N refers to the number of nodes in the network. We found that most of the empirical paths do not contain a large-small-large pattern forming a “valley” anywhere in their closeness centrality sequence. Informally, this means that higher-level nodes do not prefer to exchange information through their subordinates, even if there are short paths through them. On a CH path, the closeness centrality first increases monotonically up to a point (upstream) and then decreases (downstream) until the path reaches the destination, or the path goes only upstream or only downstream all the way. Fig. 6.1 illustrates this graphically. One could argue that short paths on real networks may have this property by default, but Fig. 6.2a-d verify that this is not the case. For comparison, we picked random paths between the source-destination pairs of our empirical paths with the same stretch distribution and plotted the results for that case too. One can see that, while the path length distribution is the same for the two datasets, a much larger fraction of the stretch-equivalent random paths violate the CH feature.

Figure 6.1: Illustration of paths with regard to the internal logic of the network. A path is CH if it does not contain a large-small-large pattern forming a “valley” anywhere in its centrality sequence (green and orange paths). Red paths show examples of non-CH paths. An upstream path contains at least one step upwards in the hierarchy of the network (orange paths), while in downstream paths, the centrality decreases all the way (green paths).
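As a minimal sketch of the CH test, the closeness centrality defined above and the valley-free condition can be computed as follows. The sketch assumes an unweighted, undirected, connected network with at least two nodes, stored as a dictionary of neighbor lists; the function names are illustrative, and treating steps between nodes of equal centrality as neither upstream nor downstream is an assumption of this example rather than something the text specifies.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adjacency, x):
    """C(x) = N / sum_y d(y, x), with N the number of nodes."""
    dist = bfs_distances(adjacency, x)
    return len(adjacency) / sum(dist.values())


def is_ch_path(centralities):
    """True if the centrality sequence contains no 'valley', i.e. it never
    increases again once it has started to decrease."""
    descending = False
    for prev, cur in zip(centralities, centralities[1:]):
        if cur < prev:
            descending = True
        elif cur > prev and descending:
            return False
    return True
```

For a path given as a node sequence, `is_ch_path([closeness(adjacency, v) for v in path])` then reports whether the path is CH under these assumptions.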

There can be subtle differences between CH paths of similar length. For example, a path can contain upstream and then downstream steps, or downstream steps only. Recall that an upstream step goes towards the core, while a downstream step goes towards the periphery of the network. Is there a preference among these? To answer this, we plotted the Cumulative Distribution Function (CDF) of CH paths with respect to the number of upstream steps preceding the downstream phase (Figure 6.3a-d). For comparison, we have also plotted the results of a random policy that picks randomly from the possible CH paths of the given length. The plots confirm that the empirical paths contain far fewer upstream steps, which means that these paths try to avoid stepping towards the core. This finding adds “prefer downstream” as a third identifiable path selection feature (see Fig. 6.1 for an illustration). We note that such behavior is easy to interpret on the Internet, since stepping towards the core of the network implies paying a transit provider for carrying the traffic, while going downstream comes for (almost) free. However, at this time, it is not clear what causes the same behavior in the other networks.
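The quantity behind this comparison, the number of upstream steps preceding the downstream phase, and its empirical CDF can be sketched as below. The input is assumed to be the closeness centrality sequence of a CH path; the function names are illustrative, and ignoring steps between nodes of equal centrality is an assumption of the example.

```python
def upstream_steps(centralities):
    """Count the steps with increasing centrality that occur before the
    first decreasing step (the start of the downstream phase)."""
    count = 0
    for prev, cur in zip(centralities, centralities[1:]):
        if cur > prev:
            count += 1
        elif cur < prev:
            break
    return count


def empirical_cdf(values):
    """Pairs (v, fraction of observations <= v) for each distinct value v."""
    n = len(values)
    return [(v, sum(1 for x in values if x <= v) / n) for v in sorted(set(values))]
```

Applying `empirical_cdf` to the upstream step counts of the empirical paths and of the randomly selected CH paths gives two curves that can be compared panel by panel.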

Figure 6.2: Identified path selection features confirmed by our measurement data. Panels a-d show the hierarchical conformity of the empirically determined paths against stretch. The inset of each plot shows the relative difference between the number of CH paths in the empirical and random paths. In the case of small networks, there are 15-85% more CH paths in the empirical traces, but in the case of the large AS-level Internet, this goes up to 100-500%. The cyan-colored data in the plots show the number of CH paths in a randomized version of our networks generated with the degree sequence (DS) algorithm, which produces exactly the same degrees for the nodes, but the edges are completely randomized. The plots confirm that the topological peculiarities of real networks increase the number of CH paths between endpoints with respect to the DS networks (see the explanation brackets between the cyan and magenta-colored dots of panels a and b). However, we argue that the effect of the CH feature is at least as important or even more fundamental (e.g., in the case of the Internet).
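One common way to obtain a randomized network with exactly the same node degrees is repeated double-edge swapping, sketched below. The text does not specify how the DS algorithm is implemented, so this is an assumed stand-in rather than the procedure used for the figure; the function names and the `swaps` and `seed` parameters are illustrative.

```python
import random
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def degree_preserving_shuffle(edges, swaps, seed=0):
    """Rewire an undirected simple graph by double-edge swaps:
    (a, b), (c, d) -> (a, d), (c, b). Swaps that would create a
    self-loop or a duplicate edge are rejected."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    while done < swaps and attempts < 100 * swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # consider both orientations of the second edge
            c, d = d, c
        if len({a, b, c, d}) < 4:
            continue
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list
```

Each accepted swap leaves every node degree unchanged; how many swaps are needed before the edges can be regarded as fully randomized depends on the network and is not addressed by this sketch.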

Figure 6.3: Panels a-d show the cumulative distribution of upstream steps in the traces of our datasets (axis label: # of upward steps). The empirical paths tend to avoid stepping towards the core, which is reflected by the much lower number of upstream steps (in comparison with the randomly selected CH paths of the same length) before entering the downstream phase.
