Graphs as complex networks

Source: Data Mining, Gábor Berend (pages 168-171)


8.1 Graphs as complex networks

Complex networks are mathematical objects meant for describing real-world phenomena and processes. These mathematical objects are often referred to as graphs as well. A graph is a simple and highly relevant concept in graph theory and in computer science in general. In its simplest form, a graph is a collection of nodes (or vertices) together with a binary relation that holds for a subset of the pairs of vertices; an edge connects each pair of vertices for which the relation holds. As such, a graph can be given as G = (V, E), with V denoting its vertices and E ⊆ V × V its edges, where × refers to the Cartesian product.

Learning Objectives:

• Modeling by Markov Chains

• PageRank algorithm and its variants

• Hubs and Authorities


8.1.1 Different types of graphs

Complex networks can come in many forms, i.e., the edges can be directed or undirected, weighted or unweighted, labeled or unlabeled. What undirectedness means in the case of networks is that whenever an edge exists in one direction between two vertices, the edge in the reverse direction is guaranteed to be in the network as well, that is, (u, v) ∈ E ⇔ (v, u) ∈ E. This happens when the underlying binary relation defined over the vertices is symmetric.
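The symmetry condition can be checked mechanically on an adjacency matrix: the graph is undirected exactly when the matrix equals its transpose. The following is a minimal Octave sketch; the two 3-vertex matrices and the helper name is_undirected are hypothetical and serve only as an illustration.

```octave
% Hypothetical 3-vertex graphs given as adjacency matrices.
A_undirected = [0 1 1; 1 0 0; 1 0 0];   % every edge has its reverse
A_directed   = [0 1 0; 0 0 1; 0 0 0];   % edges 1->2 and 2->3 only

% (u,v) in E <=> (v,u) in E holds exactly when A equals its transpose.
is_undirected = @(A) isequal(A, A');

disp(is_undirected(A_undirected));
disp(is_undirected(A_directed));
```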

For instance, a network that represents which person (represented by a vertex) knows which other people in an institution would be best represented by an undirected graph. Ontological relations, such as being subordinate to something, e.g. humans are vertebrates (but not vice versa), on the other hand do not behave in a symmetric manner, hence the graph representing such knowledge would be directed. Another example of a directed network is the hyperlink structure of the World Wide Web, i.e., the fact that a certain website points to another one does not imply that a hyperlink also exists in the reverse direction.

When edges are weighted, edges are not simply given in the form (u, v) ∈ V × V, but as a tuple (u, v, w) that represents not just a source and a target node (u and v) but also some weight (w ∈ ℝ) describing the strength of the connection between the pair of nodes.
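As a small illustration of weighted edges, the (u, v, w) triples can be kept as rows of a matrix and turned into a weight matrix on demand. This is only a sketch in Octave: the three edges, their weights and the variable names are made up for the example.

```octave
% Hypothetical weighted, directed edges: one (u, v, w) triple per row.
edges = [1 2 0.5;
         2 3 2.0;
         3 1 1.5];
n = 3;  % number of vertices (assumed for this toy example)

% Weight matrix: W(u, v) = w if the edge u -> v exists, 0 otherwise.
W = sparse(edges(:, 1), edges(:, 2), edges(:, 3), n, n);

w_23 = full(W(2, 3));  % weight of the edge 2 -> 3
w_32 = full(W(3, 2));  % no edge 3 -> 2, hence 0
```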

Semantic networks, such as WordNet (Miller 1995) and ConceptNet (Speer et al. 2016), are prototypical examples of labeled networks. Semantic networks have commonsense concepts as their vertices, and various kinds of relations can hold between the vertices, which are indicated by the labels of the edges. Taking the previous example, there is a directed edge labeled with the so-called Is-A relation between the vertices representing the concepts of humans and vertebrates. That is, (u, v, Is-A) ∈ E, meaning that the Is-A relation holds for the concept pair (u, v) when u and v are the vertices for the concepts of humans and vertebrates, respectively.
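One simple way to hold labeled edges in memory is a table of (source, target, label) triples. The Octave sketch below assumes a cell array for this purpose; only the humans/vertebrates Is-A edge comes from the text, while the other two rows, the Part-Of label and the helper has_edge are hypothetical.

```octave
% Labeled, directed edges as (source, target, label) rows.
% Only the first row is taken from the text; the rest are made up.
labeled_edges = {
  'humans',      'vertebrates', 'Is-A';
  'vertebrates', 'animals',     'Is-A';
  'wheel',       'car',         'Part-Of'
};

% Select all concept pairs connected by an Is-A edge.
is_a_rows  = strcmp(labeled_edges(:, 3), 'Is-A');
is_a_edges = labeled_edges(is_a_rows, 1:2);

% Membership test for a labeled edge (u, v, label).
has_edge = @(u, v, label) any(strcmp(labeled_edges(:, 1), u) & ...
                              strcmp(labeled_edges(:, 2), v) & ...
                              strcmp(labeled_edges(:, 3), label));

disp(has_edge('humans', 'vertebrates', 'Is-A'));
disp(has_edge('vertebrates', 'humans', 'Is-A'));
```

The second query fails because the Is-A relation is not symmetric, which is why such networks are stored as directed graphs.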

As mentioned earlier, complex networks can be used to represent various processes and phenomena of everyday life. Complex networks can be useful to model collaboration between entities, the citation structure of scientific publications and various other kinds of social and economic interactions.

Try to list additional use cases where modeling a problem with networks can be applied.

8.1.2 Representing networks in memory

Networks of potential interest can contain up to billions of vertices. Just think of the social network of Facebook for an obvious example, which had nearly 2.5 billion active users in the first quarter of 2019 (source: https://www.socialmediatoday.com/).

We should note that real-world networks are typically extremely sparse, i.e., |E| ≪ |V × V|, meaning that the vast majority of the potentially observable relations are not realized. In terms of a social network, even if the entire network has billions of nodes, the average number of connections per vertex is orders of magnitude smaller, say a few hundred.
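A back-of-the-envelope calculation makes the degree of sparsity concrete. The sketch below takes the 2.5 billion vertices from the text and assumes an average of 300 connections per vertex as a stand-in for "a few hundred"; that value is an assumption, not a measured figure.

```octave
num_vertices = 2.5e9;  % number of users quoted in the text
avg_degree   = 300;    % assumption standing in for "a few hundred"

possible_pairs = num_vertices ^ 2;           % |V x V|
num_edges      = num_vertices * avg_degree;  % rough estimate of |E|
density        = num_edges / possible_pairs; % fraction of realized pairs

printf("possible pairs: %g\n", possible_pairs);
printf("estimated edges: %g\n", num_edges);
printf("density: %g\n", density);
```

Under these assumptions only about one in ten million of the possible vertex pairs is connected.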

For this reason, networks are stored in a more efficient format, one of which is the adjacency list representation depicted in Figure 8.1(c). The great benefit of the adjacency list representation is that it takes O(|E|) memory for storing the graph, as opposed to the O(|V|²) that applies to the explicit adjacency matrix representation schematically displayed in Figure 8.1(b) for the example directed graph from Figure 8.1(a).

As in other situations, we pay a price for the memory efficiency of adjacency lists: checking for the existence of an edge increases from O(1) to O(|V|) in the worst case when using an adjacency list instead of an adjacency matrix. This trade-off, however, is a worthy one in most real-world situations when dealing with networks with a huge number of vertices and a relatively sparse link structure. Figure 8.2 illustrates how to store the example network from Figure 8.1(a) in Octave when relying on both explicit dense and sparse representations.
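The difference between the two lookups can be sketched in Octave as follows. The adjacency list is copied from Figure 8.1(c) as printed; the cell-array layout and the helper names has_edge_list and has_edge_matrix are assumptions made for this illustration.

```octave
% Adjacency list as printed in Figure 8.1(c): adj_list{u} holds the
% targets of the edges leaving vertex u.
adj_list = {[2 3 4], [1 4], [4], [2 3]};

% List lookup: scan the neighbors of u, up to O(|V|) comparisons.
has_edge_list = @(u, v) any(adj_list{u} == v);

% Build the dense matrix once from the list, using O(|V|^2) memory.
n = numel(adj_list);
A = zeros(n);
for u = 1:n
  A(u, adj_list{u}) = 1;
end

% Matrix lookup: a single indexing operation, O(1).
has_edge_matrix = @(u, v) A(u, v) == 1;

disp(has_edge_list(1, 4));
disp(has_edge_matrix(1, 4));
disp(has_edge_list(4, 1));
```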

(b) Adjacency matrix of the graph.

1 → 2 3 4

2 → 1 4

3 → 4

4 → 2 3

(c) Adjacency list of the graph.

Figure 8.1: A sample directed graph with four vertices (a) and its potential representations as an adjacency matrix (b) and an adjacency list (c).

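% Explicit (dense) adjacency matrix: entry (i, j) is 1 if there is an edge i -> j.
adjacency = [0 1 1 1; 1 0 0 1; 1 0 0 0; 0 1 1 0];

% Sparse alternative: one (source, target, weight) triple per edge.
from_nodes = [1 1 1 2 2 3 4 4];
to_nodes = [2 3 4 1 4 1 2 3];
edge_weights = ones(size(to_nodes));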
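One plausible way to finish the sparse variant is to pass the three vectors to Octave's built-in sparse function and check that the result describes the same graph as the dense matrix. This is a sketch rather than the author's listing; the names n, sparse_adjacency, same_graph and num_edges are assumptions.

```octave
% Variables repeated from the listing above so the sketch is self-contained.
adjacency = [0 1 1 1; 1 0 0 1; 1 0 0 0; 0 1 1 0];
from_nodes = [1 1 1 2 2 3 4 4];
to_nodes = [2 3 4 1 4 1 2 3];
edge_weights = ones(size(to_nodes));

% Assemble an n-by-n sparse matrix from the (source, target, weight) triples.
n = size(adjacency, 1);
sparse_adjacency = sparse(from_nodes, to_nodes, edge_weights, n, n);

% Both representations should encode exactly the same edges.
same_graph = isequal(full(sparse_adjacency), adjacency);
num_edges = nnz(sparse_adjacency);  % number of stored (nonzero) entries

disp(same_graph);
disp(num_edges);
```

Only the nonzero entries are stored in the sparse matrix, so its memory use grows with the number of edges rather than with the square of the number of vertices.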