**6.1 The curse of dimensionality**

**Exercise 7.1. Suppose you have a collection of recipes including a list of ingredients required for them. In case you would like to find recipes that are**

**8.1 Graphs as complex networks**

**Complex networks** are mathematical objects meant for describing
real-world phenomena and processes. These mathematical objects are
often referred to as graphs as well. A graph is a simple and highly
relevant concept in graph theory and computer science in general. In
its simplest form, a graph is a collection of nodes (or vertices)
together with a binary relation that holds for a subset of the pairs of
vertices; an edge connects each pair of vertices for which the relation
holds. As such, a graph can be given as $G = (V, E)$, with $V$ denoting
its vertices and $E \subseteq V \times V$ its edges, where $\times$
refers to the Cartesian product of the vertex set with itself.

**Learning Objectives:**

• Modeling by Markov Chains

• PageRank algorithm and its variants

• Hubs and Authorities

data m i n i n g f ro m n e t w o r k s 169

*8.1.1* Different types of graphs

Complex networks can come in many forms, i.e., the edges can be
either directed or undirected, weighted or unweighted, labeled or
unlabeled. What undirectedness means in the case of networks is that
whenever an edge exists in one direction between two vertices, it is
also guaranteed that the edge in the reverse direction can be found in
the network, that is, $(u,v) \in E \Leftrightarrow (v,u) \in E$. This
happens when the underlying binary relation defined over the vertices
is symmetric.

For instance, a network that represents which person (represented by a vertex) knows which other people in an institution would be best represented by an undirected graph. Ontological relations, such as being the subordinate of something, e.g. humans are vertebrates (but not vice versa), on the other hand do not behave in a symmetric manner, hence the graph representing such knowledge would be directed. Another example of a directed network is the hyperlink structure of the world wide web, i.e., the fact that a certain website points to another one does not imply that there also exists a hyperlink in the reverse direction.
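The symmetry condition can be checked mechanically. The following Octave sketch stores a small, made-up edge set as a two-column matrix and tests whether every edge has its reverse counterpart. The edges and the variable names (`E`, `has_reverse`, `E_sym`) are illustrative assumptions, not taken from the text.

```octave
% Hypothetical edge set: one (u, v) pair per row.
E = [1 2; 2 1; 1 3];

% For every edge (u, v), look for the reversed pair (v, u) among the rows of E.
has_reverse = ismember(fliplr(E), E, 'rows');

% The graph is undirected exactly when every edge has its reverse.
is_undirected = all(has_reverse);

% Adding the missing reverse edges makes the relation symmetric.
E_sym = unique([E; fliplr(E)], 'rows');
is_sym_undirected = all(ismember(fliplr(E_sym), E_sym, 'rows'));

printf('original: %d, symmetrized: %d\n', is_undirected, is_sym_undirected);
```

Here the edge (1, 3) has no reverse, so the original edge set describes a directed graph, while the symmetrized one describes an undirected graph.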

When edges are weighted, it means that edges are not simply
given in the form of $(u,v) \in V \times V$, but as $(u,v,w)$, a tuple
representing not just a source and a target node ($u$ and $v$) but also
some weight ($w \in \mathbb{R}$) that describes the strength of the
connection between the pair of nodes.
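As a minimal sketch of this idea, weighted edges can be kept as rows of a three-column matrix in Octave. The edges, the weights and the variable names below are made up for illustration.

```octave
% Hypothetical weighted edges: one (u, v, w) triple per row.
edges = [1 2 0.5;
         1 3 2.0;
         3 2 1.5];

% Weight of the edge 1 -> 3: select the row whose source is 1 and target is 3.
mask = edges(:, 1) == 1 & edges(:, 2) == 3;
w_13 = edges(mask, 3);

% Sum of the weights on the edges leaving vertex 1.
out_weight_1 = sum(edges(edges(:, 1) == 1, 3));

printf('w(1,3) = %g, total outgoing weight of vertex 1 = %g\n', w_13, out_weight_1);
```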

**Semantic networks**, such as WordNet^{3} and ConceptNet^{4}, are
prototypical examples of labeled networks. Semantic networks have
commonsense concepts as their vertices, and various kinds of relations
can hold between the vertices, which are indicated by the labels of
the edges. Taking the previous example, there is a directed edge
labeled with the so-called Is-A relation between the vertices
representing the concepts of *humans* and *vertebrates*. That is,
$(u, v, \text{Is-A}) \in E$, meaning that the Is-A relation holds for
the concept pair $(u,v)$ when $u$ and $v$ are the vertices for the
concepts of *humans* and *vertebrates*, respectively.

^{3} Miller 1995

^{4} Speer et al. 2016
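One possible way to hold labeled edges in Octave is a cell array of (source, target, label) triples. In the sketch below only the first triple comes from the text. The other two triples, the `Has-A` label and all variable names are assumptions made up for illustration.

```octave
% Labeled edges as (source, target, label) triples; only the first row
% comes from the text, the other two are made-up illustrations.
edges = {'humans',      'vertebrates', 'Is-A';
         'vertebrates', 'animals',     'Is-A';
         'humans',      'hands',       'Has-A'};

% Select the edges that carry the Is-A label.
is_a = strcmp(edges(:, 3), 'Is-A');
is_a_edges = edges(is_a, 1:2);

% The triple (humans, vertebrates, Is-A) is in the edge set ...
holds = any(strcmp(edges(:, 1), 'humans') & ...
            strcmp(edges(:, 2), 'vertebrates') & is_a);

% ... but the reversed triple is not, since the relation is directed.
holds_reversed = any(strcmp(edges(:, 1), 'vertebrates') & ...
                     strcmp(edges(:, 2), 'humans') & is_a);

printf('%d Is-A edges, forward: %d, reversed: %d\n', ...
       size(is_a_edges, 1), holds, holds_reversed);
```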

As mentioned earlier, complex networks can be used to represent various processes and phenomena of everyday life. Complex networks can be useful to model collaboration between entities, the citation structure of scientific publications and various other kinds of social and economic interactions.

Try to list additional use cases where modeling a problem with networks can be applied.

*8.1.2* Representing networks in memory

Networks of potential interest can range up to the point where they contain billions of vertices. Just think of the social network of Facebook as an obvious example, which had nearly 2.5 billion active users over the first quarter of 2019^{5}.

^{5} https://www.socialmediatoday.com/news/facebook-reaches-238-billion-users-beats-revenue-estimates-in-latest-upda/553403/

We should note that real-world networks are typically extremely
sparse, i.e., $|E| \ll |V \times V|$, meaning that the vast majority of
the potentially observable relations are not realized. In terms of a
social network, even if the entire network has billions of nodes, the
average number of connections per vertex is orders of magnitude
smaller, say a few hundred.
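A back-of-the-envelope calculation shows how sparse such a network is. The sketch below uses the user count quoted above and assumes an average of 300 connections per vertex as a stand-in for "a few hundred". That value and the variable names are assumptions for illustration only.

```octave
num_vertices = 2.5e9;   % roughly the number of users quoted above
avg_degree = 300;       % assumed value standing in for "a few hundred"

% Number of realized edges versus the number of possible vertex pairs.
num_edges = num_vertices * avg_degree;
num_pairs = num_vertices ^ 2;

% Fraction of the potential relations that is actually present.
density = num_edges / num_pairs;

printf('edges: %g, possible pairs: %g, density: %g\n', ...
       num_edges, num_pairs, density);
```

Under these assumptions only about one in ten million of the possible relations is realized, which is why storing all $|V \times V|$ entries explicitly is wasteful.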

For this reason, networks are usually stored in a more memory-efficient
format, one of which is the **adjacency list** representation that is
depicted in Figure 8.1(c). The great benefit of the adjacency list
representation is that it takes $O(|E|)$ memory for storing the graph,
as opposed to the $O(|V|^2)$ memory required by the explicit
**adjacency matrix** representation schematically displayed in
Figure 8.1(b) for the example directed graph from Figure 8.1(a).

As in other situations, we pay some price for the memory efficiency
of adjacency lists: the cost of checking for the existence of an edge
increases from $O(1)$ to $O(|V|)$ in the worst case when an adjacency
list is used instead of an adjacency matrix. This trade-off, however,
is a worthy one in most real-world situations when dealing with
networks with a huge number of vertices and a relatively sparse link
structure. Figure 8.2 illustrates how to store the example network
from Figure 8.1(a) in Octave when relying on both explicit dense and
sparse representations.
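The trade-off can be made concrete with a small sketch. It starts from the adjacency matrix used in Figure 8.2 and builds an adjacency list as an Octave cell array. The cell-array layout and the variable names are assumptions chosen for illustration, not the text's own implementation.

```octave
% Adjacency matrix of the example digraph, as given in Figure 8.2.
A = [0 1 1 1; 1 0 0 1; 1 0 0 0; 0 1 1 0];
n = size(A, 1);

% Build the adjacency list: cell u holds the out-neighbors of vertex u.
adj_list = cell(n, 1);
for u = 1:n
  adj_list{u} = find(A(u, :));
end

% Edge lookup 4 -> 2: a single indexing step in the matrix ...
in_matrix = A(4, 2) == 1;
% ... versus a scan over the neighbors of vertex 4 in the list.
in_list = any(adj_list{4} == 2);

% Stored entries: |V|^2 for the matrix, one per edge for the list.
matrix_entries = numel(A);
list_entries = sum(cellfun(@numel, adj_list));

printf('edge 4->2: matrix %d, list %d; entries: %d vs %d\n', ...
       in_matrix, in_list, matrix_entries, list_entries);
```

On a graph this small the saving is negligible, but the gap between the two entry counts grows quickly as the number of vertices increases while the number of connections per vertex stays small.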

(b) Adjacency matrix of the graph.

1 → 2 3 4

2 → 1 4

3 → 1

4 → 2 3

(c) Adjacency list of the graph.

Figure 8.1: A sample directed graph with four vertices (a) and its potential representations as an adjacency matrix (b) and an adjacency list (c).

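% Dense representation: adjacency(u, v) is 1 if there is an edge u -> v.

adjacency = [0 1 1 1; 1 0 0 1; 1 0 0 0; 0 1 1 0];

% Sparse representation: list only the existing edges as (from, to) pairs.

from_nodes = [1 1 1 2 2 3 4 4];

to_nodes = [2 3 4 1 4 1 2 3];

% Every edge gets weight 1, since the graph is unweighted.

edge_weights = ones(size(to_nodes));

sparse_adjacency = sparse(from_nodes, to_nodes, edge_weights);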

**CODE SNIPPET**

Figure 8.2: Creating a dense and a sparse representation for the example digraph from Figure 8.1.
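As a quick sanity check, the two representations from Figure 8.2 can be compared directly. The sketch below repeats the definitions from the figure so that it runs on its own. The extra variable names (`same_graph`, `num_stored`, `out_degree`, `in_degree`) are illustrative assumptions.

```octave
% Dense and sparse representations, repeated from Figure 8.2.
adjacency = [0 1 1 1; 1 0 0 1; 1 0 0 0; 0 1 1 0];
from_nodes = [1 1 1 2 2 3 4 4];
to_nodes = [2 3 4 1 4 1 2 3];
sparse_adjacency = sparse(from_nodes, to_nodes, ones(size(to_nodes)));

% Both representations describe the same graph.
same_graph = isequal(full(sparse_adjacency), adjacency);

% Only the existing edges are stored in the sparse matrix.
num_stored = nnz(sparse_adjacency);

% Row sums give the out-degrees, column sums give the in-degrees.
out_degree = full(sum(sparse_adjacency, 2))';
in_degree = full(sum(sparse_adjacency, 1));

printf('same graph: %d, stored entries: %d\n', same_graph, num_stored);
printf('out-degrees: %s, in-degrees: %s\n', ...
       mat2str(out_degree), mat2str(in_degree));
```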
