Get Complete Project Material File(s) Now! »
BASIC CONCEPT AND PRELIMINARIES
n this Chapter we provide the basic concepts and background theory that will Ibe used throughout the thesis. Initially a basic introduction to graph theory is given along with the definitions of the node centralities that will be mentioned throughout the Chapters. The k-core decomposition is given special attention as it is a basic concept of the thesis, specifically for Chapters 3 and 5. Finally, the graph datasets that have been used for the experiments in the Chapters to follow are described. In each Chapter, the necessary background is presented in the respective section along with the
symbols that will be used.
introduction to graph theory
A network is represented as a graph (the terms graph and network are used interchange-ably throughout the dissertation). A graph is a pictorial representation of a set of objects some of the pairs of which are connected with links. Formally a graph G is a pair of sets (V,E) where V is the set of vertices and E is the set of edges connecting the pairs of vertices. The number of nodes in the graph is equal to n = |V| and the number of edges m = |E|.
There are different types of graphs depending upon the number of vertices and edges, the interconnectivity and their overall structure. Some of those are defined below. Fig-ures 2.1 to 2.3 depict some examples of different types of graphs.
Definition 2.1. Null Graph
A null graph G = (V; E) contains n isolated nodes and no edges among them. They are also called
endless graphs.
Definition 2.2. Directed and Undirected Graph
• In a directed graph GD = (V; E), every edge (u; v) 2 E links node u to node v (ordered pair of nodes).
• An undirected graph G = (V; E) is a directed one where if edge (u; v) 2 E, then edge (v; u) 2 E as well.
Definition 2.3. Weighted Graph
Every edge (u; v) 2 E in a weighted graph G = (V; E) is associated with a real number wuv called its weight.
path A path is defined as a sequence of nodes v1; v2; :::; vN-1; vN. Every consecutive pair of nodes in the path vk; vk+1 is connected with an edge. For a directed graph the notion of the path is extended as follows: in a directed path a directed edge should exist from each node of the sequence to the next node. Two nodes u; v 2 V are called connected if there is a path in the graph from node u to node v.
Definition 2.4. Connected and Disconnected Graph
• In a connected graph G = (V; E) there exists a path between every pair of vertices. There should be at least one edge for every vertex u in the graph so that it is connected to some other vertex v at the other side of the edge.
Figure 2.2: Examples of (a) a connected and (b) a disconnected graph. In the connected graph we observe that each vertex has its edge connected to another edge whereas in the disconnected graph there exist two components which are not connected to each other.
• A graph G = (V; E) is disconnected if there exists at least a node u which is not connected to another node v.
Definition 2.5. Cyclic and Acyclic Graph
• A cyclic graph G = (V; E) is a graph that contains at least one cycle: the starting vertex of its first edge equals the ending vertex of its last edge.
• An acyclic graph G = (V; E) is a graph that contains no cycles.
degree In an undirected graph G = (V; E), a node v 2 V has a degree dG(v) = d if it has d incident edges. For the case of a directed graph, every node is characterized by two types of degrees: its in-degree and its out-degree. The in-degree of node v equals to the number of incoming edges dinG(v) = juj(u; v) 2 Ej whereas its out-degree equals to the number of outcoming edges doutG(v) = juj(v; u) 2 Ej. In undirected graphs dinG(v) =
doutG(v).
Definition 2.6. Simple, Regular and Complete Graph
• A graph G = (V; E) is a simple graph when it does not contain any loops (i.e., there exist no edges connecting a vertex to itself) or any parallel edges (i.e., edges that are incident to the same two vertices). The maximum number of edges possible in a single graph with n vertices is equal to nC2 = n(n – 1)=2. The number of simple graphs possible with n nodes is 2nC2 = 2n(n-1)=2.
• The vertices of a regular graph G = (V; E) have the same degree. If 8v 2 V : dG(v) = k then the graph is called a k-regular graph.
• Every pair of nodes in a complete graph G complete graph with n vertices has m = n2
= (V; E) is connected by a unique edge. A
= n(n – 1)=2 edges.
For a complete introduction to the field of complex networks, the reader may refer to Refs. [19, 24, 34, 126].
adjacency matrix and eigenvalues
Every type of graph can be represented as a matrix. This matrix is called the adjacency matrix A of the graph. Matrix A is a square matrix of size n n where the rows and columns represent the nodes of the graph and the entries indicate whether there exists an edge between the respective pair of nodes.
Definition 2.7. Adjacency Matrix [18]
The adjacency matrix A of a graph G = (V; E) is a n n matrix such that: Auv = 8 auv; if (u; v) 2 E; 8 u; v 2 1; : : : ; n (2.1) : 0; otherwise.
For weighted graphs, each value auv represents the weight associated with each edge (u; v) while for unweighted graphs auv = 1; 8(u; v) 2 E. For a simple graph with no self-loops, the adjacency matrix must have zeros on the diagonal. For an undirected graph, the adjacency matrix is symmetric.
Let A be a symmetric matrix. A vector u is defined as an eigenvector of A if and only if Au= u, where is a scalar called eigenvalue corresponding to u. Then A can be written as A = U UT . The orthogonal matrix U contains as columns the eigenvectors u1, u2,
…, un of A that correspond to real eigenvalues 1 > 2 > … > n and = diag ( 1, 2.
…, n) the diagonal matrix with the eigenvalues as entries.
node centralities
The concept of centrality is being used in order to define a node’s importance according to its involvement in the network [20, 25, 26, 61]. In general centrality measures assign values to the nodes of a graph which can be used to rank the latter subject to their im-portance in the network. The different centralities that will be presented can be split in two subcategories:
i) Structural centralities which can be obtained exclusively on structural information.
ii) Iterative refinement centralities which use dynamical processes and iterative refine-ment methods to explore the nodes’ structural properties.
Structural centralities
Structural centralities can be further classified into neighborhood-based and path-based cen-tralities.
Neighborhood-based centralities
Degree centrality
As described earlier in the Chapter, the degree dG(v) of a node v of an undirected network G = (V; E) represents the number of incindent edges on this node. It can also be calculated as follows: dG(v) = Pu auv where A = fAuvg is the adjacency matrix of the graph and u 2 Neighbors(v).
Degree centrality is used for a wide range of applications due to its simplicity and low-computational complexity. In some cases, degree acts exceptionally good. When the rate of the spreading of information is very low, degree can identify better the influential capabilities of nodes than some other centralities which on average are better influen-tial indicators [94, 107]. Additionally, concerning network vulnerability, degree-targeted attack can destroy scale-free networks and exponential networks very effectively [86].
As mentioned before, nodes in directed networks are characterized by two types of degrees the in-degree and the out-degree based on whether the incoming or outcoming edges of the node are taken into account respectively. In weighted graphs degree is usually replaced by the strength which is defined as the sum of weights of the node’s associated edges.
k-core centrality
The k-core centrality or node’s core number or coreness is a number assigned to each node after the k-core decomposition of the network. The k-core decomposition is a hierarchical decomposition of the graph into nested subgraphs. Let G = (V; E) be an undirected graph with k 2 Z and k > 0.
Definition 2.8. k-core subgraph
Let H be a subgraph of G, (i.e., H G). Subgraph H is defined to be a k-core subgraph of G,
denoted by Ck, if it is a maximal connected subgraph in which all nodes have degree at least k.
Definition 2.9. Node’s Core Number
A node i has core number ci = k, if it belongs to a k-core but not to any (k + 1)-core.
It is evident that if all the nodes of the graph have degree at least one, i.e., d(v) > 1; 8v 2 V, then the 1-core subgraph corresponds to the whole graph, i.e., C1 G. Further-more, assuming that Ci; i = 0; 1; 2; : : : ; kmax is the i-core of G, then the k-core subgraphs are nested, i.e.:
Typically, subgraph Ckmax is called maximal k-core subgraph of G. Figure 2.4 depicts an example of a graph and the corresponding k-core decomposition.
Computing the k-core decomposition of a graph can be done through a simple process that is based on the following property: to extract the k-core subgraph, all nodes with degree less than k and their adjacent edges should be recursively deleted [145]. That way, beginning with k = 0, the algorithm removes all the nodes (and the incident edges) with degree equal or less than k, until no such nodes have been remained in the graph. Also notice that, removing edges that are incident to a node may cause reductions to the degree of neighboring nodes; the degree of some nodes may become at most k, and thus, they should also be removed at this step of the algorithm. When all remaining nodes have degree d(v) > k, k is increased by one and the process is repeated until no more remaining nodes have left in the graph. Since each node and edge is removed exactly once, the running time of the algorithm is O(n + m) [116]. Batagelj and Zaveršnik later proposed an O(m) algorithm for k-core decomposition [15] presented in 2.1.
Algorithm 2.1 k-core decomposition
1: Input: Undirected graph G = (V; E)
2: Output: Vector of core numbers ci, i = 1; 2; :::n = jVj
3: Compute the degrees of each node dG(v); 8v 2 V
4: Order the nodes in increasing order of their degrees dG(v)
5: for each v 2 V do
6: cv dG(v)
7: for each u 2 Neighbors(v) do
8: if dG(u) > dG(v) then
9: dG(u) dG(u) – 1
10: Reorder V accordingly
return Core numbers ci; 8i 2 V
This algorithm considers unweighted and undirected graphs. Nevertheless lot of ef-fort has been made from researchers towards extending the k-core decomposition to other types of graphs. Refs. [58, 65, 68] present extensions on weighted graphs while Giatsidis et. al [67] present an extension for directed graphs. A core decomposition on probablistic graphs is proposed by Bon-chi et. al [23]. In a probabilistic or uncertain graph every edge is associated with a probability of existence.
Path-based centralities
Closeness centrality
In a connected graph G = (V; E) the closeness centrality of a node is calculated as the sum of the length of the shortest paths between the node and all other nodes in the graph. For a node v 2 V it can be calculated as follows [62, 142]:
CCv = P n – 1 (2.3)
u6=v duv
where n = jVj and duv expresses the distance between vertices u and v. Nevertheless, when the graph is not connected there exist some node pairs for which duv = 1. In this case closeness centrality is defined in terms of the inverse of the harmonic mean distances between the nodes:
CCv = 1 1 (2.4)
n – 1 P u6=v duv
Betweeness centrality
The betweeness centrality [16, 146, 147] is a centrality measure based on the notion of shortest paths between nodes. In an unweighted and connected graph, there exists between every pair of nodes one shortest path such that the number of edges passing through is minimized. For weighted and connected graphs it is the sum of the weights that is minimized [61]. Then, the betweeness centrality of each node is the number of these shortest paths passing through this node. It is calculated as follows:
6 X gv
BCv = xy (2.5)
gxy
v=x;v6=y;x6=y
where gxy is the number of the shortest paths between nodes x and y and gvxy is the number of the paths which pass through v among all the shortest paths between x and y.
Iterative refinement centralities
Eigenvector centrality
The eigenvector centrality of a node v is proportional to the summation of the central-ities of the nodes to which it is connected [21]. It is a measure of the importance of a node, denoted by xv and is calculated as follows:
n
X=1 (2.6)
xv = c avuxu
where c = 1= . c is a proportionality constant and is the largest eigenvalue of the adjacency matrix A. The eigenvector centrality can be efficiently computed via the power iteration method [79] at the beginning of which each node is assigned with a score of 1. Every node shares its score in an even way to its neighbors and receives a new value during every iteration of the method. The process ends when the values of each node reach a steady state.
PageRank
PageRank (PR) is a famous variant of the eigenvector centrality which supposes that the importance of an entity in a network is determined by both the quantity and the quality of its connected neighbors. The PageRank algorithm [29] that calculates the re-spective centrality is used to rank webpages in the Google search engine but also for other scenarios in the commercial sector. It distinguishes the importance of a webpage by performing a random walk on the network created by the linked webpages (i.e., one website referring to another creates a link between them). Initially, a node in the network, or equivalently a webpage, is assigned on unit PR value. Then every node distributes its PR value to its neighbors evenly using its outgoing edges. The PR value for a node v of graph G at a specific step t can be calculated as follows:
n PRu(t – 1)
PRv(t) = avu (2.7)
dout(u)
where n is the total number of nodes in G, u 2 Neighbors(v) and doutG(u) is the out-degree of node u. The algorithm stops when the PR values reach a steady state. Nevertheless, Eq. 2.7 cannot guarantee converge in some cases. An example of such a case is the existence of nodes with zero out-degree that cannot redistribute their PR value. For that reason, a jumping factor has been introduced assuming that the user will visit a web page with probability p, and close the current page and open a random page with probability 1-p.
Table of contents :
1 introduction
1.1 Social Networks and Social Influence
1.2 Social Influence examples
1.3 Social Influence Analysis Applications
1.4 Thesis statement and overview of contributions
1.4.1 Identification of individual influential spreaders
1.4.2 Identification of a group of influential spreaders
1.4.3 Secure and Private computation of influential metrics
1.5 Outline of the thesis
2 basic concept and preliminaries
2.1 Introduction to Graph Theory
2.2 Adjacency Matrix and Eigenvalues
2.3 Node centralities
2.3.1 Structural centralities
2.3.2 Iterative refinement centralities
2.4 Description of Graph Datasets
3 locating influential spreaders in social networks
3.1 Introduction
3.2 Preliminaries and Background
3.2.1 k-core decomposition
3.2.2 K-truss decomposition
3.2.3 Epidemic models
3.2.4 The SIR model applied in networks
3.3 Related work
3.4 K-truss decomposition for identifying influential nodes
3.5 Experimental Evaluation
3.5.1 Datasets and Methodology
3.5.2 Evaluating the spreading performance
3.5.3 Comparison to the optimal spreading
3.5.4 Impact of infection and recovery rate on the spreading process
3.6 Exploration of network centralities in spreading processes
3.6.1 Evaluation of Results
3.7 Conclusions and Future Work
4 influence maximization in social networks
4.1 Introduction
4.2 Preliminaries and Background
4.2.1 The Influence Maximization (IM) problem
4.2.2 Diffusion Models
4.3 Related work
4.4 MATrix Influence (MATI) Algorithm
4.4.1 Influence in Social Networks
4.4.2 Influence Computation under the LT Model
4.4.3 Influence Computation under the IC Model
4.5 Experimental Evaluation
4.5.1 Datasets
4.5.2 Baseline Algorithms
4.5.3 Experimental Results
4.6 Conclusions and Future Work
5 private, secure and distributed computation of k-cores
5.1 Introduction
5.2 Problem Statement and Preliminiaries
5.2.1 Problem Statement
5.2.2 Preliminaries and Background
5.3 Related work
5.3.1 k-core Computation
5.3.2 Core Maintenance
5.3.3 Decentralized Personal Data Management Platforms
5.4 P2P Algorithm for Core Maintenance
5.4.1 Local variables
5.4.2 Handling Messages and Events
5.4.3 Computing coreness estimations
5.4.4 Example
5.5 Analytical and Experimental Study
5.5.1 Analytical study
5.5.2 Complexity: Experimental Study
5.6 Security and Privacy Analysis
5.6.1 Attack Model
5.6.2 Privacy and Information Quality
5.6.3 Experimental results
5.7 Conclusions and Remarks
6 concluding remarks
6.1 Summary of Contributions and Future Work
6.2 Epilogue
bibliography