Get Complete Project Material File(s) Now! »
Contributions Dynamic Processes
The first part of the thesis concerns the analysis of infinite dynamic processes by means of carefully crafted potentials and the analysis of the potential through (general) random walks. We consider two processes: Forest Fire Process and Balls-into-Bins with Deletions.
Definition Forest Fire Process. The Forest Fire Process is a model for generating random graphs representing the evolution of social networks. The nodes in the generated graph represent the users and the each edges represents friendship between the adjacent nodes.
At every time step a user arrives and connects to another user chosen uniformly at random among those already present. The new user then executes a simple recursive process to determine their connections i. e., their neighborhood. Formally, the Forest Fire Process is defined iteratively, starting from a seed graph G0. Let Gt−1 = (Vt−1, Et−1) denote the graph at the end of round t − 1. In round t, a new node ut arrives, and chooses a node amb(ut) ∈ Vt−1 uniformly at random, where we call the node amb(ut) the ambassador of the new node ut. After selecting the ambassador, we burn the ambassador, meaning we add the edge (ut, amb(ut)) to the graph. The graph generation process then continues as follows. First choose a random subset of the edges of Gt−1 as active edges: every edge (u, v) of Gt−1 is active independently with probability min{1, α }, where α is a parameter of deg+(u) the model and deg+(u) is u’s out-degree.
Secondly, add an edge to all vertices of Gt−1, reachable from amb(ut) by a path consisting of active edges. This construction of Gt can be obtained by executing Algorithm 1 and Algorithm 2.
We show, under mild assumptions, that if the parameter α is a large enough constant, then this models exhibits indeed the small world effect, i. e., the expected distance between two users is constant. Conversely, if α is below some constant, then the expected distance is of order Ω(log n).
Algorithm 1: Forest Fire Process (G0)
for t = 1, 2, . . . do
upon arrival of node ut at time t:
amb(ut) ← a node chosen u.a.r. from Vt−1
S ← Burn(Gt−1, amb(ut))
Gt ← (Vt−1 ∪ {ut}, Et−1 ∪ {(ut, w) : w ∈ S})
Algorithm 2: Burn(G = (V, E), v) // Outputs a subset of V reachable from v
H ← ∅
for all (w, x) ∈ E do 1, degG+t (w)
with probability min
α
H ← H ∪ {(w, x)}
return {x ∈ V : there exists a directed path from v to x in H}
Definition Balls-into-Bins with Deletions. The Balls-into-Bins with Deletions pro-cess (see Chapter 5) is a fundamental process modeling, among other things, the load distribution in distributed systems. The system consists of n bins and balls which arrive over time according to two strategies: Greedy[1] and Greedy[2]. The process works as follows.
At each time step, we generate up to n balls, each with probability λ < 0. Each of the spawned balls (i) chooses the target bin uniformly at random (Greedy[1]) or (ii) chooses the target bin greedily among two bins chosen uniformly at random (Greedy[2]). To model the load of real system more realistically, we extend the model by adding the following ingredient. At the end of each round, all non-empty bins delete one ball each modeling a finished tasks. In the following we state an algorithm summarizing the above.
The algorithm is executed by each of the n generators.
Algorithm 3: Greedy[d] for arrival rate λ, d ∈ [1, 2]
Spawn a ball w.p. λ < 0.
if a spawned is spaaned then
for all choice i ∈ [d] do
samplei = Uniform(bin1, bin2, . . . , binn)
Move to bin with smallest load among sample1, Sample2, . . . , Sampled
We show that both protocols are positive recurrent and that there is an exponential load difference between both protocols.
Analysis. At first glance, the Forest Fire Process and the Balls-into-Bins with Dele-tions process appear to be completely unrelated. The fabric connecting these processes is our analysis. We design custom-tailored potential functions for the Forest Fire Process, Greedy[1], and Greedy[2]. Stating the precise definitions of the potentials would overstrain this overview (and is therefore left to Chapter 4 and Chapter 5) but their intuition is simple. We show that, when-ever any of our potentials surpasses a certain threshold, then it decreases in expectation– regardless of the current state. Furthermore, the distribution of the potential change has an exponential tail-bound, i. e., the probability to increase the potential by k is 2−Ω(k). This allows us to model the potential via a general random walk on the real numbers. See Figure 3.1 for an illustration.
It is worth mentioning that the potentials do not decrease (in expectation) in every case.
In fact, when the potential is close to zero it increases in expectation.
Since both processes are infinite, it does happen–albeit very rarely–that the quantities of interest (e. g., the maximum load) attain a value which is a function of the time t. Our potential approach shows that this is a “rare” state and whenever the system is in such a state, it quickly recovers.
The potentials are designed in such a way that the following holds. Whenever the potentials are large at a given time step t, we simply assume that system is in the worst-case state and show that the potentials decrease in expectation. This technique proves to be very useful for the problems we study since both problems are of infinite nature.
The analysis of such potentials dates back to Hajek [Haj82] (Theorem A.11) and less general versions have successfully applied to various areas and notably to evolutionary algorithms and drift theory [BFG03, DG13, PR99].
In order to harness the aforementioned general approach for the analysis of both prob-lems, a few problem specific enhancement are required: For the Forest Fire Process we were unable to find a potential that decrease in expectation in a single time step, regardless of the current state. Instead, we consider the potential drop over two consecutive time steps: The node arriving in the first time step t + 1 is likely to have a very “favorable” neighborhood, such that the node arriving at time step t+2 causes the potential to decrease. Our potential might be of general interest and refer the reader to Chapter 4 for an in-depth discussion.
As for the Balls-into-Bins process, we use three different potentials: Potential Φ(t) mea-suring the load difference to average load over time, potential Ψ(t) counting the total load in the system, and potential (t) which interweaves both potentials and allows us use to prove positive recurrence. Because of the strong dependencies between the potential, we start by focusing solely on Φ(t) and we obtain bounds for arbitrary time steps (possibly super-exponential in the number of bins). This allows to apply the general approach described above and gives a weak bound on the maximum load at arbitrary time steps.
To derive a stronger bound we combine the bounds on Φ with combinatorial arguments. To do so use union bounds in an adaptive way to get rough bounds on the potentials at all time steps up to time step t. We then show that that there must have been a step t − poly(n) where Ψ was very small. Together with our rough bounds on Φ, we are able to establish strong bounds on the maximum load at time t. Finally, we are able to analyze the third potential by means of the general approach: We reduce it to a general random walk with drift towards zero whenever it’s large and apply Hajek’s Theorem. Using we show positive recurrence of the underlying Markov chain. Potential of the approach. It seems that the infinite nature (of the processes we consid-ered) renders many standard approaches futile: for example, the approach of relying solely on invariants which could fail over the course of time. On the other hand, the concepts developed for Markov chains such as positive recurrence, and drift theory capture the notion of “recovery”, making them very useful in the study of infinite processes. Using these tech-niques reduces the analysis to finding suitable potentials encapsulating the key-properties of the underlying problem. We are optimistic that this general potential approach, as well as our ideas used to craft and analyze our potentials, carry over to other dynamic processes.
Social Networks: The Forest Fire Model [KLL+16]
Ten years ago, Leskovec, Kleinberg and Faloutsos introduced the Forest Fire model, a generative model to understand the dynamics of social networks over a long period [LKF07]. They examined real-world networks such as the ArXiv Citation Graph, the Patents Citation Graph, the Autonomous Systems Graph, Affiliation Graphs, the Email Network, the IMDB Actors-to-Movies Network, and a Product Recommendation Network. They observed that these social networks become denser over time. They also made the surprising observation that the effective diameter of the networks “shrinks » over time, instead of growing, as was previously thought. They suggested the Forest Fire model as an attempt to explain densification, shrinking diameter, and heavy-tailed distributions of vertex indegrees and outdegrees.
In this model, the evolution initially starts with a fixed seed graph. Time is discrete and at each time t a node ut arrives, picks a random node, w, in the current graph as its “ambassador » and connects to it. The ambassador is considered burned and all other nodes are considered unburnt. Node ut then generates two random numbers x and y and selects x outgoing edges from w and y in-coming edges to w incident to nodes that have not yet been burned. If not enough outgoing or incoming edges are available, ut selects as many as it can. Let w1, w2, …, wx+y denote the other endpoints of the edges selected. ut connects to w1, w2, …, wx+y, marks them as burned, and then applies the previous step recursively to each wi. See Figure 4.1 for an illustration. Leskovec et al. observed through simulation, that the Forest Fire Model appears to have the shrinking diameter property, but leave open the question of providing a rigorous proof:
“Rigorous analysis of the Forest Fire model appears to be quite difficult. However in simulations we find that […] we can produce graphs that […] have diameter that decrease. »
Figure 4.1: A social network generated by the Forest Fire Process. The node sizes and the colors are a function of the degrees and the node labels correspond to the arrival times.
Formal Definition
Formally, the Forest Fire Process is defined iteratively, starting from a seed graph G0. Let Gt−1 = (Vt−1, Et−1) denote the graph at the end of round t − 1. In round t, a new node ut arrives, and chooses a node amb(ut) ∈ Vt−1 uniformly at random, where we call the node amb(ut) the ambassador of the new node ut. After selecting the ambassador, we burn the ambassador, i. e., we add the edge (ut, amb(ut)) to the graph. This then propagates as follows.
First choose a random subset of the edges of Gt−1 as active edges: every edge (u, v) of Gt−1 is active independently with probability min{1, α }, where α is a parameter of deg+(u) the model. Second, burn all vertices of Gt−1, reachable from amb(ut) by following directed active edges. Third, add an edge from ut to every burnt vertex. This construction of Gt can be obtained by executing Algorithms 4 and 5. Although, it is more natural to view burning as a branching process in which we consider the burnt nodes “layer by layer”, we describe the process as a percolation process1 in order to avoid the need to define a specific order for the burning process: In a branching process, a node w could appear on several levels however we allow w to only be burnt once2 and thus the order in which we burn nodes affects the random subtree of burnt nodes.
1A percolation process is a process in which every edge of the graph is present independently with a fixed probability.
Algorithm 4: Forest Fire Process (G0)
for t = 1, 2, . . . do
upon arrival of node ut at time t:
amb(ut) ← a node chosen u.a.r. from Vt−1
S ← Burn(Gt−1, amb(ut))
Gt ← (Vt−1 ∪ {ut}, Et−1 ∪ {(ut, w) : w ∈ S})
Algorithm 5: Burn(G = (V, E), v) // Outputs a subset of V reachable from v
H ← ∅
for all (w, x) ∈ E do 1, degG+t (w)
with probability min
α
H ← H ∪ {(w, x)}
return {x ∈ V : there exists a directed path from v to x in H}
Results
We now state our two main results for the Forest Fire model. The parameters α and the input graph G0 are fixed and we study the asymptotic properties of the graph Gt. We have not optimised the constants in the theorem statements and expect them to be far from being tight.
Theorem 4.1. Let α ≥ 100 and let G0 be a directed cycle such that |G0| ≥ α20, the Forest Fire Process with parameters α and G0 has the property of non-increasing distance to G0, i. e., for every t, E[ distGt (u, G0) ] = O(1), where the expectation is over a node u, which is chosen uniformly at random in Gt, and dist(u, G0) is the directed distance.3
Remark 4.2. It is not critical that G0 is a cycle. The main requirement is that conditioned on the Burn Process reaching G0, a large enough constant number of vertices of G0 will be burnt. For example, G0 being an expander, clique, or a strongly connected graph with large girth suffices. Simulations seem to indicate that G0 being a single node also result in a similar behaviour.
Theorem 4.3. Let α ≤ 1/(4e) and let G0 be an arbitrary graph, the Forest Fire Process with parameters α and G0 is such that E[ distGt (u, G0) ] = Ω(log t), using the same notation as above.
Approach and Technical Contributions
The main idea is as follows. We first reduce the process to the line process in which the node at time t connects to the node which arrived at time t − 1 (see Section 4.4). In this line process we define a potential φ(vt) which measures, intuitively speaking, the “typical path length” of node vt arriving at time step t to the initial graph L0. We defer the formal definition to Section 4.5. We would like to argue that no matter what happens up to time t, dist(vt+1, L0) is less than dist(vt, L0) in expectation whenever dist(vt, L0) is large enough. This does not seem to be possible when using distance directly; we can construct graphs where this is not true. However, these graphs are unlikely to arise under the Line Fire Process. Analysing φ instead gets around this issue. In fact, assuming φ(v2t) > 2, we show that φ(v2t+2) − φ(v2t) has negative expectation—irrespective of the history up to time 2t. The potential is designed such that v2t+1 sets up a favourable situation such that v2t+2 is able to decrease the potential w.r.t. to the value φ((v2t). The crucial part is that φ(vt) is designed such that it dominates dist(vt+1, L0) and thus assuming we can bound E[ φ(v) ] = 0, we get, by triangle inequality, that for u, v chosen uniformly at random E[ dist(u, v) ] ≤ E[ dist(u, L0) ] + E[ dist(v, L0) ] ≤ E[ φ(u) ] + E[ φ(v) ] = O(1).
Note that even though the edges added are directed, we treat the graph as undirected when we consider the distance of nodes.
Related work
There is a extensive variety of models for generating graphs of social networks, each re-producing a subset of properties observed in real-world social networks. The first major line of research considers static graphs, where the number of nodes does not change over the course of time: For example, in small-world like models, there is a fixed underlying graph which is augmented by additional links between the vertices. Kleinberg proposed a particular random augmentation of links on the grid and proved that this gives rise to a decentralised greedy algorithms to find short paths among nodes [Kle01]. In a more recent paper, Chaintreau et al. proposed a different model, in which similar results are achieved, where the grid is augmented with links generated by random walks on the grid with occasional resets [CFL08].
Other static models focus mainly on reproducing both densification and small diameter simultaneously. One example is the model by Leskovec et al. which uses a matrix-operation, namely, the Kronecker product, to generate self-similar graphs recursively [LCKF05]. They reproduce a vast number of properties including heavy tails for the in- and out-degree distribution and small diameter. However, the deterministic nature of this model produces unrealistic features. To remedy this drawback, they propose the Stochastic Kronecker Graph (SKG) model which has been very successful and is widely used in simulations. One disadvantage of SKG is that the adjustment of the parameters may have a huge influence on the properties of the resulting graphs. Recently, Seshadhri et al. [SPK12] showed that in fact the SKG model bears resemblance to a variant of the Chung and Lu model [CL03] which generalises classical random graph models. Here for any given collection of n weights (w1, w2, . . . , wn), the probability of an edge (i, j) is given by wiwj(Pk wk).
Additionally, Pinar et al. [PKC13] introduce the Block Two-Level Erdős Rény (BTER) model, and demonstrate that it captures observable properties of many real-world social networks. Given a degree sequence, the model works in two stages: In the first stage the nodes of roughly the same degree are grouped into clusters and the edges in each cluster are generated by ER (Erdős Rény) graphs for a given using another input parameter. Finally, the “excess” edges of the node i, i. e., the edges not yet used up by edges in the same cluster, are generated by randomly choosing two endpoints proportional to the excess edges of the nodes. Resulting self-loops and multi-edges are discarded.
The second major research line considers graph evolving over time where at each time step new vertices and edges are added to the evolving graph. Barabási et al. proposed the so called preferential attachment model in which new vertices attach preferentially to vertices with high degree, reproducing the power-law distribution over the in-degrees [BA99]. Building on preferential attachment, Cooper and Frieze propose a model in which exhibits a power-law of the degree as well as a shrinking diameter and densification; unfortunately, it involves many parameters [CF03]. Roughly speaking, the graph at time Gt is generated as follows. With some probability a new node is added with one or more edges to Gt−1 and with the remaining probability an already existing vertex is selected and exta edges are added to it. Recently, Avin et al. extended the preferential attachment model to incorporate densification [ALNP15]: Similarly as in [CF03] either a new node arrives or new edges are added. In either case, the nodes are chosen according to preferential attachment. Krapivsky and Redner investigated the development of random networks as the attachment probability grows [KR01].
The authors of [KKR+99, KRR+00] consider an edge copying evolution in which, on arrival, a new vertex picks an existing node and copies a subset of its neighbours. Another model is the Community Guided Attachment model, in which there is a hierarchical back-bone structure that determines the linkage probabilities [LKF07]. Lattanzi and Sivakumar generate random graphs according to an underlying affiliation network: Each node picks a random subset of affiliations and in each affiliation the nodes are connected as a clique (ad-ditionally, there is a process of preferential attachment) [LS09]. They show that this model exhibits shrinking diameter, densification, and a heavy-tailed degree distribution. Moreover, they connected the densification of the network to the non-linearity of the core. The recur-sive search model proposed by Vazquez is quite similar to the Forest-Fire model [Vaz01]. In the recursive search model, vertices are added to the graph one by one; when a new vertex arrives it first connects to a random vertex and then recursively connects to a subset of its unvisited neighbours. The main difference is that in the Forest Fire model, the average number of neighbours visited out of the current node is constant, where as in the recursive search model this is a constant fraction. Thus, presence of high-degree nodes can make the two models quite different.
In the Random-Surfer Model (RSM), introduced by Blum et al. [BCR06], the nodes arrive one by one. Upon arrival, each node performs several random walks from random starting points and connects to the endpoints of the performed walks. Our Random Walk Process (RWP) share resemblance to the RSM. The main difference is that in the RWP a new node connects to all the visited nodes in the random walk (instead of just the endpoint). Chebolu and Melsted [CM08] proved that the RSM and the PageRank-based selection model, proposed by Pandurangan et al. [PRU06], are equivalent and also proved that the expected in-degree of vertices follows a powerlaw distribution. More recently, Mehrabian and Wormald [MW14] proved logarithmic upper bounds for the diameter in the RSM and the PageRank-based selection model as well as a logarithmic lower bound for a special case where the generated graph is a tree.
The only rigorous work thus far on the Forest Fire model is by Mehrabian [MW14] who provide a logarithmic upper bound to the diameter of the Forest Fire model as well as for other well known models, e.g., the copying model and the PageRank-based selection model.
Table of contents :
1 Introduction
1.1 Introduction française
1.2 Organization and Publications
2 General Notation
I Probabilistic Analysis of Distributed Dynamic Processes
3 Contributions Dynamic Processes
4 Social Networks: The Forest Fire Model [KLL+16]
4.1 Results
4.2 Approach and Technical Contributions
4.3 Related work
4.4 Relating graph and line process
4.4.1 Line process
4.4.2 The ambassador graph
4.4.3 Coupling
4.4.4 Proofs of the Graph results from the Line results – Proof of Theorem
4.1 and Theorem 4.3
4.5 Analysis of the Line Fire Process
4.5.1 High-Level Proof Overview
4.5.2 Proof of Lemma 4.4
4.5.3 Proof of Lemma 4.13
4.5.4 Proof of Proposition 4.14
4.5.5 The foundations of
4.6 Lower bound – Proof of Lemma 4.5
4.7 Future Work and Conclusion
4.7.1 Further models
4.7.2 Conclusion
5 Balls-into-Bins with Deleting Bins [BFK+16a]
5.1 Results
5.2 Approach and Technical Contributions
5.3 Related Work
5.4 Model & Preliminaries
5.5 The 1-Choice Process
5.5.1 Maximum Load – Proof of Theorem 5.2
5.5.2 Stability – Proof of Theorem 5.1
5.5.3 Lower Bound on Maximum Load – Proof of Theorem 5.3
5.6 The 2-Choice Process
5.6.1 Bounding the Smoothness – Proof of Proposition 5.7
5.6.2 Maximum Load – Proof of Theorem 5.6
5.6.3 Stability – Proof of Theorem 5.5
6 Future Work – Dynamic Processes
II Probabilistic Analysis of Consensus Dynamics and Protocols
7 Contributions Consensus Processes
7.1 Consensus Dynamics
7.1.1 Overview of Results
7.2 Consensus Protocols
7.2.1 Overview of Results
7.3 Related Work
7.3.1 Voter Model
7.3.2 2-Choices
7.3.3 3-Majority
7.3.4 Further Consensus Dynamics
7.3.5 Consensus Protocols
8 Voter Model [KMS16] 98
8.1 Results
8.2 Approach and Technical Contributions
8.3 Bounding the coalescence time
8.3.1 The coalescence process
8.3.2 A more amenable process
8.3.3 Meeting Time Distribution Prior to tmeet
8.3.4 Upper Bound – Proof of Theorem 8.1
8.3.5 Lower Bound – Proof of Theorem 8.2
9 3-Majority [BCE+17] 131
9.1 Results
9.2 Approach and Technical Contributions
9.3 Consensus Model & Technical Framework
9.3.1 Comparing Anonymous Consensus Processes
9.3.2 Coupling two AC-Processes – Proof of Theorem 9.4
9.4 Upper Bound for 3-Majority- Proof of Theorem 9.8
9.4.1 Analysis of Phase 1: 3-Majority vs. Voter
9.4.2 Analysis of Phase 1: A Bound for Voter
9.5 Limitations of 1-Step Coupling
10 2-Choices [EFK+16, BCE+17]
10.1 Results
10.2 Approach and Technical Contributions
10.3 Plurality Consensus with Two Choices
10.3.1 Upper bound – Proof of Theorem 10.2
10.4 Lower Bound for 2-Choices
10.4.1 Worst-Case Lower bound – Theorem 10.5
10.4.2 Complementing our Upper bounds Theorem 10.6 and Theorem 10.7
10.5 Comparison with the 3-Majority Process
11 Rapid Asynchronous Consensus [EFK+16]
11.1 Results
11.2 Approach and Technical Contributions
11.3 The Asynchronous Protocol
11.4 The Key Lemmas
11.5 Concentration of the Clocks: Proof of Proposition 11.3
11.6 Analysis of the 2-Choices sub-phase: Proof of Proposition 11.4
11.7 Analysis of the Bit-Propagation Sub-Phase: Proof of Proposition 11.5
11.8 The Endgame: Taking a from (1 − « Part1) · n to n
11.9 Putting Everything Together: Proof of Theorem 11.1
11.10Increasing the Number of Opinions
11.11Conclusions and Further Work
12 Consensus via Load Balancing [BFK+16b]
12.1 Results
12.2 Approach and Technical Contributions
12.3 Communication Model and Notation
12.4 Protocol Shuffle – Theorem 12.1
12.4.1 Protocol Description
12.4.2 Analysis of Shuffle
12.5 Protocol Balance – Theorem 12.9
13 Future Work – Distributed Consensus Processes
Bibliography