Putting the Biological Species Concept to the Test: Using Mating Networks to Delimit Species 


Markov Cluster algorithm (MCL)

The MCL algorithm (Van Dongen, 2000) searches for structurally homogeneous subsets by considering a random walk on the graph.
A random walk on the graph is a sequence of moves at discrete time points, from one vertex to another, along the edges of the graph. The probability of a move along an edge is proportional to the weight of that edge. Let $E_i^t$ be the event of being at vertex $i$ at time $t$. For all $t$, $(E_i^t)_{i=1,\dots,n}$ depends only on $(E_i^{t-1})_{i=1,\dots,n}$, so the random walk is a Markov chain. The behavior of a random walk from a starting vertex is determined by the set of probabilities of moving from this vertex to another vertex $j$ in $t$ steps, for all $(j, t)$; a vertex is thus characterized by the behavior of the random walk starting from it. The main idea of the method is that vertices with the same random-walk behavior belong to the same cluster.
A standard random walk (with no inflation factor) on a connected graph converges to the same asymptotic state of the Markov chain whatever the starting vertex. Since the objective of the MCL algorithm is to build $k$ clusters with $k > 1$, the usual random walk has to be modified: the idea of the MCL algorithm is to constrain the random walk to converge to a different state depending on the starting vertex. This is achieved with the inflation operation; the stronger the inflation, the more numerous the resulting asymptotic states. The MCL algorithm then groups the vertices whose associated random walks converge to the same state. Let the transition matrix of the Markov chain be $T = W^{sl}(D^o)^{-1}$, where $W^{sl}$ is the weight matrix of the graph with self-loops and $D^o$ is the diagonal matrix of its column sums. $T_{ij}$ is the probability of going from vertex $j$ to vertex $i$ in one step, so $T$ is a column-wise stochastic matrix ($\forall j$, $\sum_i T_{ij} = 1$). Note that in the MCL notation the transition matrix is the transpose of the usual notation for the transition matrix of a Markov chain, which is row-wise stochastic. $T$ is the matrix of transition probabilities in one step and $T^k$ is the matrix of transition probabilities in $k$ steps.
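As a small worked example (the graph here is ours, purely for illustration): for the path graph $1\text{--}2\text{--}3$ with unit edge weights and unit self-loops,

\[
W^{sl} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \qquad
D^o = \operatorname{diag}(2, 3, 2), \qquad
T = W^{sl}(D^o)^{-1} = \begin{pmatrix} 1/2 & 1/3 & 0 \\ 1/2 & 1/3 & 1/2 \\ 0 & 1/3 & 1/2 \end{pmatrix},
\]

and each column of $T$ sums to one, as required.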
Let $T^{(1)} = T$. MCL alternates two operations, indexed by $k$ starting at $k = 1$:
1. $T^{(2k)} = (T^{(2k-1)})^{e_k}$ is the transition operation, which makes the random walk progress. The effect of this operation is larger when $e_k$ is large.
2. $T^{(2k+1)} = \Gamma_{r_k}(T^{(2k)})$ is the inflation operation, which allows the random walk to converge toward several stable states. $\Gamma_{r_k}$ is a term-by-term $r_k$-power operator followed by a column-sum normalization. This operation inflates the high values of the matrix $T^{(2k)}$ and reduces the small ones. For example, $\Gamma_2([.5, .3, .2]) = [.25, .09, .04]/.38 = [.66, .24, .11]$. The effect of the inflation operation is larger when $r_k$ is large.
The algorithm ends when $T^{(k)}$ is idempotent ($T^{(k+1)} = T^{(k)}$). Denote by $T^{(\infty)}$ this idempotent matrix. The columns of $T^{(\infty)}$ correspond to the vertices of the graph, and each row of $T^{(\infty)}$ defines a cluster: the non-zero values within a row give the composition of the cluster. In the general case several clusters are empty, so there are fewer clusters than vertices. When a vertex belongs to several clusters, different assignment rules may be applied.
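To make the alternation concrete, here is a minimal NumPy sketch of the above scheme (ours, not Van Dongen's implementation), with the expansion and inflation parameters $e_k$ and $r_k$ held constant:

```python
import numpy as np

def mcl(W, e=2, r=2, tol=1e-6, max_iter=200):
    """Minimal MCL sketch; W is a weight matrix that already includes self-loops."""
    T = W / W.sum(axis=0)                      # column-stochastic T = W^{sl}(D^o)^{-1}
    for _ in range(max_iter):
        T_new = np.linalg.matrix_power(T, e)   # transition: e steps of the walk at once
        T_new = T_new ** r                     # inflation: entrywise power r ...
        T_new /= T_new.sum(axis=0)             # ... followed by column normalization
        if np.allclose(T_new, T, atol=tol):    # idempotence: T^(k+1) = T^(k)
            break
        T = T_new
    # each non-empty row of the limit matrix defines a cluster
    return [np.flatnonzero(row > tol) for row in T_new if (row > tol).any()]
```

A vertex appearing in several of the returned clusters would then be handled by one of the assignment rules mentioned above.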
The Markov chain of the random walk must be ergodic; in particular, it must be aperiodic and irreducible. Some graphs must be modified to satisfy aperiodicity, generally by adding self-loops. The irreducibility condition is always satisfied for connected undirected graphs, but may not be satisfied for directed graphs. For example, in a directed bipartite graph in which all edges go from one type of vertex to the other, one set of vertices consists of absorbing states, and consequently the Markov chain is reducible. A directed bipartite graph must therefore be transformed (for example by symmetrization) before the MCL algorithm is used.
In any case, applying the MCL method means clustering a graph which is not the true one (because of the added self-loops). However, the true graph can be approximated by a graph with self-loops of very low weight.
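The two transformations just discussed can be sketched as follows (the function names and the self-loop weight `eps` are ours):

```python
import numpy as np

def add_self_loops(W, eps=1e-3):
    """Low-weight self-loops make the chain aperiodic while staying
    close to the true graph."""
    return W + eps * np.eye(W.shape[0])

def symmetrize(W):
    """Symmetrization of a directed (e.g. bipartite) graph, so that the
    chain has no absorbing states and becomes irreducible."""
    return W + W.T
```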
Figure 2.5 shows that the MCL algorithm applied to the toy example with unitary self-loops retrieves communities instead of structurally homogeneous subsets. This is because vertices with different structural connectivity exhibit the same behavior under the random walk once self-loops are added to the graph. To illustrate this idea, imagine the random walk on the graph without self-loops: it would simply alternate between the two types of vertices.

Hierarchical agglomerative clustering algorithm

This algorithm is useful for clustering graphs once a distance between vertices has been defined. Note that there are then two graphs: the original one and the weighted graph of dissimilarities. The algorithm gives communities of the graph of dissimilarities, but the clusters obtained can be structurally homogeneous subsets or communities of the original graph, depending on the distance used. The result depends on whether the dissimilarities between vertices are built locally or globally: for instance, if the dissimilarity between vertices is the Jaccard or the Pons-Latapy dissimilarity (see Section 2.3.1), then one obtains structurally homogeneous subsets. Conversely, if the dissimilarity between two vertices equals one when they are not linked and zero when there is an edge between them, then one obtains communities. The usual hierarchical agglomerative algorithms are well known (Hartigan, 1975): Ward, single linkage, complete linkage and UPGMA (Unweighted Pair Group Method with Arithmetic Mean). A classification algorithm such as a hierarchical agglomerative algorithm or k-means is a necessary final step for the methods that only compute a distance between vertices or continuous latent positions for the vertices. The results of these methods therefore depend not only on their own tuning parameters but also on the particular classification algorithm used to cluster the vertices.
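As an illustration of the "local" variant, here is a minimal sketch using SciPy (the toy adjacency matrix is ours): Jaccard dissimilarities between vertex neighborhoods followed by UPGMA, which groups vertices with similar connection profiles, i.e. structurally homogeneous subsets.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Binary, symmetric adjacency matrix of a small graph
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=bool)

# Jaccard dissimilarity between the neighborhoods (rows) of the vertices
d = pdist(A, metric="jaccard")

# UPGMA is 'average' linkage in SciPy; cut the tree into 2 clusters
Z = linkage(d, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Replacing the Jaccard distance by the 0/1 "linked/not linked" dissimilarity in the same pipeline would yield communities instead.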


Methods developed with a model

Statisticians propose probabilistic models that are supposed to take into account the random variability in the data. These models are generative in the sense that they mimic the real data-generation process. This section presents a synthetic summary of the more detailed review in Daudin (2011).
All of these models use latent variables. The latent variables may be discrete and directly give the classification of the vertices, as in the Stochastic Block Model (SBM, Section 2.5.3). Alternative models such as Model-Based Clustering for Social Networks (MBCSN, Section 2.5.1) and the Random Dot Product Graph (RDPG, Section 2.5.2) use continuous latent variables; a supplementary classification step is therefore necessary.
The above models assume that a vertex pertains to only one class, but there are alternative models, such as the Continuous Stochastic Block Model (CSBM, Section 2.5.4), which allow each vertex to pertain to several classes. These models are also known under the name of Grade of Membership models (see Manton et al., 1994, and Erosheva, 2005).

Random Dot Product Graphs (RDPG)

The multidimensional scaling (MS) method, applied to the similarity matrix $P$, consists in positioning each vertex in a metric space of latent variables so that the similarities between vertices are approximately preserved. The underlying model is $P = TT'$, where the $(n, d)$-matrix $T$ contains the coordinates of the vertices in a $d$-dimensional metric space. The naive MS method is not well suited for modeling $P$ and has two major drawbacks: $TT'$ does not necessarily lie in $[0, 1]^{n^2}$ when the rows of $T$ are arbitrary vectors of $\mathbb{R}^d$, and $TT'$ is symmetric, so it is not suited for modeling directed graphs.
The Random Dot Product Graph defined in Marchette and Priebe (2008) is $P_{ij} = f(t_i' t_j)$ with $t_i \in \mathbb{R}^d$ and $f(x) \in [0, 1]$. In Marchette and Priebe (2008), $f$ is a simple threshold: $f(x) = 0$ if $x < 0$, $f(x) = x$ if $0 \le x \le 1$, and $f(x) = 1$ if $x > 1$.
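A minimal simulation sketch of this model (ours; the latent positions are drawn from a Gaussian purely for illustration), where `np.clip` implements the threshold $f$ exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 2
T = rng.normal(size=(n, d))               # latent positions t_i in R^d

P = np.clip(T @ T.T, 0.0, 1.0)            # P_ij = f(t_i' t_j), f = threshold

A = np.triu(rng.random((n, n)) < P, k=1)  # edge i-j drawn with probability P_ij
A = A | A.T                               # undirected graph, no self-loops
```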
To get around the second drawback, the RDPG model is extended with two vectors for each vertex, an in-vector $v_i$ and an out-vector $u_i$, so that the model becomes $P_{ij} = f(u_i' v_j)$. Another way to get around the symmetry of $P$, called DEDICOM, was proposed by Harshman (1978) and is well described in Trendafilov (2002). This model uses only one vector for each vertex but inserts a non-symmetric $(d, d)$-matrix $A$ in the dot product. The model is $X = TAT' + E$, where the matrix $T$ is constrained by $T'T = I$; $T$ and $A$ are obtained by minimizing $\|X - TAT'\|^2$. Several algorithms have been proposed to achieve this task (see Kiers et al., 1990). The tuning parameters are $d$, the dimension of the latent space, and the number of groups.
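To fix ideas, here is a one-shot numerical sketch of the DEDICOM decomposition (ours; it is not one of the algorithms of Kiers et al., 1990). It uses the fact that, for a fixed column-orthonormal $T$, the matrix $A$ minimizing $\|X - TAT'\|^2$ is $A = T'XT$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 30, 3
X = rng.random((n, n))                   # an asymmetric relational matrix

# One-shot approximation: take T from the d leading left singular
# vectors of X, which are column-orthonormal by construction.
U, s, Vt = np.linalg.svd(X)
T = U[:, :d]                             # T'T = I

A = T.T @ X @ T                          # optimal core for this fixed T;
                                         # A is non-symmetric in general,
                                         # capturing the directionality of X
X_hat = T @ A @ T.T
print(np.linalg.norm(X - X_hat))         # norm of the residual E
```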

Table of contents

1 Introduction 
1.1 Networks in ecology
1.1.1 Ecological networks topology
1.1.2 Ecological networks dynamics
1.2 Statistical methods for networks
1.3 Problem statement and outline
2 Detection of structurally homogeneous subsets in graphs 
2.1 Introduction
2.2 Basic notations and an example
2.2.1 A toy example
2.2.2 Transformation of the raw graph
2.3 Methods based on an algorithm
2.3.1 Markov Cluster algorithm (MCL)
2.3.2 Hierarchical agglomerative clustering algorithm
2.3.3 Spectral Clustering
2.3.4 Edge-Betweenness
2.4 Methods based on an optimization criterion
2.4.1 Modularity criterion
2.4.2 Cut
2.5 Methods developed with a model
2.5.1 Model-based clustering for social networks (MBCSN)
2.5.2 Random Dot Product Graphs (RDPG)
2.5.3 Stochastic Block Model (SBM)
2.5.4 Continuous Stochastic Block Model (CSBM)
2.6 Application to Zachary’s karate club
2.7 Conclusion
3 Graph clustering methods differ in their ability to detect patterns in bipartite ecological networks 
3.1 Introduction
3.2 Materials and methods
3.2.1 Graph clustering methods
3.2.2 Simulation of ecological bipartite networks
3.2.3 Criteria used to compare the efficiency of the clustering methods
3.3 Results
3.3.1 Relative efficiency of the clustering methods when applied to ecological bipartite networks with average properties
3.3.2 Effect of network properties on the hierarchy between clustering methods
3.4 Discussion
4 Wmixnet: Software for Clustering the Nodes of Binary and Valued Graphs using the Stochastic Block Model 
4.1 Introduction
4.2 SBM model with covariates
4.2.1 Notations
4.2.2 The model
4.2.3 Probability laws
4.2.4 Analysis of groups when covariates are used
4.3 Estimation method
4.3.1 Initialization
4.3.2 Smoothing
4.3.3 Parallelism
4.4 wmixnet program
4.4.1 Sources availability and installation
4.4.2 Input format
4.4.3 Output format
4.4.4 Command line usage
4.4.5 Empirical complexity
4.4.6 Capacity of extension
4.5 Example
4.5.1 Projected networks
4.5.2 Covariate data
4.5.3 Example of command line
4.5.4 Results
5 Deciphering the mechanisms shaping ecological networks: a framework and a method 
5.1 Introduction
5.2 Materials and methods
5.2.1 The network data
5.2.2 The statistical model
5.2.3 Application of the model to the data
5.3 Results
5.3.1 Relative importance of the three mechanisms
5.4 Ongoing work and discussion
A Putting the Biological Species Concept to the Test: Using Mating Networks to Delimit Species 

