Estimating the robustness of bipartite ecological networks through probabilistic modeling

Application to the multilevel network issued from a television programs trade fair

We apply our model to the data set (Brailly et al., 2016) described below.

Context and Description of the data set

Promoshow East is a television programs trade fair for Eastern Europe. Sellers from Western Europe and the USA come to sell audiovisual products to regional and local buyers such as broadcasting companies. The data gather observations on one particular audiovisual product, namely animation and cartoons. From a sociological perspective, reconstituting and analyzing multilevel (inter-individual and inter-organizational) networks in this industry is important. In economic sociology, it helps redefine the nature of markets (Brailly et al., 2016, 2017; Lazega and Mounier, 2002). In the sociology of culture, it helps understand, from a structural perspective, the mechanisms underlying contemporary globalization and standardization of culture (Brailly et al., 2016; Favre et al., 2016). In the sociology of organizations and collective action, it helps understand the importance of multilevel relational infrastructures for the management of tense competition and cooperation dilemmas by various categories of actors (Lazega, 2020), in this case the (sophisticated) sales representatives of cultural industries.
The data were collected through face-to-face interviews. At the individual level, people were asked to select from a list the individuals from whom they obtained advice or information during or before the trade fair. This level consists of 128 individuals and 710 directed interactions (density = 0.044). The individuals were affiliated to 109 organizations, each containing from one to six individuals. At the inter-organizational level, two kinds of interactions were collected: a deal network (deals signed since the last trade fair) and a meeting network (derived from aggregating, at the inter-organizational level, the meetings planned by individuals on the trade fair’s website). Both networks are symmetric, with respective densities 0.067 and 0.059.
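As a quick check of the figure reported for the inter-individual level, the density can be recomputed from the node and edge counts. This is a minimal sketch that assumes density is defined as the number of directed links divided by the number of ordered pairs of distinct individuals (no self-loops); the function name is illustrative.

```python
def directed_density(n_nodes: int, n_edges: int) -> float:
    """Density of a directed network without self-loops (assumed definition)."""
    return n_edges / (n_nodes * (n_nodes - 1))

# 128 individuals and 710 directed advice links, as reported above.
print(round(directed_density(128, 710), 3))  # 0.044
```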

Statistical analysis

The MLVSBM is inferred on two datasets: one uses the deal network at the inter-organizational level, the other uses the meeting network. In both cases the ICL criterion favors dependence between the two levels and selects $\widehat{Q}_I = 4$ blocks of individuals. The selected number of blocks of organizations, $\widehat{Q}_O$, is 3 for the deal network and 4 for the meeting network.
In order to determine which inter-organizational network is the most relevant, we test the ability of the MLVSBM to predict dyads or links in the inter-individual network when either the deal or the meeting network is considered. To do so, we choose dyads and links uniformly at random, remove them, and try to predict them. More precisely, we set $X^I_{ii'} = \mathrm{NA}$ for a given percentage of the dyads $(i, i')$, this percentage ranging from 5% to 40% in steps of 5%. We also remove existing links, i.e. we force $X^I_{ii'} = 0$ where $X^I_{ii'} = 1$ was observed, for some randomly chosen $(i, i')$. The percentage of removed existing links varies from 5% to 95% in steps of 5%. We repeat the following procedure 100 times:
1. Remove dyads or links uniformly at random
2. Infer the model from scratch on the newly obtained network in order to obtain the probability of a link $P(X^I_{ii'} = 1; \widehat{\theta})$ for each missing dyad or for each dyad such that $X^I_{ii'} = 0$
3. Predict links among all missing dyads or among all dyads such that $X^I_{ii'} = 0$.
Missing data are handled as Missing At Random (Tabouy et al., 2019) and the probability of existence of an edge is given by $P(X^I_{ii'} = 1; \widehat{\theta}) = \sum_{k,k'} \widehat{\tau}^I_{ik} \widehat{\alpha}^I_{kk'} \widehat{\tau}^I_{i'k'}$. Since this procedure amounts to a binary classification problem, we assess the performance through the area under the ROC curve (AUC), a random classification corresponding to AUC = 0.5.
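The masking, scoring and evaluation steps above can be sketched as follows. This is an illustrative sketch, not the thesis implementation: the inference step is not reproduced, so the variational probabilities `tau` and the connectivity matrix `alpha` are hypothetical toy values standing in for the output of a fit, and all function names are assumptions. The AUC is computed as the probability that a true link scores higher than a non-link, with ties counted as one half.

```python
import random

def mask_dyads(adj, fraction, rng):
    """Copy adj and set a uniformly chosen fraction of off-diagonal dyads to None (NA)."""
    n = len(adj)
    dyads = [(i, j) for i in range(n) for j in range(n) if i != j]
    hidden = rng.sample(dyads, round(fraction * len(dyads)))
    masked = [row[:] for row in adj]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden

def link_probability(tau, alpha, i, j):
    """sum_{k,k'} tau[i][k] * alpha[k][k'] * tau[j][k']."""
    n_blocks = len(alpha)
    return sum(tau[i][k] * alpha[k][l] * tau[j][l]
               for k in range(n_blocks) for l in range(n_blocks))

def auc(pos_scores, neg_scores):
    """Probability that a positive outscores a negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

# Toy directed network on 4 individuals.
adj = [[0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0]]
masked, hidden = mask_dyads(adj, 0.25, random.Random(0))  # 3 of the 12 dyads become NA

# Hypothetical fitted quantities (2 blocks); in practice they come from refitting on `masked`.
tau = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
alpha = [[0.8, 0.1], [0.1, 0.6]]

# Score a fixed set of held-out dyads against their true values.
held_out = [(0, 1), (0, 2), (2, 3), (1, 2)]
scores = {d: link_probability(tau, alpha, *d) for d in held_out}
pos = [scores[(i, j)] for (i, j) in held_out if adj[i][j] == 1]
neg = [scores[(i, j)] for (i, j) in held_out if adj[i][j] == 0]
print(auc(pos, neg))  # 1.0 on this toy example
```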
Figure 2.6 shows that, for this dataset, using the MLVSBM rather than a single-level SBM substantially improves the recovery of the inter-individual level. This confirms the dependence between levels detected by the ICL. Moreover, the deal network gives better predictions than the meeting network for both missing dyads and missing links. We also considered a merged network at the inter-organizational level, obtained as the union of the links of the deal and meeting networks, i.e. for all $j, j' \in \{1, \dots, n_O\}$, $X^{O,\mathrm{merged}}_{jj'} = \max\{X^{O,\mathrm{deal}}_{jj'}, X^{O,\mathrm{meeting}}_{jj'}\}$. The improvement in prediction over the deal network is small, and this composite network is much harder to analyze sociologically.
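The union of the two inter-organizational networks is an entrywise maximum of their binary adjacency matrices. A minimal sketch on toy 3-organization matrices (the values are illustrative, not taken from the dataset):

```python
def merge_networks(deal, meeting):
    """Entrywise maximum of two binary adjacency matrices of the same size."""
    return [[max(d, m) for d, m in zip(row_d, row_m)]
            for row_d, row_m in zip(deal, meeting)]

deal = [[0, 1, 0],
        [1, 0, 0],
        [0, 0, 0]]
meeting = [[0, 0, 1],
           [0, 0, 0],
           [1, 0, 0]]
merged = merge_networks(deal, meeting)
print(merged)  # [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
```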
Figure 2.6 – AUC of the prediction for A: missing dyads, B: missing links, as a function of the missing proportion for the inter-individual level. Colors represent the different networks at the inter-organizational level. None (beige) is equivalent to a single-level SBM on the individuals. The confidence interval is given by mean ± standard error.
Remark. Another way to simulate missing data is to consider actor non-response, as in Žnidaršič et al. (2012). In our case, this corresponds to selecting a proportion of the individuals at random and setting all their outgoing dyads to NA (i.e. $X^I_{ii'} = \mathrm{NA}$ for all $i'$ if individual $i$ did not respond). We then look at the stability of the clustering as in Žnidaršič et al. (2012, 2019), measured by the ARI between the clustering of the individuals obtained with the full data and the one obtained with the missing data. In simulations (not reported here), we notice that the clustering of the individuals is more stable when considering an MLVSBM on $(X^I, X^{O,\mathrm{deal}})$ than when considering a single-level SBM on $X^I$. This is one more clue in favor of the dependence between the two levels.
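The non-response scenario and the stability measure can be sketched as follows. This is an illustrative sketch with hypothetical function names; the clustering step itself is not reproduced, so the two label vectors in the usage example are toy values. The ARI is computed from the contingency table of the two clusterings using the standard pair-counting formula.

```python
from collections import Counter
from math import comb

def mask_nonrespondents(adj, nonrespondents):
    """Set every outgoing dyad of each non-respondent to None (NA)."""
    masked = [row[:] for row in adj]
    for i in nonrespondents:
        masked[i] = [None if j != i else masked[i][j] for j in range(len(adj))]
    return masked

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same items."""
    n = len(labels_a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster on both sides
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 0, 0]]
print(mask_nonrespondents(adj, [1]))  # [[0, 1, 0], [None, 0, None], [0, 0, 0]]

# Toy clusterings: identical up to relabeling, so the ARI is 1.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```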
Remark. Žiberna (2019) and Žiberna (2020) also deal with this dataset from Brailly et al. (2016). However, Žiberna (2019) uses the dataset collected in 2012 and Žiberna (2020) combines the datasets collected in 2011 and 2012, while we only use the 2011 dataset. Moreover, different choices were made about which individuals and organizations to include. Thus, a direct comparison does not make sense. Applying Žiberna’s method to the dataset we consider yields clusterings that somewhat agree with ours on both levels (ARIs > 0.6). We have checked that the difference derives from the fact that the two methods do not seek the same patterns.
Figure 2.7 – Multilevel network of the Promoshow East trade fair 2011. Above: the deal network for the organizations; below: the advice network for the individuals. A: Mesoscopic view of the multilevel network. Nodes stand for the blocks; donut charts show the relation between $Z^O$ and $Z^I$. Black edges are the probabilities of connection $\alpha^I$ and $\alpha^O$; blue edges stand for $P(X^I_{ii'} = 1 \mid Z^O_{A_i}, Z^O_{A_{i'}})$, i.e. the probability of interaction between organizations through their individuals. For the sake of clarity, only edges with probabilities above the density are shown. B: View of the network. The size of a node is proportional to its in-degree. Colors represent the clustering obtained with the MLVSBM. C: Matrix representation of the multilevel network. At the bottom-left, the adjacency matrix of the advice network between individuals; at the top-right, the deal network between organizations; at the top-left, the affiliation matrix of the individuals to the organizations. Entries are reordered by block from left to right and bottom to top. Blocks are separated by thin lines and levels by thick lines. The entries of the bottom-right matrix are the parameters $\alpha^I$, $\gamma$ and $\alpha^O$ multiplied by 100.

Analysis and comments

For the analysis, we use the MLVSBM inferred from the deal network. We select $\widehat{Q}_O = 3$ and $\widehat{Q}_I = 4$ blocks, and the ICL favors dependence between the two levels. This network is plotted in Figure 2.7 B, and the adjacency matrices of both levels are reordered by block in Figure 2.7 C. In Figure 2.7 A, we plot a synthetic view of the blocks of this multilevel network. The size of each node is proportional to the cardinality of the corresponding block. For the inter-organizational level, we link blocks of organizations by $\alpha^O$ (plain black edges) and by the probability of interaction of their individuals, $P(X^I_{ii'} = 1 \mid Z^O_{A_i}, Z^O_{A_{i'}})$ (graded blue edges). The donut chart around each node represents the parameter $\gamma$. For the inter-individual level, blocks of individuals are linked by $\alpha^I$, and the donut chart for a given block shows the share of each block of organizations in the individuals’ affiliations.
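The blue edges of Figure 2.7 A can be recomputed from $\gamma$ and $\alpha^I$. The sketch below assumes, following the MLVSBM definition (not restated in this section), that $\gamma_{kq}$ is the probability that an individual belongs to individual block $k$ given that their organization belongs to organization block $q$, and that two distinct individuals' blocks are independent given the organizations' blocks. Under these assumptions the probability is $\sum_{k,k'} \gamma_{kq}\,\alpha^I_{kk'}\,\gamma_{k'q'}$. The parameter values and the function name are hypothetical.

```python
def org_block_interaction(gamma, alpha_I, q, r):
    """P(X^I_ii' = 1 | Z^O_{A_i} = q, Z^O_{A_i'} = r) under the stated assumptions.

    gamma[k][q]: probability of individual block k given organization block q
                 (each column sums to 1).
    alpha_I[k][l]: connection probability from individual block k to block l.
    """
    n_blocks = len(alpha_I)
    return sum(gamma[k][q] * alpha_I[k][l] * gamma[l][r]
               for k in range(n_blocks) for l in range(n_blocks))

# Hypothetical parameters: 2 blocks of individuals, 2 blocks of organizations.
gamma = [[0.9, 0.2],
         [0.1, 0.8]]
alpha_I = [[0.5, 0.1],
           [0.05, 0.3]]
print(org_block_interaction(gamma, alpha_I, 0, 0))  # approximately 0.4215
```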

Table of contents :

Chapter 1: Introduction
1.1 Motivation
1.2 State of the art
1.2.1 Mathematical formulation of networks and multilayer networks
1.2.2 Some probabilistic models for random graphs
1.2.3 Latent space models for random graphs
1.2.4 Extensions of the SBM to multilayer networks and to collections of networks
1.2.5 Inference techniques and algorithms
1.2.6 Model selection
1.2.7 Clustering comparison
1.2.8 Missing and noisy data
1.3 Contributions of the thesis
1.3.1 A stochastic block model for multilevel networks
1.3.2 Common structures in a collection of networks
1.3.3 Estimating the robustness of bipartite ecological interaction networks
1.3.4 R package
Chapter 2: A Stochastic Block Model for the Analysis of Multilevel
2.1 Introduction
2.2 A multilevel stochastic block model
2.3 Statistical Inference
2.3.1 Variational method for maximum likelihood estimation
2.3.2 Model selection
2.4 Illustration on simulated data
2.4.1 Experimental design
2.4.2 Simulation results
2.4.3 Computational costs
2.5 Application to the multilevel network issued from a television programs trade fair
2.5.1 Context and Description of the data set
2.5.2 Statistical analysis
2.5.3 Analysis and comments
2.6 Discussion
2.A Proof of Proposition 2.1
2.B Proof of Proposition 2.2
2.C Details of the Variational EM
2.D Details of the ICL criterion
2.E Stochastic Block Model for Generalized Multilevel Network
2.E.1 Description of the generative model
2.E.2 Variational inference
2.F MLVSBM package Tutorial
2.F.1 Generic functions
2.F.2 Other useful output
2.G Hard to Infer Levels: Benefits of the Multilevel Modeling
2.G.1 Simulation Scenario
2.G.2 Results
Chapter 3: Joint inference of a collection of networks using a stochastic block model framework
3.1 Introduction
3.2 Data Motivation and the Stochastic Block Model
3.3 Joint Modeling of a Collection of Networks
3.3.1 A collection of i.i.d. SBM
3.3.2 A collection of networks with varying block sizes
3.3.3 A collection of networks with varying density (-colSBM)
3.3.4 Collection of networks with varying block sizes and density (-colSBM)
3.4 Likelihood and identifiability of the models
3.4.1 Log-likelihood expression
3.5 Variational estimation of the parameters
3.6 Model selection
3.6.1 Selecting the number of blocks Q
3.6.2 Testing common connectivity structure
3.7 Partition of networks according to their mesoscale structure
3.8 Simulation studies
3.8.1 Efficiency of the inference procedure
3.8.2 Capacity to distinguish -colSBM from iid-colSBM
3.8.3 Partition of networks
3.8.4 Finding finer block structures
3.9 Application to Food Webs
3.9.1 Joint analysis of 3 stream food webs
3.9.2 Partition of a collection of 67 predation networks
3.10 Discussion
3.A Proof of identifiability
3.B Details of the Model Selection when Allowing for Empty Blocks
3.C Partition of Food Webs with colSBM
3.D Analysis of advice networks
3.D.1 Presentation of the advice networks
3.D.2 Pairwise analysis of the advice networks
3.D.3 Looking for larger collections
3.D.4 Using dyad prediction to quantify the link between networks
3.D.5 Conclusion
Chapter 4: Estimating the robustness of bipartite ecological networks through probabilistic modeling
4.1 Introduction
4.2 Robustness of bipartite ecological networks
4.3 Bipartite Stochastic Block Model and related sequential extinctions
4.3.1 Probabilistic model on bipartite ecological networks
4.3.2 Extinction sequence distributions adapted to bipartite Block Models
4.4 Moments of the robustness statistic
4.4.1 Expectation
4.4.2 Variance
4.4.3 Illustration of the variability of the robustness function
4.5 Impact of the Network Structure on the Robustness
4.5.1 Analytical Properties
4.5.2 Analysis for Typical Structures
4.6 Analysis of a collection of observed bipartite ecological networks
4.6.1 Computation of the Robustness for the Web of Life Dataset
4.6.2 Correction for Partially Observed Networks
4.7 Discussion
4.A Proof for Section 4.4 (Moments of the robustness statistic)
4.B Comparing the robustness of bipartite ecological networks with different interaction types
4.B.1 Robustness and Analysis of the block model of the Web of Life dataset
4.B.2 Analyzing the link between richness, connectance and interaction type for empirical robustness
4.B.3 Normalized robustness
4.B.4 Examining if networks are more robust to plant or animal extinctions
Chapter 5: Conclusions and perspectives