Get Complete Project Material File(s) Now! »
Community-focused rankings
In the final experiments (on undirected and undirected weighted graphs), the focus is turned on authors belonging to specific scientific communities and compare their rankings according to our fractional cores method against rankings determined using simpler measures of collaborativeness. More precisely, the names of program committee members of SIGMOD, SIGIR, and SIGKDD for the years 2009, 2010, and 2011 were extracted to obtain subsets of the database, information retrieval, and data mining community, respectively. Most of the authors could be mapped automatically to their entries in DBLP using string matching; for some a best-effort manual mapping (e.g., because of missing middle initials or nicknames in the programme committee lists) had to be performed; about a handful of authors could not be mapped with confidence (author name disambiguation issues e.g.from abbreviations) and are thus missing from the rankings. For each community, authors are ranked therein according to the following measures:
(a) fractional index
(b) number of co-authors
(c) number of publications
(d) average number of co-authors per publication
(e) years active.
The resulting top-10 rankings are given in Table 3.8, Table 3.9, and Table 3.10. Note that for the fractional cores method, as before, an author’s fractional index is determined on the entire DBLP co-authorship graph and not only based on collaborations with authors within the same scientific community. When looking at the top-10 rankings presented, its observed that across all communities rankings according to (b), (c), and (e) are biased in favor of senior authors (e.g., Michael Stonebraker, W. Bruce Croft, and Jiawei Han) and overlap sometimes significantly. This is natural, given that authors who have been active longer, tend to have more publications, co-authored with different people at different points in time.
Directed Graph Degeneracy for Scale-Free Graphs
Real world web graphs have been found to display scale-free characteristics[9, 10, 51] evident by the power law degree distribution. Here are also explored author citation graphs which share the same properties (as it can been seen in their degree distributions). Scale-free graphs are frequently modeled by the combination of growth with preferential attachment. There have been many variations in this modeling both for directed and undirected cases but the main idea is that the graph grows one vertex at the time and edges are added (between vertices that may be new or old). The key idea in the preferential attachment scheme is that the probability of taking an edge is proportional to the respective degrees of its endpoints. This intuitively matches with the mechanism of the evolution of both web graphs and citations graphs of authors (i.e. a “popular” page is more likely to get in-links and a “famous” author is more likely to get a citation from a new page/paper following the “rich get richer” of the preferential attachment process). As the scale-free model seems to approximate the graphs examined, it has been chosen for evaluation with the D-core computation procedure to see if the results are similar for both various parameters and as well parameters that produce graphs with approximately similar degree distributions with the real world graphs.
Preliminaries for preferential attachments
Barabasi and Albert in [9] were the first to introduce a scale-free model for undirected graphs. In that model, the graph is generated with a small number of initial vertices m0 and grows by adding each time a new vertex with m(6 m0) edges from the new vertex to the old ones. Preferential attachment is introduced in the selection of the old nodes; the probability a vertex i depends on the degree of that vertex, so that (ki) = ki/Pj kj where ki the degree of the vertex. The Barabasi- Albert model was examined in more detail by Bollobás et. al in [17] and in[78] where a detailed model called Linearized Chord Diagram (LCD) was designed. This applies to directed and undirected graphs as well; a parameter m is used and if m = 1 then at each step t a new vertex vt is added to a given graph G (t−1) 1 with a single edge between vt and vi where i is chosen randomly with.
Table of contents :
1 introduction
1.1 Introduction
1.2 Network Communities
1.2.1 Applications
1.2.2 Collaboration
1.2.3 Degeneracy
1.2.4 Beyond Simple Graphs
1.2.5 Extending Degeneracy
1.2.6 Degeneracy and Clustering
1.2.7 Explored Data
1.2.8 Dissertation Organization
i theoretical models
2 community evaluation
2.1 Introduction
2.2 Related Work
2.2.1 Graph Theoretic Metrics
2.2.2 Citation Graphs
2.3 Theory on Degeneracy
2.3.1 Preliminaries
2.3.2 Cores for bipartite graphs
2.3.3 Fractional k-cores for edge-weighted graphs
2.4 D-cores
2.4.1 Degeneracy of digraphs
2.4.2 D-core matrix
2.4.3 An Example
2.4.4 Digraph Degeneracy Frontiers
2.4.5 Digraph Collaboration indices
2.4.6 Set frontiers and indices
2.4.7 Core Decomposition Forests
2.5 S-cores and Reciprocity
2.5.1 Introduction
2.5.2 S-cores
2.5.3 Reciprocity in Signed Graphs
2.5.4 Conclusion
ii data exploration
3 experiments
3.1 Introduction
3.2 Undirected and Weighted
3.2.1 Dataset Description
3.2.2 k-cores in co-authorship graphs
3.2.3 Fractional cores on the weighted graph
3.2.4 Rank vs size
3.2.5 Hop-1 lists
3.2.6 Community-focused rankings
3.2.7 Core Decomposition Forest on DBLP and ARXIV
3.3 Real World Application
3.3.1 System Architecture
3.3.2 Demonstration
3.3.3 Application Scenarios
3.4 D-cores
3.4.1 Directed Graph Degeneracy for Scale-Free Graphs
3.4.2 Data sets description
3.4.3 Algorithms complexity
3.4.4 Experimental methodology
3.4.5 Experimental results on Wikipedia
3.4.6 Experimental results on DBLP
3.4.7 Experimetnal results on ARXIV
3.4.8 Conclusions
3.5 S-cores
3.5.1 Datasets Description and Methodology
3.5.2 WikiSigned methodology
3.5.3 Experimental Evaluation
3.5.4 Slashdot and Epinion graphs
3.5.5 Wikipedia topics
3.5.6 Local vs. Global reciprocity
3.5.7 General Trends of Graph Reciprocity
3.5.8 Author Frontiers
3.5.9 S-core reciprocity vs clustering structure
3.6 Conclusions
iii degeneracy and clustering
4 scaling graph clustering with the k- core expansion sequence
4.1 Introduction
4.2 Related work
4.3 The Proposed Method
4.3.1 The Framework
4.3.2 Selection procedure
4.3.3 Expected time
4.3.4 Quality of the CoreCluster framework
4.4 Experimental Evaluation
4.4.1 Spectral algorithm
4.4.2 Datasets description
4.4.3 Time performance
4.4.4 Clustering quality evaluation
4.4.5 Degeneracy features vs. running time
4.4.6 Conclusion
5 epilogue
5.1 Conclusions
5.2 Future Directions on Graph Mining and
Degeneracy
bibliography