Table of Links
A. Empirical validation of HypNF
B. Degree distribution and clustering control in HypNF
C. Hyperparameters of the machine learning models
D. Fluctuations in the performance of machine learning models
E. Homophily in the synthetic networks
F. Exploring the parameters’ space
2 Related work
With the continuous evolution of graph machine learning, there is a growing need to understand and evaluate the performance of GNN architectures. In this respect, benchmarking provides a fair and standardized way to compare different models. The Open Graph Benchmark (OGB) [15] is a versatile tool for assessing the performance of GNNs. Yet, because it relies on a limited set of real networks, it does not cover all network characteristics and offers limited control over network parameters. This highlights the need for benchmarking tools based on synthetic data, which allow GNNs to be assessed in a controlled environment and across a wider range of network properties [34, 24, 23]. One such tool is GraphWorld [27], a synthetic network generator that uses the Stochastic Block Model (SBM) to generate graphs with communities. It employs a parametrized community distribution and an edge probability matrix to randomly assign nodes to clusters and establish connections between them. Node features are generated from within-cluster multivariate normal distributions. However, the fixed edge probability matrix of the SBM prevents GraphWorld from faithfully replicating a predefined degree sequence and from generating graphs with true power-law degree distributions. To overcome this limitation, Ref. [38] integrates GraphWorld with two other generators: Lancichinetti-Fortunato-Radicchi (LFR) [20] and CABAM (Class-Assortative graphs via the Barabási-Albert Model) [32]. This integration broadens the coverage of the graph space, specifically for the NC task.
In this paper, we propose an alternative network generator: a framework based on the geometric soft configuration model. The underlying geometry of this model straightforwardly couples the network topology with node features and labels. This capability enables independent control over the clustering coefficient in both the unipartite network of nodes and the bipartite network of nodes and features, irrespective of the degree distributions of nodes and features (see Fig. 6 in Appendix B). Table 1 presents a comparison between HypNF and several state-of-the-art benchmarking frameworks, highlighting the properties each can control.
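To make the SBM-based procedure attributed to GraphWorld above more concrete, the following is a minimal sketch of that kind of generator, not GraphWorld's actual implementation. The function name `sbm_with_features`, its parameters, and the choice of identity covariance around randomly drawn cluster centers are all assumptions made for illustration.

```python
import numpy as np


def sbm_with_features(n, community_probs, edge_probs, feature_dim,
                      center_scale=2.0, seed=0):
    """Hypothetical SBM-style generator with Gaussian node features."""
    rng = np.random.default_rng(seed)
    k = len(community_probs)

    # Assign each node to a community according to the community distribution.
    labels = rng.choice(k, size=n, p=community_probs)

    # Look up the connection probability of every node pair in the
    # block (edge probability) matrix.
    pair_probs = np.asarray(edge_probs)[labels[:, None], labels[None, :]]

    # Sample each undirected edge once (upper triangle, no self-loops),
    # then mirror it to obtain a symmetric adjacency matrix.
    upper = np.triu(rng.random((n, n)) < pair_probs, k=1)
    adj = (upper | upper.T).astype(int)

    # Draw one center per community, then sample node features from a
    # multivariate normal with identity covariance around that center
    # (an illustrative assumption).
    centers = rng.normal(0.0, center_scale, size=(k, feature_dim))
    features = centers[labels] + rng.normal(size=(n, feature_dim))

    return adj, labels, features


# Example: two equally likely communities, denser inside than across.
adj, labels, features = sbm_with_features(
    n=200,
    community_probs=[0.5, 0.5],
    edge_probs=[[0.10, 0.01], [0.01, 0.10]],
    feature_dim=8,
)
degrees = adj.sum(axis=1)
```

In this sketch, all nodes in the same block share the same connection probabilities and therefore the same expected degree, which illustrates why a fixed block matrix cannot by itself reproduce a prescribed heterogeneous degree sequence.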
Authors:
(1) Roya Aliakbarisani, Universitat de Barcelona & UBICS (equal contribution);
(2) Robert Jankowski, Universitat de Barcelona & UBICS (equal contribution);
(3) M. Ángeles Serrano, Universitat de Barcelona, UBICS & ICREA;
(4) Marián Boguñá, Universitat de Barcelona & UBICS.
This paper is