Authors:

(1) Roya Aliakbarisani, Universitat de Barcelona & UBICS (roya_aliakbarisani@ub.edu); this author contributed equally;

(2) Robert Jankowski, Universitat de Barcelona & UBICS (robert.jankowski@ub.edu); this author contributed equally;

(3) M. Ángeles Serrano, Universitat de Barcelona, UBICS & ICREA (marian.serrano@ub.edu);

(4) Marián Boguñá, Universitat de Barcelona & UBICS (marian.boguna@ub.edu).

Table of Links

Abstract and 1. Introduction
2. Related work
3. HypNF Model
3.2 The S1/H2 model
3.3 Assigning labels to nodes
4. HypNF benchmarking framework
5. Experiments
5.1 Parameter Space
5.2 Machine learning models
6. Results
7. Conclusion, Acknowledgments and Disclosure of Funding, and References
A. Empirical validation of HypNF
B. Degree distribution and clustering control in HypNF
C. Hyperparameters of the machine learning models
D. Fluctuations in the performance of machine learning models
E. Homophily in the synthetic networks
F. Exploring the parameters' space

Abstract

Graph Neural Networks (GNNs) have excelled at predicting graph properties in applications ranging from identifying trends in social networks to drug discovery and malware detection. With the abundance of new architectures and increased complexity, GNNs are becoming highly specialized when tested on a few well-known datasets. However, how the performance of GNNs depends on the topological and feature properties of graphs is still an open question. In this work, we introduce a comprehensive benchmarking framework for graph machine learning, focusing on the performance of GNNs across varied network structures. Utilizing the geometric soft configuration model in hyperbolic space, we generate synthetic networks with realistic topological properties and node feature vectors. This approach enables us to assess the impact of network properties, such as topology-feature correlation, degree distributions, local density of triangles (or clustering), and homophily, on the effectiveness of different GNN architectures. Our results highlight the dependency of model performance on the interplay between network structure and node features, providing insights for model selection in various scenarios.
This study contributes to the field by offering a versatile tool for evaluating GNNs, thereby assisting in developing and selecting suitable models based on specific data characteristics.

1 Introduction

Graph Neural Networks (GNNs) [29, 36, 37, 39], derived from Convolutional Neural Networks for graph-structured data, use recursive message passing between nodes and their neighbors. These models leverage graph topology and node-specific features to map nodes into a learnable embedding space. GNNs have evolved to encompass a wide variety of architectures and tasks, ranging from node and graph classification to link prediction. Despite this growing interest in the development of GNNs, the fundamental issue of homogeneity in benchmarking datasets persists in GNN research, making it challenging to determine the most suitable GNN model for unseen datasets. In addition, since GNNs are data-driven models tailored to specific tasks, there is a potential concern of overfitting new architectures to given datasets, especially when the data have similar structural properties [27]. Thus, a fair comparison between different models in reproducible settings is required.

In this work, we propose a comprehensive benchmarking scheme for graph neural networks. Utilizing the Hyperbolic Soft Configuration Network Model with Features (HypNF) [3], we generate synthetic networks with realistic topological properties in which node features can be correlated with the network topology. This highly flexible model allows us to evaluate GNNs in depth across various scenarios. Moreover, we suggest using the benchmark as a tool for optimal model selection by analyzing the inherent properties of a real dataset.
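To make the generative idea concrete, the geometric soft configuration model at the core of this family of benchmarks can be sketched as follows. This is a minimal illustration of the standard S1 model (nodes on a circle with power-law hidden degrees, connection probability decaying with rescaled distance), not the paper's exact implementation; the function name and parameter names (`N`, `avg_k`, `gamma`, `beta`) are our own choices for the sketch.

```python
import numpy as np

def sample_s1_network(N=500, avg_k=10.0, gamma=2.7, beta=1.5, seed=42):
    """Sketch of the S^1 geometric soft configuration model.

    Each node gets an angular coordinate on a circle of radius R = N / (2*pi)
    and a hidden degree kappa drawn from a power law with exponent gamma.
    A pair (i, j) connects with probability
        p_ij = 1 / (1 + (d_ij / (mu * kappa_i * kappa_j))**beta),
    where d_ij is the arc distance. beta controls clustering and gamma
    controls the heterogeneity of the degree distribution.
    """
    rng = np.random.default_rng(seed)
    R = N / (2.0 * np.pi)                       # circle radius at unit density
    theta = rng.uniform(0.0, 2.0 * np.pi, N)    # angular positions

    # Hidden degrees from a classical Pareto law with mean ~ avg_k:
    # numpy's pareto() is Lomax, so shift and scale by the minimum kappa_0.
    kappa_0 = avg_k * (gamma - 2.0) / (gamma - 1.0)
    kappa = kappa_0 * (rng.pareto(gamma - 1.0, N) + 1.0)

    # mu fixes the expected average degree to avg_k (valid for beta > 1).
    mu = beta * np.sin(np.pi / beta) / (2.0 * np.pi * avg_k)

    # Pairwise arc distances on the circle.
    dtheta = np.pi - np.abs(np.pi - np.abs(theta[:, None] - theta[None, :]))
    chi = (R * dtheta) / (mu * np.outer(kappa, kappa))

    # Connection probabilities; sample the upper triangle and symmetrize.
    p = 1.0 / (1.0 + chi ** beta)
    upper = np.triu(rng.random((N, N)) < p, k=1)
    A = (upper | upper.T).astype(int)
    return A, theta, kappa
```

The resulting adjacency matrix has a heavy-tailed degree distribution with exponent roughly `gamma`, an average degree close to `avg_k`, and a clustering level that grows with `beta`, which is what makes this model family a convenient knob-driven generator for benchmarking.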
Although the use of hyperbolic geometry might appear superfluous, it has been demonstrated to be the simplest method for generating geometric random graphs that uniquely combine several key characteristics: they have power-law degree distributions, exhibit small-world properties, and are highly clustered, meaning they have a high density of triangles [6]. These traits closely mirror those observed in real complex networks.

We aim to identify the crucial factors that influence the performance of graph machine learning models, including the network's structural properties and the degree of correlation between nodes and features, both controlled by the framework parameters. Employing the proposed benchmark, we systematically compare the performance of well-known GNNs and evaluate models that solely utilize node features. Our study evaluates machine learning models in two fundamental graph-based tasks: node classification (NC) and link prediction (LP). Here, we highlight the main contributions of our empirical study, which provides insights into the suitability of the various models under different network conditions and will thus benefit applications and the community focused on developing new GNN algorithms.

• Our framework generates benchmark networks with tunable levels of topology-feature correlation, homophily, clustering, degree distributions, and average degrees. This approach covers the most important properties of a wide range of real datasets, providing a comprehensive tool for their analysis. The code and the datasets will be publicly available at https://github.com/networkgeometry/hyperbolic-benchmark-gnn under the MIT License.

• GNNs exhibit varying levels of performance fluctuation under a fixed set of parameters. Notably, HGCN [9] shows less robustness compared to GCN [18] when the network's average degree is low. However, this trend reverses in networks with homogeneous degree distributions.
• The stronger the correlation between the network topology and node features, the better GNNs and feature-based models perform in NC and LP.

• The hyperbolic-based models, specifically HGCN and HNN [12], achieve the highest AUC scores [25] in the LP task. Remarkably, HNN, despite being a purely feature-based method, outperforms traditional graph-based models across various parameters.

• In the NC task, where no prior information about the graph data is available, emphasis should be placed on model interpretability and time complexity. This is crucial since the accuracy of graph-based models tends to be uniformly high, making these factors significant differentiators.

2 Related work

With the continuous evolution of graph machine learning, there is a growing need to understand and evaluate the performance of GNN architectures. In this respect, benchmarking provides a fair and standardized way to compare different models. The Open Graph Benchmark (OGB) [15] stands as a versatile tool to assess the performance of GNNs. Yet its emphasis on a limited range of real networks means that it does not cover the full spectrum of network characteristics, and it offers little control over parameters. This highlights the need for benchmarking tools based on synthetic data, which allow GNNs to be assessed in a controlled environment and across a more extensive array of network properties [34, 24, 23]. One of them is GraphWorld [27], a synthetic network generator that utilizes the Stochastic Block Model (SBM) to generate graphs with communities. It employs a parametrized community distribution and an edge probability matrix to randomly assign nodes to clusters and establish connections. Node features are also generated using within-cluster multivariate normal distributions.
The fixed edge probability matrix in the SBM prevents GraphWorld from faithfully replicating a predefined degree sequence and from generating graphs with true power-law degree distributions. To overcome this limitation, Ref. [38] integrates GraphWorld with two other generators: Lancichinetti-Fortunato-Radicchi (LFR) [20] and CABAM (Class-Assortative graphs via the Barabási-Albert Model) [32]. This integration broadens the coverage of the graph space, specifically for the NC task. In this paper, we propose an alternative network generator: a framework based on the geometric soft configuration model. This model's underlying geometry straightforwardly couples the network topology with node features and labels, enabling independent control over the clustering coefficient in both the unipartite network of nodes and the bipartite network of nodes and features, irrespective of the degree distributions of nodes and features (see Fig. 6 in Appendix B). Table 1 presents a comparison between HypNF and several state-of-the-art benchmarking frameworks, highlighting the properties each can control.

This paper is available on arxiv under CC BY 4.0 Deed (Attribution 4.0 International) license.