Hey everyone! I recently passed the NVIDIA Data Science Professional Certification, and I'm thrilled to share some insights to help you on your journey. This post is part of a series where I'll break down key concepts and tools covered in the certification, focusing on how to leverage GPU acceleration for blazingly fast machine learning. I have included all the Colab notebooks I used so that you can grasp the concepts quickly by running them instantly on Google Colab. Let's get started.

## NetworkX

NetworkX is a powerhouse for graph analytics in Python, beloved for its ease of use and vast community. However, as graphs grow, its pure-Python nature can lead to performance bottlenecks. What if you could keep the familiar NetworkX API but get a massive speedup for larger datasets? Enter nx-cugraph, a RAPIDS backend that lets NetworkX leverage the power of NVIDIA GPUs.

This post dives into how nx-cugraph can significantly accelerate your NetworkX workflows, demonstrated with common graph algorithms like Betweenness Centrality and PageRank.

Click, copy, and run the notebook: Link to the Colab Notebook

## What You Will Learn

- Why NetworkX, despite its popularity, can be slow for large graphs.
- How NetworkX 3.0+ allows for dispatching algorithms to accelerated backends.
- What nx-cugraph is and how it brings GPU acceleration to NetworkX.
- How to set up your environment to use nx-cugraph.
- Practical examples of speedups for the Betweenness Centrality and PageRank algorithms on both small and large datasets.
- The minimal code changes required to get these performance benefits.
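Before running any GPU-backed cells, it's worth sanity-checking that your runtime actually has an NVIDIA GPU (in Colab, select a GPU runtime first). A rough heuristic, not an official check, is to look for the `nvidia-smi` tool on the PATH:

```python
import shutil

# Heuristic check: nx-cugraph needs an NVIDIA GPU and driver.
# The presence of the nvidia-smi tool on PATH is a reasonable proxy.
gpu_available = shutil.which("nvidia-smi") is not None

if gpu_available:
    print("NVIDIA driver found - the cugraph backend should be usable.")
else:
    print("No nvidia-smi on PATH - calls will fall back to CPU NetworkX.")
```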
## The NetworkX Challenge: Performance at Scale

NetworkX is incredibly popular, with millions of downloads. Its user-friendly API, extensive documentation, and easy installation make it a go-to for graph analysis. However, this ease comes with a trade-off: its pure-Python implementation can struggle with the performance demands of larger, real-world graph datasets.

## Accelerated NetworkX to the Rescue!

NetworkX 3.0 introduced a game-changing feature: the ability to dispatch algorithm calls to alternative, more performant backend implementations. This means you don't have to abandon your existing NetworkX code to tap into serious performance gains, like those offered by GPUs. The nx-cugraph library, part of the NVIDIA RAPIDS ecosystem, is one such backend. It allows NetworkX to offload computations to NVIDIA GPUs, dramatically speeding up graph algorithms.

## Configuring NetworkX to Use cuGraph by Default

A neat feature of nx-cugraph (version 24.10+) is the NX_CUGRAPH_AUTOCONFIG environment variable. Setting it to True before importing NetworkX tells NetworkX to use the "cugraph" backend by default.

```python
%env NX_CUGRAPH_AUTOCONFIG=True

import networkx as nx
print(f"using networkx version {nx.__version__}")

# This notebook uses a caching feature that might produce warnings for some users.
# The notebook uses recommended APIs, so we can safely ignore this specific warning.
nx.config.warnings_to_ignore.add("cache")
```
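Note that `%env` is an IPython magic, so it only works inside a notebook. In a plain Python script, the same configuration can be done with `os.environ`, as long as it happens before `networkx` is imported:

```python
import os

# Must be set before `import networkx` for the autoconfig to take effect.
os.environ["NX_CUGRAPH_AUTOCONFIG"] = "True"

# import networkx as nx  # imported only after the variable is set
```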
With this setup, most of your existing NetworkX algorithm calls will automatically be GPU-accelerated without any further code changes!

## Seeing is Believing: Algorithm Acceleration

Let's look at how nx-cugraph speeds up a couple of popular algorithms.

### A Simple Start: Zachary's Karate Club

We'll begin with the classic Zachary's Karate Club graph (34 nodes, 78 edges).

```python
G = nx.karate_club_graph()
G.number_of_nodes(), G.number_of_edges()
# Output: (34, 78)
```

### Betweenness Centrality

This algorithm measures a node's importance based on how many shortest paths pass through it.

With nx-cugraph (GPU-accelerated, the default due to NX_CUGRAPH_AUTOCONFIG):

```python
%%time
nxcg_bc_results = nx.betweenness_centrality(G)
# CPU times: user 177 ms, sys: 70.1 ms, total: 247 ms
# Wall time: 762 ms
```

With default NetworkX (CPU), using the backend="networkx" argument to explicitly select the original implementation:

```python
%%time
nx_bc_results = nx.betweenness_centrality(G, backend="networkx")
# CPU times: user 191 ms, sys: 13.6 ms, total: 205 ms
# Wall time: 204 ms
```

For such a small graph, the overhead of GPU kernel launches can make the nx-cugraph version slightly slower. The real power shines with larger datasets. The notebook visualizes these results, showing that both backends produce the same centrality rankings.
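To see what the algorithm is actually computing, here is a minimal brute-force sketch in plain Python: for every node pair, enumerate all shortest paths and credit each intermediate node with the fraction of those paths passing through it. This is for intuition only — NetworkX implements the far faster Brandes algorithm — and the 4-node path graph is a made-up toy example.

```python
from collections import deque
from itertools import combinations

def shortest_paths(adj, s, t):
    """Return every shortest path from s to t via BFS + predecessor backtracking."""
    dist, preds = {s: 0}, {s: []}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], preds[v] = dist[u] + 1, [u]
                q.append(v)
            elif dist[v] == dist[u] + 1:  # another shortest path reaches v
                preds[v].append(u)
    def build(v):
        if v == s:
            return [[s]]
        return [p + [v] for u in preds[v] for p in build(u)]
    return build(t) if t in dist else []

def betweenness(adj):
    """Brute-force normalized betweenness centrality for an undirected graph."""
    n = len(adj)
    bc = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        paths = shortest_paths(adj, s, t)
        for v in adj:
            if v in (s, t) or not paths:
                continue
            bc[v] += sum(v in p for p in paths) / len(paths)
    scale = 2.0 / ((n - 1) * (n - 2))  # undirected normalization
    return {v: c * scale for v, c in bc.items()}

path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path graph 0-1-2-3
print(betweenness(path4))  # nodes 1 and 2 each lie on 2 of 3 pair paths -> 2/3
```

On the karate club graph, this brute force agrees with `nx.betweenness_centrality`; it just scales hopelessly, which is exactly why the optimized backends matter.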
### PageRank

PageRank scores nodes based on their relative "importance" by analyzing links.

With nx-cugraph (GPU-accelerated):

```python
%%time
nxcg_pr_results = nx.pagerank(G)
# CPU times: user 11.4 ms, sys: 10.8 ms, total: 22.2 ms
# Wall time: 68.2 ms
```

With default NetworkX (CPU):

```python
%%time
nx_pr_results = nx.pagerank(G, backend="networkx")
# CPU times: user 3.8 ms, sys: 1.11 ms, total: 4.9 ms
# Wall time: 19.8 ms
```

Again, for tiny graphs, the CPU can be faster. However, the results are numerically very close, as shown by comparing them in a DataFrame:

```python
%load_ext cudf.pandas
import pandas as pd
import pytest
from IPython.display import display, HTML

print("Do both results have the same values (within tolerance)? "
      f"{nxcg_pr_results == pytest.approx(nx_pr_results, rel=1e-6, abs=1e-11)}")
# Output: Do both results have the same values (within tolerance)? True

df = pd.DataFrame(
    columns=["nx node", "nxcg node", "nx PR", "nxcg PR"],
    data=[(a, c, b, d)
          for (a, b), (c, d) in zip(nx_pr_results.items(), nxcg_pr_results.items())])
df.sort_values(by="nx PR", ascending=False, inplace=True)
print("\nTop 5 nodes based on PageRank")
display(HTML(df.head(5).to_html(float_format=lambda f: f"{f:.7g}")))
```
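As an aside, the quantity both backends are computing can be illustrated with a short power-iteration sketch in plain Python. The damping factor of 0.85 matches NetworkX's default alpha; the 3-node cycle is a toy example.

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=100):
    """Minimal power-iteration PageRank for an unweighted directed graph.

    adj maps each node to the list of nodes it links to.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            out = adj[u]
            if out:  # distribute u's rank evenly over its outgoing links
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:  # dangling node: spread its rank over all nodes
                for v in nodes:
                    new[v] += damping * rank[u] / n
        done = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if done:
            break
    return rank

# Toy example: a 3-node cycle 0 -> 1 -> 2 -> 0; by symmetry all ranks are 1/3.
pr = pagerank({0: [1], 1: [2], 2: [0]})
print(pr)
```

This is only for intuition; the real implementations handle weights, personalization, and convergence far more carefully.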
The DataFrame output confirms the PageRank scores are essentially identical.

### Betweenness Centrality on a Large Graph

For large graphs, calculating all-pairs shortest paths for Betweenness Centrality is often infeasible. We use the k parameter to approximate the score by sampling k source nodes. (Here, G_large is the large graph loaded earlier in the notebook.)

With default NetworkX (CPU), k=1 (larger k values are impractical):

```python
%%time
bc_results_large_nx = nx.betweenness_centrality(G_large, k=1, backend="networkx")
# CPU times: user 2min 1s, sys: 4.02 s, total: 2min 5s
# Wall time: 2min 5s
```

With nx-cugraph (GPU), k=1:

```python
%%time
bc_results_large_nxcg_k1 = nx.betweenness_centrality(G_large, k=1)
# CPU times: user 935 ms, sys: 200 ms, total: 1.14 s
# Wall time: 1.17 s
```

Over 100x speedup! (2min 5s vs 1.17s.) With nx-cugraph, we can afford a much larger (and more accurate) k.

With nx-cugraph (GPU), k=100:

```python
%%time
bc_results_large_nxcg_k100 = nx.betweenness_centrality(G_large, k=100)
# CPU times: user 26.7 s, sys: 658 ms, total: 27.3 s
# Wall time: 27.3 s
```

Running with k=100 on the GPU is still significantly faster (27.3s) than k=1 on the CPU (2min 5s).
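The k trick works because betweenness is a sum of per-source contributions: sampling k of the n sources and scaling by n/k gives an unbiased estimate of the full sum. The same idea in miniature, on a plain list of made-up per-source scores (illustrative numbers, not from the notebook):

```python
import random

def estimate_total(contributions, k, seed=0):
    """Estimate sum(contributions) from a k-item random sample, scaled by n/k."""
    rng = random.Random(seed)
    n = len(contributions)
    sample = rng.sample(contributions, k)
    return sum(sample) * n / k

contribs = [i % 7 for i in range(10_000)]  # stand-in per-source scores
exact = sum(contribs)
print(f"exact={exact}, "
      f"k=100 -> {estimate_total(contribs, k=100):.0f}, "
      f"k=1000 -> {estimate_total(contribs, k=1000):.0f}")
# Larger k tightens the estimate, at proportionally higher cost.
```

That cost/accuracy dial is exactly what the GPU buys you room on: k=100 on the GPU is both faster and more accurate than k=1 on the CPU.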
A note on comparing betweenness_centrality with k: since it's an approximation based on random samples, results might differ slightly between NetworkX and nx-cugraph unless a common seed and sampling strategy are used, which is an area for future updates.

### PageRank on a Large Graph

With default NetworkX (CPU):

```python
%%time
nx_pr_results_large = nx.pagerank(G_large, backend="networkx")
# CPU times: user 1min 39s, sys: 5.02 s, total: 1min 44s
# Wall time: 1min 44s
```

With nx-cugraph (GPU):

```python
%%time
nxcg_pr_results_large = nx.pagerank(G_large)
# CPU times: user 540 ms, sys: 293 ms, total: 834 ms
# Wall time: 877 ms
```

Another massive speedup: over 100x! (1min 44s vs 877ms.) The results remain consistent within tolerance.

## Key Takeaways for the Certification ✨

Migrating your NetworkX workflows to GPU acceleration with nx-cugraph offers substantial benefits, especially as your data grows:

- 🚀 Blazing Speed: Experience dramatic performance improvements (often >100x) for graph algorithms on large datasets by leveraging GPU power.
- 💻 Minimal Code Changes: Thanks to the backend system and NX_CUGRAPH_AUTOCONFIG, you can accelerate existing NetworkX code with little to no modification.
- 📊 Enhanced Scalability: Tackle much larger, real-world graph problems that were previously impractical with CPU-only NetworkX.
- 🛠️ Simple Setup: Easy installation via pip and straightforward configuration to enable the cugraph backend.
- 🤝 Familiar NetworkX API: Continue working with the well-known and loved NetworkX interface, minimizing the learning curve.

If you're working with graphs that are pushing the limits of traditional NetworkX, nx-cugraph is a fantastic way to boost your productivity and unlock new possibilities in graph analytics.