Description
Summary
We found a GPU performance regression when upgrading the official RAPIDS base image from 25.02-cuda12.8 to 26.02-cuda13 on the to_cugraph() / from_cudf_edgelist() path used by our GFQL benchmarks.
The important update is that this now reproduces with a pure cuDF/cuGraph script and no PyGraphistry imports.
Where we first saw it naturally
In the PyGraphistry GFQL gplus GPU benchmark, 25.02-cuda12.8 -> 26.02-cuda13 regressed on the warm path:
- pipeline_total: 1.7019s -> 2.1780s (+27.97%)
- pagerank stage: 0.3240s -> 0.5108s (+57.65%)
Direct to_cugraph()+pagerank timing on the same workload was worse:
- total: 0.8171s -> 1.3695s (+67.60%)
- build: 0.7630s -> 1.3215s (+73.20%)
- pagerank: 0.0537s -> 0.0487s (-9.31%)
That pointed at the graph build / renumber step, not the PageRank kernel.
Pure RAPIDS reproducer
Artifact:
plans/gfql-rapids-gpu-regression-optimization/repro/pure_rapids_string_build_repro.py
Primary repro shape:
- graph: synthetic_string_gplus_shape
- edges: 10,000,000
- unique vertices: 107,614
- repeated low-cardinality string/object IDs
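To make the repro shape concrete, here is a minimal CPU sketch of an edgelist with the same statistics (10M edges over ~107k reused string IDs). The `user_{i}` naming and uniform distribution are assumptions for illustration; the actual `synthetic_string_gplus_shape` generator in the repro script may differ.

```python
import numpy as np
import pandas as pd

def make_string_gplus_shape(n_edges, n_vertices=107_614, seed=0):
    """Repeated low-cardinality string/object IDs: many edges drawn
    from a small vertex domain, so each ID string recurs many times."""
    rng = np.random.default_rng(seed)
    ids = np.array([f"user_{i}" for i in range(n_vertices)], dtype=object)
    return pd.DataFrame({
        "src": ids[rng.integers(0, n_vertices, n_edges)],
        "dst": ids[rng.integers(0, n_vertices, n_edges)],
    })

# Small edge count here for illustration; the repro uses 10,000,000.
edges = make_string_gplus_shape(100_000)
```

The same frame, built as a cuDF DataFrame, is what gets fed to `from_cudf_edgelist()` in the repro script.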
Commands:
docker run --gpus all --rm -v "$WORKTREE:/workspace" -w /workspace \
nvcr.io/nvidia/rapidsai/base:25.02-cuda12.8-py3.12 \
python -W ignore plans/gfql-rapids-gpu-regression-optimization/repro/pure_rapids_string_build_repro.py \
--graph synthetic_string_gplus_shape --synthetic-edges 10000000 --runs 3 --warmup 1
docker run --gpus all --rm -v "$WORKTREE:/workspace" -w /workspace \
nvcr.io/nvidia/rapidsai/base:26.02-cuda13-py3.13 \
python -W ignore plans/gfql-rapids-gpu-regression-optimization/repro/pure_rapids_string_build_repro.py \
--graph synthetic_string_gplus_shape --synthetic-edges 10000000 --runs 3 --warmup 1

Results:
25.02-cuda12.8:
- build: 0.1861s
- pagerank: 0.0073s
- total: 0.1936s

26.02-cuda13:
- build: 0.3148s
- pagerank: 0.0034s
- total: 0.3188s

Delta:
- build: +69.16%
- total: +64.67%
- the PageRank kernel itself is faster on 26.02
Controls
Sparse integer control, synthetic_offset, 10,000,000 edges:
- 25.02-cuda12.8: 0.2248s
- 26.02-cuda13: 0.2187s
- delta: -2.71%
Sparse string control with high-cardinality domain, synthetic_string_offset, 10,000,000 edges:
- 25.02-cuda12.8: 0.6649s
- 26.02-cuda13: 0.8054s
- delta: +21.13%
So:
- this is not an integer sparse-ID problem
- string/object IDs are implicated
- repeated low-cardinality string/object IDs are the strongest repro shape
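The cardinality distinction the controls isolate can be illustrated numerically: for the same edge count, the gplus-style shape reuses each ID many times, while the offset-style shape draws from a domain much larger than the edge count, so IDs are rarely repeated. The domain sizes below are assumptions chosen to mirror the two shapes; the actual `synthetic_offset` generators may use different parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_edges = 1_000_000

# Low-cardinality gplus-style shape: ~107k vertices reused across all edges.
low = rng.integers(0, 107_614, n_edges)
# High-cardinality offset-style shape: IDs drawn from a sparse 10x-larger domain.
high = rng.integers(0, 10 * n_edges, n_edges)

# Average number of edges per distinct ID (reuse factor).
reuse_low = n_edges / np.unique(low).size    # roughly 9x reuse
reuse_high = n_edges / np.unique(high).size  # close to 1, little reuse
```

Only the low-reuse string shape shows the full regression, which is consistent with the cost sitting in per-distinct-string work during build/renumber rather than in per-edge work.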
Why filing here first
This now looks upstream-facing, but the original natural failure was in our benchmark suite and our wrapper is where the investigation started. Filing here first gives us a stable local record and a place to decide whether to escalate directly to RAPIDS/cuGraph.
Requested outcome
- Confirm whether we want to escalate this pure-RAPIDS repro upstream to RAPIDS/cuGraph.
- If yes, use this script plus the natural GFQL benchmark evidence as the filing package.
- If no, document what local mitigation we do or do not want in PyGraphistry.
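If we decide against escalating and want a local mitigation, one candidate is pre-factorizing string IDs to integer codes over a shared vertex dictionary before handing the edgelist to cuGraph, since the integer control showed no regression. A minimal CPU sketch with pandas (the helper name `factorize_edges` is hypothetical; cuDF's `factorize`/`concat` mirror the pandas API, but this has not been validated against the 26.02 image):

```python
import pandas as pd

def factorize_edges(df, src="src", dst="dst"):
    """Map string src/dst IDs to consistent int32 codes.
    Factorizing src and dst together ensures one shared vertex dictionary."""
    codes, uniques = pd.factorize(
        pd.concat([df[src], df[dst]], ignore_index=True)
    )
    n = len(df)
    int_edges = pd.DataFrame({
        src: codes[:n].astype("int32"),
        dst: codes[n:].astype("int32"),
    })
    return int_edges, uniques  # uniques maps code -> original string ID

edges = pd.DataFrame({
    "src": ["u1", "u2", "u1"],
    "dst": ["u2", "u3", "u3"],
})
int_edges, vocab = factorize_edges(edges)
```

The `vocab` index allows mapping algorithm results (e.g. PageRank vertex IDs) back to the original string IDs after the cuGraph call.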