
Investigate RAPIDS 26.02 string-ID graph-build regression with pure cuDF/cuGraph repro #977

@lmeyerov

Description


Summary

We found a GPU performance regression when upgrading the official RAPIDS base image from 25.02-cuda12.8 to 26.02-cuda13 on the to_cugraph() / from_cudf_edgelist() path used by our GFQL benchmarks.

The important update is that this now reproduces with a pure cuDF/cuGraph script and no PyGraphistry imports.

Where we first saw it naturally

In the PyGraphistry GFQL gplus GPU benchmark, 25.02-cuda12.8 -> 26.02-cuda13 regressed on the warm path:

  • pipeline_total: 1.7019s -> 2.1780s (+27.97%)
  • pagerank stage: 0.3240s -> 0.5108s (+57.65%)

Direct to_cugraph()+pagerank timing on the same workload regressed even more sharply:

  • total: 0.8171s -> 1.3695s (+67.60%)
  • build: 0.7630s -> 1.3215s (+73.20%)
  • pagerank: 0.0537s -> 0.0487s (-9.31%)

That pointed at graph build / renumbering rather than the PageRank kernel itself.
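The stage split above comes from wall-clock timing around each phase. A minimal sketch of that harness, where build_graph and run_pagerank are hypothetical stand-ins for the real to_cugraph() and pagerank calls (on GPU the clock would only be read after a device sync):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    out = fn(*args, **kwargs)
    return out, time.perf_counter() - t0

# Hypothetical stand-ins for from_cudf_edgelist() and pagerank().
def build_graph(edges):
    return {"graph": edges}

def run_pagerank(g):
    return {"scores": len(g["graph"])}

edges = list(range(1000))
g, t_build = timed(build_graph, edges)
_, t_pr = timed(run_pagerank, g)
total = t_build + t_pr
print(f"build={t_build:.4f}s pagerank={t_pr:.4f}s total={total:.4f}s")
```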

Pure RAPIDS reproducer

Artifact:

  • plans/gfql-rapids-gpu-regression-optimization/repro/pure_rapids_string_build_repro.py

Primary repro shape:

  • synthetic_string_gplus_shape
  • 10,000,000 edges
  • 107,614 unique vertices
  • repeated low-cardinality string/object IDs
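A CPU-side sketch of that ID shape, scaled down for illustration (the names and generator here are assumptions; the actual generator lives in the repro script above). The point is a small pool of string vertex labels reused across a much larger edge list:

```python
import numpy as np

rng = np.random.default_rng(42)

n_edges = 100_000     # scaled down from 10,000,000 edges
n_vertices = 1_076    # scaled down from 107,614 unique vertices

# Low-cardinality string/object IDs: each vertex label recurs
# thousands of times across the edge list, mimicking the gplus shape.
labels = np.array([f"user_{i:06d}" for i in range(n_vertices)], dtype=object)
src = labels[rng.integers(0, n_vertices, n_edges)]
dst = labels[rng.integers(0, n_vertices, n_edges)]

print(n_edges, len(np.unique(np.concatenate([src, dst]))))
```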

Commands:

docker run --gpus all --rm -v "$WORKTREE:/workspace" -w /workspace \
  nvcr.io/nvidia/rapidsai/base:25.02-cuda12.8-py3.12 \
  python -W ignore plans/gfql-rapids-gpu-regression-optimization/repro/pure_rapids_string_build_repro.py \
    --graph synthetic_string_gplus_shape --synthetic-edges 10000000 --runs 3 --warmup 1

docker run --gpus all --rm -v "$WORKTREE:/workspace" -w /workspace \
  nvcr.io/nvidia/rapidsai/base:26.02-cuda13-py3.13 \
  python -W ignore plans/gfql-rapids-gpu-regression-optimization/repro/pure_rapids_string_build_repro.py \
    --graph synthetic_string_gplus_shape --synthetic-edges 10000000 --runs 3 --warmup 1

Results:

  • 25.02-cuda12.8
    • build: 0.1861s
    • pagerank: 0.0073s
    • total: 0.1936s
  • 26.02-cuda13
    • build: 0.3148s
    • pagerank: 0.0034s
    • total: 0.3188s

Delta:

  • build: +69.16%
  • total: +64.67%
  • kernel is faster on 26.02
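The deltas above are plain relative change over the measured stage times:

```python
old_build, new_build = 0.1861, 0.3148
old_total, new_total = 0.1936, 0.3188

def pct_delta(old, new):
    """Relative change in percent, positive meaning slower."""
    return (new - old) / old * 100.0

print(f"build: {pct_delta(old_build, new_build):+.2f}%")  # +69.16%
print(f"total: {pct_delta(old_total, new_total):+.2f}%")  # +64.67%
```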

Controls

Sparse integer control, synthetic_offset, 10,000,000 edges:

  • 25.02-cuda12.8: 0.2248s
  • 26.02-cuda13: 0.2187s
  • delta: -2.71%

Sparse string control with high-cardinality domain, synthetic_string_offset, 10,000,000 edges:

  • 25.02-cuda12.8: 0.6649s
  • 26.02-cuda13: 0.8054s
  • delta: +21.13%

So:

  • this is not an integer sparse-ID problem
  • string/object IDs are implicated
  • repeated low-cardinality string/object IDs are the strongest repro shape

Why filing here first

This now looks upstream-facing, but the original natural failure was in our benchmark suite and our wrapper is where the investigation started. Filing here first gives us a stable local record and a place to decide whether to escalate directly to RAPIDS/cuGraph.

Requested outcome

  1. Confirm whether we want to escalate this pure-RAPIDS repro upstream to RAPIDS/cuGraph.
  2. If yes, use this script plus the natural GFQL benchmark evidence as the filing package.
  3. If no, document what local mitigation we do or do not want in PyGraphistry.
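One candidate local mitigation, sketched here on pandas as a CPU analog (the real path would use the cudf equivalents), is to factorize string IDs into a shared integer code space before graph build, so renumbering only ever sees integers. Whether this actually sidesteps the 26.02 slowdown is untested:

```python
import pandas as pd

edges = pd.DataFrame({
    "src": ["alice", "bob", "alice", "carol"],
    "dst": ["bob", "carol", "carol", "alice"],
})

# Build one vocabulary over both endpoint columns so src and dst
# codes refer to the same vertex space.
vocab = pd.unique(pd.concat([edges["src"], edges["dst"]], ignore_index=True))
code = {v: i for i, v in enumerate(vocab)}

int_edges = pd.DataFrame({
    "src": edges["src"].map(code).astype("int64"),
    "dst": edges["dst"].map(code).astype("int64"),
})
print(int_edges)
```

The integer edgelist can then be fed to the graph build, with vocab retained to map results back to the original string IDs.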
