
Investigate RAPIDS 26.02 string-ID graph-build regression with pure cuDF/cuGraph repro #977

@lmeyerov

Description


Summary

We found a GPU performance regression when upgrading the official RAPIDS base image from 25.02-cuda12.8 to 26.02-cuda13 on the to_cugraph() / from_cudf_edgelist() path used by our GFQL benchmarks.

The important update is that this now reproduces with a pure cuDF/cuGraph script and no PyGraphistry imports.

Where we first saw it naturally

In the PyGraphistry GFQL gplus GPU benchmark, 25.02-cuda12.8 -> 26.02-cuda13 regressed on the warm path:

  • pipeline_total: 1.7019s -> 2.1780s (+27.97%)
  • pagerank stage: 0.3240s -> 0.5108s (+57.65%)

Direct to_cugraph()+pagerank timing on the same workload regressed even more sharply:

  • total: 0.8171s -> 1.3695s (+67.60%)
  • build: 0.7630s -> 1.3215s (+73.20%)
  • pagerank: 0.0537s -> 0.0487s (-9.31%)

That pointed at graph build / renumbering rather than the PageRank kernel itself.
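The stage split above comes from wall-clock timing around each phase. A minimal sketch of that harness, where build_graph and run_pagerank are hypothetical stand-ins for the real to_cugraph() and pagerank calls (on GPU the clock would only be read after a device sync):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    out = fn(*args, **kwargs)
    return out, time.perf_counter() - t0

# Hypothetical stand-ins for from_cudf_edgelist() and pagerank().
def build_graph(edges):
    return {"graph": edges}

def run_pagerank(g):
    return {"scores": len(g["graph"])}

edges = list(range(1000))
g, t_build = timed(build_graph, edges)
_, t_pr = timed(run_pagerank, g)
total = t_build + t_pr
print(f"build={t_build:.4f}s pagerank={t_pr:.4f}s total={total:.4f}s")
```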

Pure RAPIDS reproducer

Artifact:

  • plans/gfql-rapids-gpu-regression-optimization/repro/pure_rapids_string_build_repro.py

Primary repro shape:

  • synthetic_string_gplus_shape
  • 10,000,000 edges
  • 107,614 unique vertices
  • repeated low-cardinality string/object IDs
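A CPU-side sketch of that ID shape, scaled down for illustration (the names and generator here are assumptions; the actual generator lives in the repro script above). The point is a small pool of string vertex labels reused across a much larger edge list:

```python
import numpy as np

rng = np.random.default_rng(42)

n_edges = 100_000     # scaled down from 10,000,000 edges
n_vertices = 1_076    # scaled down from 107,614 unique vertices

# Low-cardinality string/object IDs: each vertex label recurs
# thousands of times across the edge list, mimicking the gplus shape.
labels = np.array([f"user_{i:06d}" for i in range(n_vertices)], dtype=object)
src = labels[rng.integers(0, n_vertices, n_edges)]
dst = labels[rng.integers(0, n_vertices, n_edges)]

print(n_edges, len(np.unique(np.concatenate([src, dst]))))
```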

Commands:

docker run --gpus all --rm -v "$WORKTREE:/workspace" -w /workspace \
  nvcr.io/nvidia/rapidsai/base:25.02-cuda12.8-py3.12 \
  python -W ignore plans/gfql-rapids-gpu-regression-optimization/repro/pure_rapids_string_build_repro.py \
    --graph synthetic_string_gplus_shape --synthetic-edges 10000000 --runs 3 --warmup 1

docker run --gpus all --rm -v "$WORKTREE:/workspace" -w /workspace \
  nvcr.io/nvidia/rapidsai/base:26.02-cuda13-py3.13 \
  python -W ignore plans/gfql-rapids-gpu-regression-optimization/repro/pure_rapids_string_build_repro.py \
    --graph synthetic_string_gplus_shape --synthetic-edges 10000000 --runs 3 --warmup 1

Results:

  • 25.02-cuda12.8
    • build: 0.1861s
    • pagerank: 0.0073s
    • total: 0.1936s
  • 26.02-cuda13
    • build: 0.3148s
    • pagerank: 0.0034s
    • total: 0.3188s

Delta:

  • build: +69.16%
  • total: +64.67%
  • kernel is faster on 26.02
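The deltas above are plain relative change over the measured stage times:

```python
old_build, new_build = 0.1861, 0.3148
old_total, new_total = 0.1936, 0.3188

def pct_delta(old, new):
    """Relative change in percent, positive meaning slower."""
    return (new - old) / old * 100.0

print(f"build: {pct_delta(old_build, new_build):+.2f}%")  # +69.16%
print(f"total: {pct_delta(old_total, new_total):+.2f}%")  # +64.67%
```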

Controls

Sparse integer control, synthetic_offset, 10,000,000 edges:

  • 25.02-cuda12.8: 0.2248s
  • 26.02-cuda13: 0.2187s
  • delta: -2.71%

Sparse string control with high-cardinality domain, synthetic_string_offset, 10,000,000 edges:

  • 25.02-cuda12.8: 0.6649s
  • 26.02-cuda13: 0.8054s
  • delta: +21.13%

So:

  • this is not an integer sparse-ID problem
  • string/object IDs are implicated
  • repeated low-cardinality string/object IDs are the strongest repro shape

Why filing here first

This now looks upstream-facing, but the original natural failure was in our benchmark suite and our wrapper is where the investigation started. Filing here first gives us a stable local record and a place to decide whether to escalate directly to RAPIDS/cuGraph.

Requested outcome

  1. Confirm whether we want to escalate this pure-RAPIDS repro upstream to RAPIDS/cuGraph.
  2. If yes, use this script plus the natural GFQL benchmark evidence as the filing package.
  3. If no, document what local mitigation we do or do not want in PyGraphistry.
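One candidate local mitigation, sketched here on pandas as a CPU analog (the real path would use the cudf equivalents), is to factorize string IDs into a shared integer code space before graph build, so renumbering only ever sees integers. Whether this actually sidesteps the 26.02 slowdown is untested:

```python
import pandas as pd

edges = pd.DataFrame({
    "src": ["alice", "bob", "alice", "carol"],
    "dst": ["bob", "carol", "carol", "alice"],
})

# Build one vocabulary over both endpoint columns so src and dst
# codes refer to the same vertex space.
vocab = pd.unique(pd.concat([edges["src"], edges["dst"]], ignore_index=True))
code = {v: i for i, v in enumerate(vocab)}

int_edges = pd.DataFrame({
    "src": edges["src"].map(code).astype("int64"),
    "dst": edges["dst"].map(code).astype("int64"),
})
print(int_edges)
```

The integer edgelist can then be fed to the graph build, with vocab retained to map results back to the original string IDs.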
