
Vectorize Adler32 #124409

Open — stephentoub wants to merge 1 commit into dotnet:main from stephentoub:vectorizeadler

Conversation

@stephentoub
Member

No description provided.

Copilot AI review requested due to automatic review settings February 13, 2026 23:30
@github-actions github-actions bot added the needs-area-label label (an area label is needed to ensure this gets routed to the appropriate area owners) Feb 13, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR updates the System.IO.Hashing Adler-32 implementation to use hardware intrinsics for vectorized processing (where available) and extends the test suite to validate correctness across a wider set of input shapes.

Changes:

  • Add SIMD-accelerated Adler-32 update paths for Arm (AdvSimd) and x86 (Vector128 / AVX2 / AVX-512 BW), with scalar fallback.
  • Refactor scalar update to share ModBase / NMax constants and introduce a scalar tail helper for vector paths.
  • Add new test coverage for many input lengths, max-byte data, and incremental append chunking, validated against a reference Adler-32 implementation.
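The tests described above validate against a reference Adler-32 implementation. A minimal sketch of such an oracle (per-byte modular reduction, `MOD_BASE = 65521`) looks like the following — the function name and structure here are illustrative, not the PR's actual test code; Python's stdlib `zlib.adler32` serves as an independent cross-check:

```python
import zlib

MOD_BASE = 65521  # largest prime below 2^16, the Adler-32 modulus

def adler32_reference(data: bytes, seed: int = 1) -> int:
    """Straightforward per-byte Adler-32, reducing after every byte."""
    s1 = seed & 0xFFFF
    s2 = (seed >> 16) & 0xFFFF
    for b in data:
        s1 = (s1 + b) % MOD_BASE
        s2 = (s2 + s1) % MOD_BASE
    return (s2 << 16) | s1

# Sanity check against the stdlib implementation
assert adler32_reference(b"hello world") == zlib.adler32(b"hello world")
```

Because it reduces after every byte, this form can never overflow, which makes it a trustworthy oracle for the overflow-stressing (all-0xFF) test inputs.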

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/libraries/System.IO.Hashing/src/System/IO/Hashing/Adler32.cs Introduces vectorized Adler-32 update implementations (Arm/x86) with feature detection and scalar fallback.
src/libraries/System.IO.Hashing/tests/Adler32Tests.cs Adds reference-based tests targeting multiple length boundaries, overflow-stressing inputs, and incremental append behavior.

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-io-hashing, @bartonjs, @vcsjones
See info in area-owners.md if you want to be subscribed.

@stephentoub
Member Author

🤖 Copilot Code Review — PR #124409

Holistic Assessment

Motivation: The Adler-32 implementation added in #123601 was scalar-only. Adler-32 is used in zlib/deflate, so vectorization is a well-motivated performance improvement for a hot-path algorithm. The PR is justified.

Approach: The implementation follows the well-known approach from zlib/chromium of decomposing the Adler-32 s2 weighted sum into position-weighted sums computed via SIMD multiply-add intrinsics, with a prefix-sum accumulator tracking inter-block s1 contributions. Four vectorized paths are provided (AdvSimd, SSE2/SSSE3, AVX2, AVX-512BW) plus a widening fallback for Vector128 without SSSE3. This matches the pattern used by other vectorized hash implementations in this library (Crc32.Vectorized.cs, Crc64.Vectorized.cs, XxHashShared.cs).
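The block decomposition at the heart of this approach can be checked in a few lines. For one block of n bytes processed without per-byte reduction, the scalar recurrence `s1 += b[i]; s2 += s1` telescopes to `s1' = s1 + sum(b)` and `s2' = s2 + n*s1 + Σ(n-i)*b[i]` — a plain byte sum plus a position-weighted sum, which is exactly what a SIMD multiply-add computes. A sketch (names are illustrative, not the PR's identifiers):

```python
def scalar_block(s1, s2, block):
    # Byte-at-a-time Adler update (modular reduction omitted for clarity)
    for b in block:
        s1 += b
        s2 += s1
    return s1, s2

def decomposed_block(s1, s2, block):
    # Same update expressed as: block-size * s1 + position-weighted sum
    n = len(block)
    s2 += n * s1 + sum((n - i) * b for i, b in enumerate(block))
    s1 += sum(block)
    return s1, s2

block = bytes(range(40, 72))  # one 32-byte block
assert scalar_block(1, 0, block) == decomposed_block(1, 0, block)
```

The weighted sum is the part the intrinsics accelerate; the `n * s1` term is what the prefix-sum accumulator tracks across blocks.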

Summary: ✅ LGTM. After extensive mathematical verification of all four vectorized paths (tracing the s1/s2 accumulation through multi-block examples), the code is correct. The test coverage is thorough with a reference implementation, and the patterns are consistent with the rest of the codebase. One minor suggestion below. No blocking issues found.


Detailed Findings

✅ Vectorized s2 Accumulation — Verified correct across all paths

I traced through the prefix-sum logic (the vps/vs3 accumulators) for each path with a 2-block example:

  • Vector128/ARM (vps pattern): vps starts at s1 * blocks, accumulates old vs1 values, then vps << 5 gives the inter-block contribution. vs2 is seeded with s2 (x86) or added via s2 += (ARM). Both produce s2_new = s2_old + BlockSize*blocks*s1 + Σ(BlockSize * prefix_byte_sums) + Σ(weighted_sums). ✅
  • Vector256/512 (vs3 pattern): vs1 starts at CreateScalar(s1) (not zero), vs3 accumulates old vs1 values including the initial s1. vs3 << 5 (or << 6 for 512) produces the same inter-block contribution. Mathematically equivalent to the vps pattern. ✅
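The vps pattern traced above can be modeled end-to-end in scalar code. This sketch (assumed names, BLOCK = 32, hence the `<< 5`) checks the multi-block identity against the plain byte-at-a-time recurrence:

```python
BLOCK = 32

def scalar(s1, s2, data):
    for b in data:
        s1 += b
        s2 += s1
    return s1, s2

def vps_model(s1, s2, data):
    # Model of the vectorized accumulators (modular reduction omitted)
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    vps = s1 * len(blocks)   # seeded with s1 * block count
    vs1 = 0                  # running byte sum across blocks
    weighted = 0             # per-block position-weighted sums
    for blk in blocks:
        vps += vs1           # accumulate the OLD vs1 before updating it
        vs1 += sum(blk)
        weighted += sum((BLOCK - i) * b for i, b in enumerate(blk))
    s2 += (vps << 5) + weighted   # vps << 5 == BLOCK * vps
    s1 += vs1
    return s1, s2

data = bytes((i * 7 + 3) & 0xFF for i in range(BLOCK * 4))
assert scalar(5, 9, data) == vps_model(5, 9, data)
```

The shift distributes the inter-block `BLOCK * s1` contributions in one multiply, matching the `s2_new` formula stated above.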

✅ Vector512 Weight Correction — Correct

The weights vector repeats 32..1 for both 256-bit halves. sumLo << 5 adds 32 * sum(lower_32_bytes) to compensate: the lower half needs weights 64..33 but gets 32..1, so adding 32 × byte_sum for each lower byte bridges the gap. The upper half already has the correct weights 32..1 for positions 32..63. ✅
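The weight-bridging argument is easy to verify numerically. This hypothetical model applies weights 32..1 to both 32-byte halves and adds `32 * sum(lower half)`, which must equal the true weights 64..1 over the full 64-byte block:

```python
data = bytes((i * 13 + 5) & 0xFF for i in range(64))

# Ground truth: weights 64..1 across all 64 positions
true_weighted = sum((64 - i) * b for i, b in enumerate(data))

lo, hi = data[:32], data[32:]
w32 = lambda blk: sum((32 - i) * b for i, b in enumerate(blk))

# Repeated 32..1 weights on both halves, plus the 32*sum(lo) correction
corrected = w32(lo) + w32(hi) + (sum(lo) << 5)   # sumLo << 5 == 32 * sum(lo)

assert corrected == true_weighted
```

Per lower-half position i, the needed weight 64-i decomposes as (32-i) + 32, which is exactly what the correction supplies.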

✅ Vector128 Non-SSSE3 Fallback — Correct and overflow-safe

The widening fallback sums widened ushort values from four 8-element vectors. Each ushort lane accumulates at most 4 byte values (max 4 × 255 = 1020), well within ushort range. WeightedSumWidening128 produces int16 products (max 255 × 32 = 8160 < 32767) then widens to int32 for safe accumulation. ✅
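The bounds cited above, spelled out as a worst-case model (all-0xFF input, the maximum byte value):

```python
# Four widened byte vectors accumulated lane-wise into ushort lanes
lanes = [0] * 8
for _ in range(4):              # four 8-element vectors per accumulation
    for lane in range(8):
        lanes[lane] += 0xFF     # worst-case byte value
assert max(lanes) == 1020 <= 0xFFFF    # fits a ushort lane with room to spare

# Largest byte * weight product held as int16 before widening to int32
assert 0xFF * 32 == 8160 <= 0x7FFF
```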

✅ Endianness Guard — Appropriate

The BitConverter.IsLittleEndian check before vectorized paths is correct. Vector.LoadUnsafe maps memory bytes to vector lane indices in memory order, but the position-weighted tap vectors assume little-endian lane numbering (byte[0] at lane 0). On big-endian platforms, this mapping may differ, so falling back to scalar is the safe choice. ✅

✅ AVX2 Lane Independence — Verified

Avx2.MultiplyAddAdjacent (VPMADDUBSW) operates on 128-bit lanes independently. The weight vector [32..17 | 16..1] correctly places weights within each lane: lower lane processes bytes 0..15 with weights 32..17, upper lane processes bytes 16..31 with weights 16..1. No cross-lane issue. ✅
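A toy model (assumed semantics) of the per-lane weight placement: each 128-bit lane multiplies its own 16 bytes by its own 16 weights, so the `[32..17 | 16..1]` layout yields the true weights 32..1 across positions 0..31:

```python
def per_lane_weighted(bytes32, weights32):
    # Process each 16-byte lane with its own weights, like VPMADDUBSW
    total = 0
    for lane in range(2):
        b = bytes32[lane * 16:(lane + 1) * 16]
        w = weights32[lane * 16:(lane + 1) * 16]
        total += sum(bi * wi for bi, wi in zip(b, w))
    return total

data = bytes((i * 11 + 1) & 0xFF for i in range(32))
weights = list(range(32, 16, -1)) + list(range(16, 0, -1))  # [32..17 | 16..1]

assert per_lane_weighted(data, weights) == sum((32 - i) * b for i, b in enumerate(data))
```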

✅ Test Coverage — Thorough

Tests cover: (1) various lengths hitting all vector width boundaries and tail transitions, (2) all-0xFF stress testing for accumulator overflow safety, (3) incremental append with varying chunk sizes, all validated against a clean reference implementation that applies modular reduction per byte. Good use of [Theory] with [InlineData]. ✅
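The incremental-append property exercised by point (3) can be sketched with the stdlib's running-seed form of Adler-32, which models Append semantics: hashing in chunks of any size must match hashing in one shot.

```python
import zlib

data = bytes((i * 31 + 7) & 0xFF for i in range(1000))
one_shot = zlib.adler32(data)

for chunk in (1, 3, 16, 33, 100):
    h = zlib.adler32(b"")          # initial Adler-32 state (1)
    for i in range(0, len(data), chunk):
        h = zlib.adler32(data[i:i + chunk], h)   # feed the running state back in
    assert h == one_shot
```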

💡 Missing benchmark data

The PR description doesn't include benchmark numbers. While the algorithmic improvement is well-established from zlib implementations, having BenchmarkDotNet results would help quantify the speedup for the .NET implementation specifically. This is a nice-to-have for the PR description, not a blocker — the approach is proven.

@stephentoub
Member Author

@EgorBot -amd -intel -arm

using System.IO.Hashing;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

[MemoryDiagnoser]
public class Bench
{
    private byte[] _bytes;

    [Params(10, 100, 1000, 10_000, 100_000)]
    public int Size { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _bytes = new byte[Size];
        for (int i = 0; i < Size; i++) _bytes[i] = (byte)('a' + (i % 26));
    }

    [Benchmark]
    public uint Adler() => Adler32.HashToUInt32(_bytes);
}

@stephentoub stephentoub marked this pull request as ready for review February 14, 2026 17:08
Copilot AI review requested due to automatic review settings February 14, 2026 17:08
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

@stephentoub
Copy link
Member Author

On my machine with AVX2...

Before:

| Method | Size  | Mean         |
|--------|-------|-------------:|
| Adler  | 1     |     2.685 ns |
| Adler  | 10    |     5.039 ns |
| Adler  | 1000  |   477.781 ns |
| Adler  | 10000 | 4,743.580 ns |

After:

| Method | Size  | Mean       |
|--------|-------|-----------:|
| Adler  | 1     |   2.924 ns |
| Adler  | 10    |   5.567 ns |
| Adler  | 1000  |  35.518 ns |
| Adler  | 10000 | 280.908 ns |

using System.IO.Hashing;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

[MemoryDiagnoser]
public class Bench
{
    private byte[] _bytes;

    [Params(1, 10, 1000, 10_000)]
    public int Size { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _bytes = new byte[Size];
        for (int i = 0; i < Size; i++)
        {
            _bytes[i] = (byte)('a' + (i % 26));
        }
    }

    [Benchmark]
    public uint Adler() => Adler32.HashToUInt32(_bytes);
}

