Conversation
Pull request overview
This PR updates the System.IO.Hashing Adler-32 implementation to use hardware intrinsics for vectorized processing (where available) and extends the test suite to validate correctness across a wider set of input shapes.
Changes:
- Add SIMD-accelerated Adler-32 update paths for Arm (AdvSimd) and x86 (Vector128 / AVX2 / AVX-512 BW), with scalar fallback.
- Refactor the scalar update to share the ModBase/NMax constants and introduce a scalar tail helper for the vector paths (a sketch of this scalar core follows this list).
- Add new test coverage for many input lengths, max-byte data, and incremental append chunking, validated against a reference Adler-32 implementation.
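For orientation, the scalar core that the vectorized paths fall back to follows the classic zlib structure. Below is a minimal sketch; the class and method names are hypothetical, the constant names ModBase/NMax come from the PR description, and the values 65521 and 5552 are the standard zlib ones (the PR's actual code may differ in detail):

```csharp
using System;

// Illustrative scalar Adler-32 core (not the PR's actual code).
internal static class Adler32ScalarSketch
{
    private const uint ModBase = 65521; // largest prime below 2^16
    private const int NMax = 5552;      // zlib's NMAX: max bytes before a reduction is required

    public static uint Update(uint adler, ReadOnlySpan<byte> source)
    {
        uint s1 = adler & 0xFFFF;
        uint s2 = adler >> 16;

        while (!source.IsEmpty)
        {
            // Process at most NMax bytes, then reduce, so neither
            // accumulator can wrap a 32-bit register.
            ReadOnlySpan<byte> chunk = source.Slice(0, Math.Min(NMax, source.Length));
            foreach (byte b in chunk)
            {
                s1 += b;  // running byte sum
                s2 += s1; // running sum of prefix sums
            }
            s1 %= ModBase;
            s2 %= ModBase;
            source = source.Slice(chunk.Length);
        }

        return (s2 << 16) | s1;
    }
}
```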
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/libraries/System.IO.Hashing/src/System/IO/Hashing/Adler32.cs | Introduces vectorized Adler-32 update implementations (Arm/x86) with feature detection and scalar fallback. |
| src/libraries/System.IO.Hashing/tests/Adler32Tests.cs | Adds reference-based tests targeting multiple length boundaries, overflow-stressing inputs, and incremental append behavior. |
Tagging subscribers to this area: @dotnet/area-system-io-hashing, @bartonjs, @vcsjones |
Force-pushed from 81edcb0 to 04d2d15
🤖 Copilot Code Review — PR #124409

Holistic Assessment

Motivation: The Adler-32 implementation added in #123601 was scalar-only. Adler-32 is used in zlib/deflate, so vectorization is a well-motivated performance improvement for a hot-path algorithm. The PR is justified.

Approach: The implementation follows the well-known approach from zlib/chromium of decomposing the Adler-32 s2 weighted sum into position-weighted sums computed via SIMD multiply-add intrinsics, with a prefix-sum accumulator tracking inter-block s1 contributions. Four vectorized paths are provided (AdvSimd, SSE2/SSSE3, AVX2, AVX-512BW), plus a widening fallback for Vector128 without SSSE3. This matches the pattern used by other vectorized hash implementations in this library (Crc32.Vectorized.cs, Crc64.Vectorized.cs, XxHashShared.cs).

Summary: ✅ LGTM. After extensive mathematical verification of all four vectorized paths (tracing the s1/s2 accumulation through multi-block examples), the code is correct. The test coverage is thorough with a reference implementation, and the patterns are consistent with the rest of the codebase. One minor suggestion below. No blocking issues found.

Detailed Findings

✅ Vectorized s2 Accumulation — Verified correct across all paths

I traced the prefix-sum logic through multi-block examples in each path; the inter-block s1 contribution to s2 is carried correctly.
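To make the verified identity concrete: folding a block of n bytes into the state (s1, s2) byte-at-a-time is equivalent to adding the plain byte sum to s1 and adding n·s1_old plus the position-weighted sum to s2, which is what the multiply-add paths compute per block. A small self-contained check of that identity (illustrative only, not the PR's code):

```csharp
// Check: byte-at-a-time Adler folding of a block equals the block-wise
// form used by the SIMD paths.
using System;

byte[] block = new byte[32];
new Random(42).NextBytes(block);
uint s1 = 123, s2 = 456; // arbitrary incoming state
int n = block.Length;

// Byte-at-a-time recurrence: s1 += b; s2 += s1.
uint r1 = s1, r2 = s2;
for (int i = 0; i < n; i++) { r1 += block[i]; r2 += r1; }

// Block form: s1 gains the plain sum; s2 gains n * s1_old plus the
// position-weighted sum of (n - i) * b[i] that multiply-add computes.
uint sum = 0, weighted = 0;
for (int i = 0; i < n; i++) { sum += block[i]; weighted += (uint)(n - i) * block[i]; }

Console.WriteLine(r1 == s1 + sum && r2 == s2 + (uint)n * s1 + weighted); // True
```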
✅ Vector512 Weight Correction — Correct

✅ Vector128 Non-SSSE3 Fallback — Correct and overflow-safe

The widening fallback sums widened byte values, keeping the accumulators within overflow-safe bounds.

✅ Endianness Guard — Appropriate

✅ AVX2 Lane Independence — Verified
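On the overflow-safety point: an unreduced accumulator must not wrap a 32-bit register before the next modular reduction. Assuming the standard zlib value NMax = 5552, that is exactly the largest block length for which this holds even with all-0xFF input and a worst-case starting state. A quick check of the bound (illustrative):

```csharp
// zlib's NMAX bound: the largest n such that an unreduced s2 cannot wrap
// 32 bits, even starting from s1 = s2 = ModBase - 1 with all-0xFF input:
//   255 * n * (n + 1) / 2 + (n + 1) * (ModBase - 1) <= uint.MaxValue
using System;

const ulong ModBase = 65521;
ulong Worst(ulong n) => 255UL * n * (n + 1) / 2 + (n + 1) * (ModBase - 1);

Console.WriteLine(Worst(5552) <= uint.MaxValue); // True  -> 5552 is safe
Console.WriteLine(Worst(5553) <= uint.MaxValue); // False -> 5553 could overflow
```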
✅ Test Coverage — Thorough

Tests cover: (1) various lengths hitting all vector width boundaries and tail transitions, (2) all-0xFF stress testing for accumulator overflow safety, (3) incremental append with varying chunk sizes, all validated against a clean reference implementation that applies modular reduction per byte.

💡 Missing benchmark data

The PR description doesn't include benchmark numbers. While the algorithmic improvement is well-established from zlib implementations, having BenchmarkDotNet results would help quantify the speedup for the .NET implementation specifically. This is a nice-to-have for the PR description, not a blocker; the approach is proven.
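The per-byte reference oracle described above is simple enough to show for context. A sketch along these lines (assumed shape, not necessarily the test file's exact code):

```csharp
using System;

// Per-byte reference oracle: reduce after every byte, so no overflow
// reasoning is needed. Slow but obviously correct.
static uint Adler32Reference(ReadOnlySpan<byte> data)
{
    uint s1 = 1, s2 = 0; // Adler-32 starts at s1 = 1, s2 = 0
    foreach (byte b in data)
    {
        s1 = (s1 + b) % 65521;
        s2 = (s2 + s1) % 65521;
    }
    return (s2 << 16) | s1;
}

// Classic known answer: Adler-32("Wikipedia") == 0x11E60398.
Console.WriteLine(Adler32Reference("Wikipedia"u8).ToString("x8")); // 11e60398
```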
@EgorBot -amd -intel -arm
On my machine with AVX2... Before:
After:
using System.IO.Hashing;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

[MemoryDiagnoser]
public class Bench
{
    private byte[] _bytes;

    // Sizes cover tiny inputs (scalar path) and large inputs (vectorized paths).
    [Params(1, 10, 1000, 10_000)]
    public int Size { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        // Repeating 'a'..'z' fill; the byte values don't affect throughput.
        _bytes = new byte[Size];
        for (int i = 0; i < Size; i++)
        {
            _bytes[i] = (byte)('a' + (i % 26));
        }
    }

    [Benchmark]
    public uint Adler() => Adler32.HashToUInt32(_bytes);
}
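To reproduce, run the project with `dotnet run -c Release` against a build that includes the updated System.IO.Hashing; BenchmarkSwitcher dispatches to the Bench class, and [MemoryDiagnoser] reports allocation counts alongside the timings.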