Skip to content

Added Adler32 to System.IO.Hashing.#123601

Merged
stephentoub merged 19 commits intodotnet:mainfrom
AraHaan:add-adler32-to-system-io-hashing
Feb 13, 2026
Merged

Added Adler32 to System.IO.Hashing.#123601
stephentoub merged 19 commits intodotnet:mainfrom
AraHaan:add-adler32-to-system-io-hashing

Conversation

@AraHaan
Copy link
Member

@AraHaan AraHaan commented Jan 25, 2026

Fixes #90191.

@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jan 25, 2026
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jan 25, 2026
@AraHaan

This comment was marked as outdated.

@akoeplinger akoeplinger added area-System.IO.Hashing and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jan 25, 2026
@AraHaan

This comment was marked as outdated.

@AraHaan
Copy link
Member Author

AraHaan commented Feb 1, 2026

Let me know if these changes look good as is or if any more changes are needed.

@stephentoub
Copy link
Member

🤖 Copilot Code Review — PR #123601

Holistic Assessment

Motivation: The API is approved (issue #90191 has api-approved label). Adler-32 is a legitimate checksum algorithm specified in RFC 1950, commonly used in zlib. Adding it to System.IO.Hashing fills a gap in the library.

Approach: The implementation correctly follows the Adler-32 algorithm per RFC 1950: initial state of 1, modulo 65521 (largest prime < 65536), and big-endian output. The API shape matches the approved proposal and follows patterns established by sibling types (Crc32, Crc64, XxHash*).

Summary: ⚠️ Needs Human Review. The algorithm implementation is correct, and the PR has addressed key feedback from reviewers (empty buffer handling, endianness, bounds check optimization). However, several items warrant human attention: (1) test vector provenance needs confirmation, (2) the scalar-only implementation may be acceptable for initial merge with vectorization deferred, but this is a policy decision.


Detailed Findings

✅ Algorithm Correctness — Verified correct per RFC 1950

The core algorithm is correctly implemented:

  • Initial state s1=1, s2=0 (combined as 1u)
  • Uses BASE=65521 (largest prime < 65536)
  • Uses NMax=5552 to safely defer modulo operations without 32-bit overflow
  • Accumulates s1 += b; s2 += s1; per byte
  • Returns (s2 << 16) | s1 — correct
  • Uses WriteUInt32BigEndian — matches RFC 1950's "most-significant-byte first (network) order" requirement

(Flagged by: GPT-5.2, Gemini 3 Pro, Claude Opus — consensus)

✅ API Consistency — Matches approved API and sibling type patterns

  • Correctly inherits from NonCryptographicHashAlgorithm
  • [CLSCompliant(false)] on uint return methods matches Crc32/XxHash32
  • Clone(), Hash(), TryHash(), HashToUInt32() methods follow established patterns
  • Ref assembly places Adler32 before Crc32 alphabetically — correct

✅ Empty Input Handling — Fixed

The earlier concern about Update returning 1u for empty buffers has been addressed. The code now correctly returns the current adler state unchanged when buf.IsEmpty, and the base test framework includes AppendingEmptyHasNoEffect which validates this behavior.

⚠️ Test Vectors — Need Oracle Validation

A reviewer asked: "Where did these hashes originate from? Do we have an oracle we can use to validate them?"

The test vectors appear correct (I verified "123456789"0x091E01DE matches RFC 1950's example), but the PR should confirm:

  1. The test vectors were generated using an authoritative source (zlib, Python's zlib.adler32, or similar)
  2. Consider adding a P/Invoke test to zlib for continuous validation, similar to how other crypto tests work

(Flagged by: GPT-5.2)

⚠️ Test Coverage — Consider NMax Boundary Tests

Current tests exercise basic correctness but don't explicitly test the NMax (5552-byte) chunking boundary. Consider adding test vectors with lengths at/around NMax-1, NMax, NMax+1, and 2*NMax to ensure the modulo reduction and chunking logic can't regress.

(Flagged by: GPT-5.2)

💡 Vectorization Deferred — Acceptable for Initial Merge

The implementation is scalar-only, while sibling types like Crc32 have vectorized paths. The PR comments indicate vectorization is planned for a follow-up PR. This is a reasonable approach — merge the correct scalar implementation with comprehensive tests first, then optimize in a subsequent PR.

(Flagged by: GPT-5.2, Gemini 3 Pro — consensus)

💡 Test Comment Cleanup

The test file includes comments referencing "vector optimizations" (lines 52, 58, 63) but the implementation is scalar-only. Consider updating these comments to reference "chunking (NMax) coverage" instead, or leave them as-is if vectorization is imminent.

💡 XML Documentation Style

The implementation has XML documentation, but some methods use slightly different formatting than Crc32.cs (e.g., /// <para> vs /// <para>, multi-line vs compact). This is minor but could be harmonized for consistency.


Summary

Category Finding
✅ Correct Algorithm implementation, API shape, empty input handling, endianness
⚠️ Attention Test vector oracle confirmation, NMax boundary test coverage
💡 Optional Vectorization (deferred OK), test comments, doc formatting

Verdict: The code looks correct and follows library patterns. Recommend merge after confirming test vectors were generated from an authoritative source. Vectorization can follow in a separate PR as planned.


This review was generated using multi-model analysis (GPT-5.2, Gemini 3 Pro, Claude Opus 4.5). Findings flagged by multiple models are noted.

AraHaan and others added 19 commits February 12, 2026 17:22
Co-authored-by: Jeremy Barton <[email protected]>
Also another attempt at fixing the expected hash values as well.
Added another remarks entry.
…c to UpdateScalar(). Also added optional spot where ARM intrinsics can be used in UpdateScalar if vectorization is not possible.
@stephentoub stephentoub force-pushed the add-adler32-to-system-io-hashing branch from f776e19 to 55235a4 Compare February 12, 2026 23:18
@stephentoub stephentoub enabled auto-merge (squash) February 12, 2026 23:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-System.IO.Hashing community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[API Proposal]: System.IO.Hashing.Adler32 class for computing Adler32 checksums

7 participants