
Better fastbase64 + man pages#947

Open
lemire wants to merge 27 commits into master from bettertools


@lemire
Member

@lemire lemire commented Mar 13, 2026

This makes fastbase64 roughly equivalent to the standard base64 tool. Unfortunately, the GNU base64 and the non-GNU base64 tools differ. I tried to come up with a hybrid that can emulate both.

Apple M4 Benchmark Results (times in milliseconds):

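| Size | Encode Base64 | Encode FastBase64 | Decode Base64 | Decode FastBase64 |
|------|---------------|-------------------|---------------|-------------------|
| 1m   | 23.5          | 23.2              | 37.7          | 23.1              |
| 10m  | 31.3          | 24.4              | 167.6         | 24.0              |
| 100m | 115.6         | 50.4              | 1430.2        | 53.7              |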

x64 Linux Benchmark Results (times in milliseconds):

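| Size | Encode Base64 | Encode FastBase64 | Decode Base64 | Decode FastBase64 |
|------|---------------|-------------------|---------------|-------------------|
| 1m   | 12.1          | 11.8              | 13.8          | 11.8              |
| 10m  | 25.8          | 18.7              | 39.6          | 17.5              |
| 100m | 187.0         | 100.5             | 290.3         | 77.7              |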

So for inputs that are 10 MB or larger, it pays to use fastbase64. For small inputs, the time is dominated by the process overhead.
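
To make the "it pays to use fastbase64" claim concrete, the speedup ratios can be computed directly from the two tables. The following is a small Python sketch; the `RESULTS` dictionary and the `speedups` helper are illustrative names, and the numbers are copied from the tables as reported, not re-measured.

```python
# Times in milliseconds, copied from the two benchmark tables.
# Each row: (encode base64, encode fastbase64, decode base64, decode fastbase64)
RESULTS = {
    "Apple M4": {
        "1m": (23.5, 23.2, 37.7, 23.1),
        "10m": (31.3, 24.4, 167.6, 24.0),
        "100m": (115.6, 50.4, 1430.2, 53.7),
    },
    "x64 Linux": {
        "1m": (12.1, 11.8, 13.8, 11.8),
        "10m": (25.8, 18.7, 39.6, 17.5),
        "100m": (187.0, 100.5, 290.3, 77.7),
    },
}


def speedups(row):
    """Return (encode speedup, decode speedup) of fastbase64 over base64."""
    enc_base, enc_fast, dec_base, dec_fast = row
    return enc_base / enc_fast, dec_base / dec_fast


if __name__ == "__main__":
    for platform, rows in RESULTS.items():
        for size, row in rows.items():
            enc, dec = speedups(row)
            print(f"{platform:10s} {size:>5s}  encode x{enc:.2f}  decode x{dec:.2f}")
```

With these figures, the 100m row works out to roughly 2.3x (encode) and 26.6x (decode) on the Apple M4, and roughly 1.9x (encode) and 3.7x (decode) on x64 Linux, while the 1m rows stay close to 1x for encoding.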

The Apple base64 decoder is quite slow, so that's one use case where we may want to use fastbase64.

Fixes #907

Fixes #908

@lemire lemire requested a review from pauldreik March 13, 2026 21:44
@pauldreik
Collaborator

I think it would be great to provide a drop in replacement for gnu coreutils.
We could provide a bash script wrapper, fastbase64.coreutilscompat, that parses command-line parameters and uses defaults that exactly match GNU coreutils.

Then users could set an alias in .bashrc to get a nice speedup.

The same thing could be done for Mac/BSD, I assume?

@lemire
Member Author

lemire commented Mar 15, 2026

@pauldreik

> I think it would be great to provide a drop in replacement for gnu coreutils.

I think that the only difference is the meaning of the flag -i.

We have a few options.

  1. Add an intermediate script as you suggest.
  2. Generate two binaries, one with GNU commands, one with BSD convention.
  3. Focus solely on the GNU convention.
  4. Stick with what I did which is mostly cross-compatible except that Linux users must not use -i.
  5. Detect a GNU (Linux) system and change the behaviour under Linux so that -i means ignore bad characters.
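
The conflict over `-i` can be illustrated with a small Python sketch. It is not the parser in tools/fastbase64.cpp; `build_parser`, the `gnu_mode` switch, and the program name are hypothetical, and only the two meanings of `-i` (input file versus ignore bad characters) come from the discussion above.

```python
import argparse


def build_parser(gnu_mode: bool) -> argparse.ArgumentParser:
    """Hypothetical parser: only the meaning of -i depends on the convention."""
    parser = argparse.ArgumentParser(prog="fastbase64-sketch")
    parser.add_argument("-d", "--decode", action="store_true")
    if gnu_mode:
        # GNU convention: -i means "ignore bad characters when decoding".
        parser.add_argument("-i", "--ignore-garbage", action="store_true")
    else:
        # BSD/macOS convention: -i names the input file.
        parser.add_argument("-i", "--input", metavar="FILE")
        parser.add_argument("--ignore-garbage", action="store_true")
    return parser


if __name__ == "__main__":
    # The same short flag is parsed in two incompatible ways.
    print(build_parser(gnu_mode=True).parse_args(["-d", "-i"]))
    print(build_parser(gnu_mode=False).parse_args(["-d", "-i", "data.b64"]))
```

Because `-i` takes an argument in one convention and none in the other, a single parser cannot honour both at once, which is why every option in the list either picks a convention or selects one at run time.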

@lemire
Member Author

lemire commented Mar 22, 2026

@lemire
Member Author

lemire commented Mar 22, 2026

@chadbrewbaker

I would defer to @sylvestre. The last Rust coreutils PR was for arg[0] handling: uutils/coreutils@6c3de3e

--bsdmode, --gnumode seem like the most sensible flags for a utility that isn't GNU or BSD?

@pauldreik
Collaborator

I think busybox is also using the trick of checking argv[0] to let the same binary masquerade as different programs.

If we provide a symlink named, say, fastbase64.coreutils that links to fastbase64, then argv[0] is "/path/to/fastbase64.coreutils" (the exact value depends on how the link is invoked, so one needs to take its basename).
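
The argv[0] check can be sketched as follows. This is a Python illustration of the idea only, not the code in the PR; the helper name `is_gnu_mode` and the handling of a Windows-style `.exe` suffix are assumptions.

```python
import os
import sys


def is_gnu_mode(argv0: str) -> bool:
    """Return True when the program was invoked through the '.coreutils' name."""
    # argv[0] may be a bare name, a relative path, or an absolute path,
    # so only the final path component is inspected.
    name = os.path.basename(argv0)
    # Assumption: tolerate an '.exe' suffix on Windows builds.
    if name.lower().endswith(".exe"):
        name = name[:-4]
    return name.endswith(".coreutils")


if __name__ == "__main__":
    mode = "GNU-like" if is_gnu_mode(sys.argv[0]) else "BSD-like"
    print(f"invoked as {sys.argv[0]!r}: {mode} behaviour")
```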

I think some people use gnu coreutils on mac, so it makes sense to provide the .coreutils link not only on linux.

I made a quick test of trying to make a symlink with cmake, using:

install(CODE "execute_process( \
    COMMAND echo ${CMAKE_COMMAND} -E create_symlink \
    ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_BINDIR}/fastbase64 \
    ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_BINDIR}/fastbase64.coreutils   \
    )"
)

It does not work properly. I think a generator expression would have been ideal, but it does not seem to be supported: https://cmake.org/cmake/help/latest/manual/cmake-generator-expressions.7.html#export-and-install-expressions

There is a potentially useful hint here, but I failed to understand how to apply it: https://discourse.cmake.org/t/how-to-properly-install-a-symbolic-link-for-backwards-compatibility/8592

@lemire
Member Author

lemire commented Mar 23, 2026

@pauldreik Please see

#949

I just call ln directly or copy the file as a fallback when ln is not available.
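
The "link, or copy as a fallback" strategy can be sketched in Python. The PR itself does this from the build scripts with `ln`; the version below swaps in `os.symlink` and `shutil.copy2` to show the same fallback logic, and the function name `link_or_copy` is hypothetical.

```python
import os
import shutil


def link_or_copy(target: str, alias: str) -> str:
    """Create `alias` as a symlink to `target`; copy the file if linking fails.

    Returns "symlink" or "copy" so the caller can log what happened.
    """
    if os.path.lexists(alias):
        os.remove(alias)  # replace a stale alias left by an earlier install
    try:
        os.symlink(os.path.abspath(target), alias)
        return "symlink"
    except (OSError, NotImplementedError):
        # Symlinks may be unavailable (for example, on Windows without the
        # required privilege), so fall back to a plain copy.
        shutil.copy2(target, alias)  # copy2 keeps permission bits such as +x
        return "copy"
```

A call such as `link_or_copy("bin/fastbase64", "bin/fastbase64.coreutils")` would leave a second entry point under either outcome; with a copy, the two files no longer stay in sync after a rebuild, which is the main drawback of the fallback.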

Now, I wonder about the name. Is fastbase64.coreutils what we want to go with?

* This creates two fastbase64 utilities. We have fastbase64, which is meant as a
drop-in replacement for the standard BSD/mac util, and fastbase64.coreutils, which is
meant as a replacement for the GNU coreutils base64 program.

* minor fix for windows

* minor fixes

* minor fix

* another option?

* Update man/fastbase64.coreutils.1

Co-authored-by: Copilot <[email protected]>

* Update tools/CMakeLists.txt

Co-authored-by: Copilot <[email protected]>

* handling gnumode differently.

* Update man/fastbase64.coreutils.1

Co-authored-by: Copilot <[email protected]>

* Update man/fastbase64.coreutils.1

Co-authored-by: Copilot <[email protected]>

* putting back the damn thing.

* another fix

* Update tools/CMakeLists.txt

Co-authored-by: Copilot <[email protected]>

* tuning.

* adding tests

* more testing

* fixing windows

* make script more portable

* more windows fixes

* Update tests/CMakeLists.txt

Co-authored-by: Copilot <[email protected]>

* Update tests/fastbase64/test_fastbase64.py

Co-authored-by: Copilot <[email protected]>

* Update tests/fastbase64/test_fastbase64.py

Co-authored-by: Copilot <[email protected]>

* various additional fixes

* Update tools/fastbase64.cpp

Co-authored-by: Copilot <[email protected]>

* Update man/fastbase64.1

Co-authored-by: Copilot <[email protected]>

* Update man/fastbase64.coreutils.1

Co-authored-by: Copilot <[email protected]>

* Update tools/CMakeLists.txt

Co-authored-by: Copilot <[email protected]>

* Update tools/fastbase64.cpp

Co-authored-by: Copilot <[email protected]>

* tweaks

* forgot to check it in.

* many more fixes

* more tweaking

* more fixes

* Update README.md

Co-authored-by: Copilot <[email protected]>

* Update tests/fastbase64/test_fastbase64.py

Co-authored-by: Copilot <[email protected]>

* more fixing.

* minor fix

* more tests

* tuning.

* ok

* making it prettier.

* more fine tuning.

---------

Co-authored-by: Copilot <[email protected]>
@lemire
Member Author

lemire commented Mar 25, 2026

@pauldreik Ok, now the PR does have two utilities.

@lemire lemire requested review from Copilot and pauldreik March 25, 2026 00:38

This comment was marked as resolved.

lemire and others added 2 commits March 24, 2026 20:52
@lemire lemire requested a review from Copilot March 25, 2026 00:59
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.



@lemire
Member Author

lemire commented Mar 25, 2026

@pauldreik This should be in pretty decent shape now.

Collaborator

@pauldreik pauldreik left a comment


I like the added functionality!

When I run the test, I get failures:

tests/fastbase64/test_fastbase64.py build/release/tools/fastbase64 build/release/tools/fastbase64.coreutils recovered_base64_100m.bin 

=== fastbase64 (BSD-like) ===
 PASS --help
 PASS --version
 PASS unknown option rejected
 PASS -n rejected in BSD mode
 FAIL round-trip: -e FILE | -d
 PASS encoded matches expected
 PASS -e / -d explicit flags (binary data)
 PASS --encode / --decode long options
 PASS -D decode alias
 PASS -b 76 wraps at 76 chars
 PASS -w 40 wraps at 40 chars
 PASS --wrap=60 wraps at 60 chars
 PASS default no-wrap: no interior newlines
 PASS base64 alphabet in output
 PASS empty input round-trip
 PASS single-byte round-trip
 PASS all-256-byte-values round-trip
 FAIL -i FILE (explicit input file)
 PASS --input FILE
 FAIL -o FILE (explicit output file)
 FAIL --output FILE
 PASS second positional arg as output file
 PASS invalid base64 input fails
 PASS '-' reads from stdin
 FAIL wrapped encode round-trips correctly
 PASS fastbase64: corrupt input fails without ignore-garbage
 FAIL fastbase64: --ignore-garbage recovers

=== fastbase64.coreutils (GNU-like) ===
 PASS --help
 PASS --version
 PASS unknown option rejected
 FAIL round-trip
 PASS -e / -d explicit flags (binary data)
 PASS --encode / --decode long options
 PASS -D decode alias
 PASS default wrapping: has interior newlines
 PASS base64 alphabet in output
 PASS -w 76 wraps at 76 chars
 PASS -w 0 no-wrap: no interior newlines
 PASS --wrap=40 wraps at 40 chars
 PASS -b 76 is alias for -w 76
 PASS empty input round-trip
 PASS all-256-byte-values round-trip
 PASS corrupt input fails without ignore-garbage
 FAIL -i ignores garbage and decodes correctly
 FAIL --ignore-garbage long option
 FAIL -n ignores garbage and decodes correctly
 FAIL --noerrcheck long option
 PASS -o FILE output
 PASS second positional arg as output file
 FAIL '-' reads from stdin
 FAIL wrapped encode round-trips correctly

=== Large inputs round-trips (over 64K, deterministic) ===
 PASS fastbase64 round-trip 65,536 bytes
 PASS coreutils round-trip 65,536 bytes
 PASS fastbase64 wrapped 65,536 bytes
 PASS coreutils wrapped 65,536 bytes
 PASS large alphabet check 65,536 bytes
 PASS fastbase64 round-trip 65,537 bytes
 PASS coreutils round-trip 65,537 bytes
 PASS fastbase64 wrapped 65,537 bytes
 PASS coreutils wrapped 65,537 bytes
 PASS large alphabet check 65,537 bytes
 PASS fastbase64 round-trip 131,072 bytes
 PASS coreutils round-trip 131,072 bytes
 PASS fastbase64 wrapped 131,072 bytes
 PASS coreutils wrapped 131,072 bytes
 PASS large alphabet check 131,072 bytes
 PASS fastbase64 round-trip 262,144 bytes
 PASS coreutils round-trip 262,144 bytes
 PASS fastbase64 wrapped 262,144 bytes
 PASS coreutils wrapped 262,144 bytes
 PASS large alphabet check 262,144 bytes
 PASS fastbase64 round-trip 524,288 bytes
 PASS coreutils round-trip 524,288 bytes
 PASS fastbase64 wrapped 524,288 bytes
 PASS coreutils wrapped 524,288 bytes
 PASS large alphabet check 524,288 bytes
 PASS fastbase64 round-trip 1,048,576 bytes
 PASS coreutils round-trip 1,048,576 bytes
 PASS fastbase64 wrapped 1,048,576 bytes
 PASS coreutils wrapped 1,048,576 bytes
 PASS large alphabet check 1,048,576 bytes
 PASS fastbase64 round-trip 2,097,152 bytes
 PASS coreutils round-trip 2,097,152 bytes
 PASS fastbase64 wrapped 2,097,152 bytes
 PASS coreutils wrapped 2,097,152 bytes
 PASS large alphabet check 2,097,152 bytes

=== Wrapping extremes & edge options ===
 PASS fastbase64 -w 1 round-trip
 PASS fastbase64 -w 1 line lengths correct
 PASS fastbase64 -w 4 round-trip
 PASS fastbase64 -w 4 line lengths correct
 PASS fastbase64 -w 76 round-trip
 PASS fastbase64 -w 76 line lengths correct
 PASS fastbase64 -w 100000 round-trip
 PASS fastbase64 -w 100000 line lengths correct
 PASS coreutils -w 1 round-trip
 PASS coreutils -w 1 line lengths correct
 PASS coreutils -w 4 round-trip
 PASS coreutils -w 4 line lengths correct
 PASS coreutils -w 76 round-trip
 PASS coreutils -w 76 line lengths correct
 PASS coreutils -w 100000 round-trip
 PASS coreutils -w 100000 line lengths correct

=== Adversarial decoding (padding, garbage, whitespace) ===
 PASS build/release/tools/fastbase64 rejects garbage without ignore
 PASS build/release/tools/fastbase64 ignore-garbage recovers
 PASS build/release/tools/fastbase64.coreutils rejects garbage without ignore
 PASS build/release/tools/fastbase64.coreutils ignore-garbage recovers
 PASS build/release/tools/fastbase64 rejects garbage without ignore
 PASS build/release/tools/fastbase64 ignore-garbage recovers
 PASS build/release/tools/fastbase64.coreutils rejects garbage without ignore
 PASS build/release/tools/fastbase64.coreutils ignore-garbage recovers
 PASS build/release/tools/fastbase64 handles 100KB single-line base64
 PASS build/release/tools/fastbase64.coreutils handles 100KB single-line base64

=== Chunk-boundary whitespace stress ===
 PASS fastbase64: space at chunk boundary
 PASS fastbase64: space one before boundary
 PASS fastbase64: space one after boundary
 PASS fastbase64: newline at chunk boundary
 PASS fastbase64: 10 spaces at boundary
 PASS fastbase64: CRLF at boundary
 PASS fastbase64: spaces straddling boundary -3..+3
 PASS fastbase64: tab at boundary
 PASS fastbase64: mixed ws at boundary
 PASS fastbase64: 50 newlines at boundary
 PASS fastbase64: space at 2×chunk boundary
 PASS fastbase64: newline at 2×chunk boundary
 PASS fastbase64: 10 spaces at 2×chunk boundary
 PASS fastbase64: spaces near 2×chunk -5..+5
 PASS fastbase64: spaces at both chunk boundaries
 PASS fastbase64: space at group boundary before chunk
 PASS fastbase64: space at group boundary after chunk
 PASS fastbase64: 1000 spaces at boundary
 PASS fastbase64: newline every 76 chars
 PASS fastbase64: trailing spaces before final newline
 PASS coreutils: space at chunk boundary
 PASS coreutils: space one before boundary
 PASS coreutils: space one after boundary
 PASS coreutils: newline at chunk boundary
 PASS coreutils: 10 spaces at boundary
 PASS coreutils: CRLF at boundary
 PASS coreutils: spaces straddling boundary -3..+3
 PASS coreutils: tab at boundary
 PASS coreutils: mixed ws at boundary
 PASS coreutils: 50 newlines at boundary
 PASS coreutils: space at 2×chunk boundary
 PASS coreutils: newline at 2×chunk boundary
 PASS coreutils: 10 spaces at 2×chunk boundary
 PASS coreutils: spaces near 2×chunk -5..+5
 PASS coreutils: spaces at both chunk boundaries
 PASS coreutils: space at group boundary before chunk
 PASS coreutils: space at group boundary after chunk
 PASS coreutils: 1000 spaces at boundary
 PASS coreutils: newline every 76 chars
 PASS coreutils: trailing spaces before final newline

=== Super sparse base64 decoding ===
 PASS fastbase64: 0 valid payloads in 197608 bytes (spanning 4 chunks) decoded correctly
 PASS coreutils: 0 valid payloads in 197608 bytes (spanning 4 chunks) decoded correctly
 PASS fastbase64: 1 valid payloads in 197608 bytes (spanning 4 chunks) decoded correctly
 PASS coreutils: 1 valid payloads in 197608 bytes (spanning 4 chunks) decoded correctly
 PASS fastbase64: 2 valid payloads in 197608 bytes (spanning 4 chunks) decoded correctly
 PASS coreutils: 2 valid payloads in 197608 bytes (spanning 4 chunks) decoded correctly
 PASS fastbase64: 3 valid payloads in 197608 bytes (spanning 4 chunks) decoded correctly
 PASS coreutils: 3 valid payloads in 197608 bytes (spanning 4 chunks) decoded correctly

=== Error conditions ===
 PASS fastbase64: non-existent input file rejected
 PASS fastbase64: bad -w -1 rejected
 PASS fastbase64: bad -w abc rejected
 PASS fastbase64: bad -w 999999999999999 rejected
 PASS coreutils: non-existent input file rejected
 PASS coreutils: bad -w -1 rejected
 PASS coreutils: bad -w abc rejected
 PASS coreutils: bad -w 999999999999999 rejected

============================================================
Results: 155 passed, 13 failed
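
For readers who have not seen the test script, the "round-trip" and "line lengths correct" checks named in the log can be sketched as below. This uses Python's standard `base64` module as a stand-in for the two binaries and is not taken from tests/fastbase64/test_fastbase64.py; the helper names are hypothetical.

```python
import base64


def encode_wrapped(data: bytes, width: int) -> bytes:
    """Base64-encode and break the output into lines of `width` characters.

    width == 0 means no wrapping, as with `-w 0` in the log.
    """
    encoded = base64.b64encode(data)
    if width == 0 or not encoded:
        return encoded
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return b"\n".join(lines) + b"\n"


def line_lengths_correct(text: bytes, width: int) -> bool:
    """Every line but the last is exactly `width` long; the last may be shorter."""
    lines = text.splitlines()
    if not lines:
        return True
    full_lines_ok = all(len(line) == width for line in lines[:-1])
    return full_lines_ok and 0 < len(lines[-1]) <= width


def decode_lenient(text: bytes) -> bytes:
    """Drop ASCII whitespace, then decode while rejecting any other bad byte."""
    stripped = b"".join(text.split())
    return base64.b64decode(stripped, validate=True)


if __name__ == "__main__":
    data = bytes(range(256))  # all 256 byte values, as in the log
    for width in (1, 4, 76, 100000):
        text = encode_wrapped(data, width)
        assert line_lengths_correct(text, width)
        assert decode_lenient(text) == data  # the round-trip itself
    print("round-trip checks passed")
```

Each check encodes a known input, inspects the wrapped output, decodes it again, and compares against the original bytes.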


@lemire
Member Author

lemire commented Mar 29, 2026

@pauldreik The problem was with the Python script, which I have revised.

@lemire lemire requested a review from pauldreik March 29, 2026 23:04


Development

Successfully merging this pull request may close these issues.

make fastbase64 functionally equivalent to the standard base64 command

provide man pages for fastbase64 and sutf

4 participants