Conversation
|
I think it would be great to provide a drop-in replacement for GNU coreutils. Users could then set an alias in .bashrc to get a nice speedup. The same thing could be done for Mac/BSD, I assume? |
I think that the only difference is the meaning of the flag We have a few options.
|
|
I am asking for advice on X: https://x.com/lemire/status/2035726774919713271?s=61&t=v2gDAuOzz1C3ICrzUYCSlQ |
|
This answer is interesting: https://x.com/austinsnotweird/status/2035730879461658876?s=61&t=v2gDAuOzz1C3ICrzUYCSlQ |
|
I would defer to @sylvestre. The last Rust coreutils PR was for arg[0] handling: uutils/coreutils@6c3de3e
--bsdmode and --gnumode seem like the most sensible flags for a utility that isn't GNU or BSD? |
|
I think BusyBox also uses the trick of checking argv[0] to let the same binary masquerade as different programs. If we provide a symlink named, say, fastbase64.coreutils that links to fastbase64, argv[0] is "/path/to/fastbase64.coreutils" (this depends on how the link is invoked, so one needs to take the basename of it).
I think some people use GNU coreutils on Mac, so it makes sense to provide the .coreutils link not only on Linux.
I made a quick test of creating the symlink with CMake, using:
install(CODE "execute_process( \
    COMMAND echo ${CMAKE_COMMAND} -E create_symlink \
    ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_BINDIR}/fastbase64 \
    ${CMAKE_INSTALL_PREFIX}/${CMAKE_INSTALL_BINDIR}/fastbase64.coreutils \
    )"
)
It does not work properly. I think a generator expression would have been ideal, but it does not seem to be supported: https://cmake.org/cmake/help/latest/manual/cmake-generator-expressions.7.html#export-and-install-expressions
There is a potentially useful hint here, but I could not work out how to apply it: https://discourse.cmake.org/t/how-to-properly-install-a-symbolic-link-for-backwards-compatibility/8592 |
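The argv[0] trick described in this comment can be sketched in a few lines. This is only an illustration in Python, not the code in tools/fastbase64.cpp: the function name `mode_from_argv0`, the returned mode strings, and the `.exe` handling are assumptions made for the example.

```python
import os


def mode_from_argv0(argv0: str) -> str:
    """Pick a behaviour from the name the program was invoked as.

    Hypothetical helper: returns "gnu" when the invoked name ends in
    ".coreutils", and "bsd" otherwise.
    """
    # argv[0] may be a bare name or a full path, so keep only the last component.
    name = os.path.basename(argv0)
    # Assumption: on Windows the name may carry an .exe suffix; drop it first.
    if name.lower().endswith(".exe"):
        name = name[: -len(".exe")]
    return "gnu" if name.endswith(".coreutils") else "bsd"
```

A program would call something like `mode_from_argv0(sys.argv[0])` once at startup and branch on the result when parsing options.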
|
@pauldreik Please see I just call ln directly or copy the file as a fallback when ln is not available. Now, I wonder about the name. Is |
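The "link, or copy as a fallback" idea can be sketched as follows. This is a Python illustration of the approach, not the CMake logic in the PR; the helper name `link_or_copy` and its return values are hypothetical.

```python
import os
import shutil


def link_or_copy(target: str, alias: str) -> str:
    """Make `alias` refer to `target`; return "symlink" or "copy".

    Hypothetical helper: tries a relative symlink first and falls back to
    copying the file when symlinks cannot be created.
    """
    # Replace a stale alias left over from a previous install.
    if os.path.lexists(alias):
        os.remove(alias)
    alias_dir = os.path.dirname(os.path.abspath(alias))
    try:
        # A relative link keeps working if the install tree is moved as a whole.
        os.symlink(os.path.relpath(target, start=alias_dir), alias)
        return "symlink"
    except (OSError, NotImplementedError):
        # For example, Windows without the privilege to create symlinks.
        shutil.copy2(target, alias)
        return "copy"
```

Note that with a copy, both files must be updated together on upgrade, whereas a symlink always follows the main binary.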
* This creates two fastbase64 utilities. We have fastbase64, which is meant as a drop-in replacement for the standard BSD/Mac utility, and fastbase64.coreutils, which is meant as a replacement for the GNU coreutils base64 program.
* minor fix for windows
* minor fixes
* minor fix
* another option?
* Update man/fastbase64.coreutils.1 Co-authored-by: Copilot <[email protected]>
* Update tools/CMakeLists.txt Co-authored-by: Copilot <[email protected]>
* handling gnumode differently.
* Update man/fastbase64.coreutils.1 Co-authored-by: Copilot <[email protected]>
* Update man/fastbase64.coreutils.1 Co-authored-by: Copilot <[email protected]>
* putting back the damn thing.
* another fix
* Update tools/CMakeLists.txt Co-authored-by: Copilot <[email protected]>
* tuning.
* adding tests
* more testing
* fixing windows
* make script more portable
* more windows fixes
* Update tests/CMakeLists.txt Co-authored-by: Copilot <[email protected]>
* Update tests/fastbase64/test_fastbase64.py Co-authored-by: Copilot <[email protected]>
* Update tests/fastbase64/test_fastbase64.py Co-authored-by: Copilot <[email protected]>
* various additional fixes
* Update tools/fastbase64.cpp Co-authored-by: Copilot <[email protected]>
* Update man/fastbase64.1 Co-authored-by: Copilot <[email protected]>
* Update man/fastbase64.coreutils.1 Co-authored-by: Copilot <[email protected]>
* Update tools/CMakeLists.txt Co-authored-by: Copilot <[email protected]>
* Update tools/fastbase64.cpp Co-authored-by: Copilot <[email protected]>
* tweaks
* forgot to check it in.
* many more fixes
* more tweaking
* more fixes
* Update README.md Co-authored-by: Copilot <[email protected]>
* Update tests/fastbase64/test_fastbase64.py Co-authored-by: Copilot <[email protected]>
* more fixing.
* minor fix
* more tests
* tuning.
* ok
* making it prettier.
* more fine tuning.
---------
Co-authored-by: Copilot <[email protected]>
|
@pauldreik Ok, now the PR does have two utilities. |
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
|
@pauldreik This should be in pretty decent shape now. |
pauldreik left a comment
I like the added functionality!
When I run the tests, I get failures:
tests/fastbase64/test_fastbase64.py build/release/tools/fastbase64 build/release/tools/fastbase64.coreutils recovered_base64_100m.bin
=== fastbase64 (BSD-like) ===
PASS --help
PASS --version
PASS unknown option rejected
PASS -n rejected in BSD mode
FAIL round-trip: -e FILE | -d
PASS encoded matches expected
PASS -e / -d explicit flags (binary data)
PASS --encode / --decode long options
PASS -D decode alias
PASS -b 76 wraps at 76 chars
PASS -w 40 wraps at 40 chars
PASS --wrap=60 wraps at 60 chars
PASS default no-wrap: no interior newlines
PASS base64 alphabet in output
PASS empty input round-trip
PASS single-byte round-trip
PASS all-256-byte-values round-trip
FAIL -i FILE (explicit input file)
PASS --input FILE
FAIL -o FILE (explicit output file)
FAIL --output FILE
PASS second positional arg as output file
PASS invalid base64 input fails
PASS '-' reads from stdin
FAIL wrapped encode round-trips correctly
PASS fastbase64: corrupt input fails without ignore-garbage
FAIL fastbase64: --ignore-garbage recovers
=== fastbase64.coreutils (GNU-like) ===
PASS --help
PASS --version
PASS unknown option rejected
FAIL round-trip
PASS -e / -d explicit flags (binary data)
PASS --encode / --decode long options
PASS -D decode alias
PASS default wrapping: has interior newlines
PASS base64 alphabet in output
PASS -w 76 wraps at 76 chars
PASS -w 0 no-wrap: no interior newlines
PASS --wrap=40 wraps at 40 chars
PASS -b 76 is alias for -w 76
PASS empty input round-trip
PASS all-256-byte-values round-trip
PASS corrupt input fails without ignore-garbage
FAIL -i ignores garbage and decodes correctly
FAIL --ignore-garbage long option
FAIL -n ignores garbage and decodes correctly
FAIL --noerrcheck long option
PASS -o FILE output
PASS second positional arg as output file
FAIL '-' reads from stdin
FAIL wrapped encode round-trips correctly
=== Large inputs round-trips (over 64K, deterministic) ===
PASS fastbase64 round-trip 65,536 bytes
PASS coreutils round-trip 65,536 bytes
PASS fastbase64 wrapped 65,536 bytes
PASS coreutils wrapped 65,536 bytes
PASS large alphabet check 65,536 bytes
PASS fastbase64 round-trip 65,537 bytes
PASS coreutils round-trip 65,537 bytes
PASS fastbase64 wrapped 65,537 bytes
PASS coreutils wrapped 65,537 bytes
PASS large alphabet check 65,537 bytes
PASS fastbase64 round-trip 131,072 bytes
PASS coreutils round-trip 131,072 bytes
PASS fastbase64 wrapped 131,072 bytes
PASS coreutils wrapped 131,072 bytes
PASS large alphabet check 131,072 bytes
PASS fastbase64 round-trip 262,144 bytes
PASS coreutils round-trip 262,144 bytes
PASS fastbase64 wrapped 262,144 bytes
PASS coreutils wrapped 262,144 bytes
PASS large alphabet check 262,144 bytes
PASS fastbase64 round-trip 524,288 bytes
PASS coreutils round-trip 524,288 bytes
PASS fastbase64 wrapped 524,288 bytes
PASS coreutils wrapped 524,288 bytes
PASS large alphabet check 524,288 bytes
PASS fastbase64 round-trip 1,048,576 bytes
PASS coreutils round-trip 1,048,576 bytes
PASS fastbase64 wrapped 1,048,576 bytes
PASS coreutils wrapped 1,048,576 bytes
PASS large alphabet check 1,048,576 bytes
PASS fastbase64 round-trip 2,097,152 bytes
PASS coreutils round-trip 2,097,152 bytes
PASS fastbase64 wrapped 2,097,152 bytes
PASS coreutils wrapped 2,097,152 bytes
PASS large alphabet check 2,097,152 bytes
=== Wrapping extremes & edge options ===
PASS fastbase64 -w 1 round-trip
PASS fastbase64 -w 1 line lengths correct
PASS fastbase64 -w 4 round-trip
PASS fastbase64 -w 4 line lengths correct
PASS fastbase64 -w 76 round-trip
PASS fastbase64 -w 76 line lengths correct
PASS fastbase64 -w 100000 round-trip
PASS fastbase64 -w 100000 line lengths correct
PASS coreutils -w 1 round-trip
PASS coreutils -w 1 line lengths correct
PASS coreutils -w 4 round-trip
PASS coreutils -w 4 line lengths correct
PASS coreutils -w 76 round-trip
PASS coreutils -w 76 line lengths correct
PASS coreutils -w 100000 round-trip
PASS coreutils -w 100000 line lengths correct
=== Adversarial decoding (padding, garbage, whitespace) ===
PASS build/release/tools/fastbase64 rejects garbage without ignore
PASS build/release/tools/fastbase64 ignore-garbage recovers
PASS build/release/tools/fastbase64.coreutils rejects garbage without ignore
PASS build/release/tools/fastbase64.coreutils ignore-garbage recovers
PASS build/release/tools/fastbase64 rejects garbage without ignore
PASS build/release/tools/fastbase64 ignore-garbage recovers
PASS build/release/tools/fastbase64.coreutils rejects garbage without ignore
PASS build/release/tools/fastbase64.coreutils ignore-garbage recovers
PASS build/release/tools/fastbase64 handles 100KB single-line base64
PASS build/release/tools/fastbase64.coreutils handles 100KB single-line base64
=== Chunk-boundary whitespace stress ===
PASS fastbase64: space at chunk boundary
PASS fastbase64: space one before boundary
PASS fastbase64: space one after boundary
PASS fastbase64: newline at chunk boundary
PASS fastbase64: 10 spaces at boundary
PASS fastbase64: CRLF at boundary
PASS fastbase64: spaces straddling boundary -3..+3
PASS fastbase64: tab at boundary
PASS fastbase64: mixed ws at boundary
PASS fastbase64: 50 newlines at boundary
PASS fastbase64: space at 2×chunk boundary
PASS fastbase64: newline at 2×chunk boundary
PASS fastbase64: 10 spaces at 2×chunk boundary
PASS fastbase64: spaces near 2×chunk -5..+5
PASS fastbase64: spaces at both chunk boundaries
PASS fastbase64: space at group boundary before chunk
PASS fastbase64: space at group boundary after chunk
PASS fastbase64: 1000 spaces at boundary
PASS fastbase64: newline every 76 chars
PASS fastbase64: trailing spaces before final newline
PASS coreutils: space at chunk boundary
PASS coreutils: space one before boundary
PASS coreutils: space one after boundary
PASS coreutils: newline at chunk boundary
PASS coreutils: 10 spaces at boundary
PASS coreutils: CRLF at boundary
PASS coreutils: spaces straddling boundary -3..+3
PASS coreutils: tab at boundary
PASS coreutils: mixed ws at boundary
PASS coreutils: 50 newlines at boundary
PASS coreutils: space at 2×chunk boundary
PASS coreutils: newline at 2×chunk boundary
PASS coreutils: 10 spaces at 2×chunk boundary
PASS coreutils: spaces near 2×chunk -5..+5
PASS coreutils: spaces at both chunk boundaries
PASS coreutils: space at group boundary before chunk
PASS coreutils: space at group boundary after chunk
PASS coreutils: 1000 spaces at boundary
PASS coreutils: newline every 76 chars
PASS coreutils: trailing spaces before final newline
=== Super sparse base64 decoding ===
PASS fastbase64: 0 valid payloads in 197608 bytes (spanning 4 chunks) decoded correctly
PASS coreutils: 0 valid payloads in 197608 bytes (spanning 4 chunks) decoded correctly
PASS fastbase64: 1 valid payloads in 197608 bytes (spanning 4 chunks) decoded correctly
PASS coreutils: 1 valid payloads in 197608 bytes (spanning 4 chunks) decoded correctly
PASS fastbase64: 2 valid payloads in 197608 bytes (spanning 4 chunks) decoded correctly
PASS coreutils: 2 valid payloads in 197608 bytes (spanning 4 chunks) decoded correctly
PASS fastbase64: 3 valid payloads in 197608 bytes (spanning 4 chunks) decoded correctly
PASS coreutils: 3 valid payloads in 197608 bytes (spanning 4 chunks) decoded correctly
=== Error conditions ===
PASS fastbase64: non-existent input file rejected
PASS fastbase64: bad -w -1 rejected
PASS fastbase64: bad -w abc rejected
PASS fastbase64: bad -w 999999999999999 rejected
PASS coreutils: non-existent input file rejected
PASS coreutils: bad -w -1 rejected
PASS coreutils: bad -w abc rejected
PASS coreutils: bad -w 999999999999999 rejected
============================================================
Results: 155 passed, 13 failed
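For readers unfamiliar with the "ignore-garbage" cases in the log above, the idea can be sketched with Python's standard `base64` module. This is not the logic of the fastbase64 tools or of test_fastbase64.py: the `decode` helper is hypothetical, and dropping only line breaks in strict mode is an assumption of the sketch, not a claim about how the real tools treat whitespace.

```python
import base64
import binascii
import string

# Bytes of the standard base64 alphabet, plus the padding character.
B64_BYTES = set((string.ascii_letters + string.digits + "+/=").encode("ascii"))


def decode(data: bytes, ignore_garbage: bool = False) -> bytes:
    """Decode base64, optionally discarding bytes outside the alphabet."""
    if ignore_garbage:
        # Keep only alphabet bytes; everything else is silently skipped.
        data = bytes(b for b in data if b in B64_BYTES)
    else:
        # Assumption of this sketch: only line breaks are tolerated.
        data = data.replace(b"\r", b"").replace(b"\n", b"")
    # validate=True makes any remaining foreign byte raise binascii.Error.
    return base64.b64decode(data, validate=True)


payload = bytes(range(256))
corrupt = base64.b64encode(payload)
corrupt = corrupt[:10] + b"!*#" + corrupt[10:]  # inject garbage bytes
assert decode(corrupt, ignore_garbage=True) == payload
```

Without `ignore_garbage=True`, the same corrupt input raises `binascii.Error`, which mirrors the "corrupt input fails without ignore-garbage" checks.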
|
@pauldreik The problem was with the Python script, which I have revised. |
This makes fastbase64 somewhat equivalent to standard base64. Unfortunately, the GNU base64 and the non-GNU base64 tools differ. I tried to come up with some kind of hybrid that can emulate both.
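One visible difference between the two flavours is default line wrapping: GNU base64 wraps encoded output at 76 columns unless told otherwise, while the BSD/Mac tool emits an unbroken stream by default. A minimal Python sketch of that difference follows. The names `DEFAULT_WRAP`, `encode`, and `line_lengths_ok` are hypothetical, the assumption that the two fastbase64 binaries use exactly these defaults is taken from the test names in the review log, and trailing-newline behaviour is deliberately not modelled.

```python
import base64
from typing import Optional

# Assumed defaults: 0 means "do not wrap".
DEFAULT_WRAP = {"gnu": 76, "bsd": 0}


def encode(data: bytes, mode: str, wrap: Optional[int] = None) -> bytes:
    """Base64-encode `data`, wrapping lines according to `mode` or `wrap`."""
    width = DEFAULT_WRAP[mode] if wrap is None else wrap
    if width < 0:
        raise ValueError("wrap width must be non-negative")
    encoded = base64.b64encode(data)
    if width == 0:
        return encoded
    # Split into rows of `width` characters, joined by newlines.
    rows = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return b"\n".join(rows)


def line_lengths_ok(text: bytes, width: int) -> bool:
    """True if every line but the last is exactly `width` characters long."""
    if not text:
        return True
    lines = text.split(b"\n")
    full_rows = all(len(line) == width for line in lines[:-1])
    return full_rows and 0 < len(lines[-1]) <= width
```

A check in the spirit of the "-w 76 line lengths correct" tests would then be `line_lengths_ok(encode(data, "gnu"), 76)`.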
Apple M4 Benchmark Results (times in milliseconds):
x64 Linux Benchmark Results (times in milliseconds):
So for inputs that are 10 MB or larger, it pays to use fastbase64. For small inputs, the time is dominated by the process overhead.
The Apple base64 decoder is quite slow, so that's one use case where we may want to use fastbase64.
Fixes #907
Fixes #908