Decoding data containing <NUL> values #114

bruchar1 · 2022-11-02T12:15:57Z

It seems that if I encode then decode data containing <NUL> characters, the data is correctly encoded and decoded, but base64_decode returns 0 instead of 1. Is that the expected behavior?

The text was updated successfully, but these errors were encountered:

aklomp · 2022-11-02T15:09:51Z

The return value of base64_decode should be independent of the data being processed (obviously). If the data is decoded successfully, then base64_decode should always return 1. If what you state is correct, then this is a pretty big bug.

~~What susprises me is that roundtrip encoding and decoding of 0x00 is part of the test suite.~~ Edit: that part of the test suite does not include the null terminator in its test.

I also cannot reproduce this bug with the following command line:

echo -ne 'hello\0world' | bin/base64 | bin/base64 -d >/dev/null; echo $?

This prints 0 (success) as it should.

So either this is an obscure bug hiding in one of the optimized SIMD decoders, or something else is going on.

Which codec are you using? Does the test suite run correctly on your machine?

Can you post any code to reproduce the issue?

aklomp · 2022-11-02T15:42:06Z

I added a very quick and dirty test to do roundtrip encoding on data with \0x00 in it. All tests in CI pass.

bruchar1 · 2022-11-02T16:13:53Z

I figured it out. My allocated buffer for encoding was not big enough, and it caused the last padding = to be missing.

I guess the readme is missleading:

The buffer in out has been allocated by the caller and is at least 4/3 the size of the input.

The right buffer size is probably (n + 2) / 3 * 4 in integer arithmetic.

BurningEnlightenment · 2022-11-03T06:47:38Z

TBH I think it would make sense to add two macros or inline functions to do buffer size calculations. As an added bonus it'd be way more readable than to repeat the size calculation everywhere.

it caused the last padding = to be missing.

While we are at it, I think a flag to encode/decode without padding would make sense, too. See also RFC4648 section 3.2.

aklomp · 2022-11-03T18:06:04Z

Agreed that it makes sense for the library to expose a macro to do the buffer size calculation.

I've been hesitant to add one (or to publish the "ultimate buffer sizing formula" in the sense of the post above) because I'm not positive that all encoders will always stay within the bounds of such a buffer. The SIMD encoders write 16 bytes of output at a time.

Then again, those SIMD encoders always operate on a full, guaranteed 12 bytes of input, so their output should always fit into the output buffer if it's sized according to something like the formula above.

Regarding the suggestion of adding a flag to disable padding: the suggestion makes sense, but I'm wary of adding more behavioral flags at this point. I'd first like to do a comprehensive rewrite to sort out various pain points, and rethink the architecture of this library first.

aklomp · 2022-11-03T20:30:16Z

I created #116 to discuss the addition of size macros.

aklomp self-assigned this Nov 2, 2022

aklomp added the bug label Nov 2, 2022

bruchar1 closed this as completed Nov 2, 2022

aklomp mentioned this issue Nov 3, 2022

Add a macro to calculate encoded size from raw size and vice versa #116

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decoding data containing <NUL> values #114

Decoding data containing <NUL> values #114

bruchar1 commented Nov 2, 2022 •

edited

Loading

aklomp commented Nov 2, 2022 •

edited

Loading

aklomp commented Nov 2, 2022

bruchar1 commented Nov 2, 2022

BurningEnlightenment commented Nov 3, 2022

aklomp commented Nov 3, 2022 •

edited

Loading

aklomp commented Nov 3, 2022

Decoding data containing <NUL> values #114

Decoding data containing <NUL> values #114

Comments

bruchar1 commented Nov 2, 2022 • edited Loading

aklomp commented Nov 2, 2022 • edited Loading

aklomp commented Nov 2, 2022

bruchar1 commented Nov 2, 2022

BurningEnlightenment commented Nov 3, 2022

aklomp commented Nov 3, 2022 • edited Loading

aklomp commented Nov 3, 2022

bruchar1 commented Nov 2, 2022 •

edited

Loading

aklomp commented Nov 2, 2022 •

edited

Loading

aklomp commented Nov 3, 2022 •

edited

Loading