-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Question] Invalid input data handling ? #27
Comments
The compliance baseline for this library should be the relevant RFC 4648. I never liked Postel's law much ("be strict in what you send, and liberal in what you accept"). Being liberal just invites abuse. All your examples should throw an error when they violate the preconditions. |
I'll check the |
Thanks! Let's have a library that can at least strictly validate. If at any later moment we want to relax the validation, we could create a flag for it. (Personally, I think that any string that violates the base64 spec as given in the RFC, is not a valid base64 string, and should not be given the benefit of the doubt.) |
Maybe one comment: it's quite common for base64 input to contain whitespace (newlines and such), and it would be helpful to have a flag to tell the library to ignore it. This does not negate the fact that a bona-fide base64 string with such characters is invalid, but it would indicate that it would be useful to have a flag that tells the library to strip out those characters before processing. (See also #15). |
So, not much else than what you're saying after reviewing First step, report all violations. |
For now, I added validation for
|
Hm, I'd like to avoid changing the API if possible. Your suggested addition basically checks if there are no trailing characters in the decode buffer. I'm not sure I consider that something that we must intercept and note. My attitude would be "as long as we honour the EOF marker and output no further bytes, who cares about trailing bytes". As far as decoding is concerned, those bytes are neither here or there, and it seems pedantic to point out to the user that his buffer length calculation is "optimistic". What do you think? Maybe the right way is to be pedantic and technically correct. |
Well, if I quote what you said in an earlier comment:
I'd say that the library should throw an error even when there are trailing bytes remaining. For sure, it shall be the case for the non streaming API. Since there's no API to get the number of unread bytes, I'd say the library shall return an error in this case (it might be possible to compute this with this version but adding whitespace character stripping will remove that possibility). I did not understand this:
The only thing that will be pointed out to the user is that he's feeding too much (or not enough) data in the decoder. The API change might be considered an enhancement only. Backward compatibility is still here. If the user wants a stricter behavior then, he can integrate the new function in his code. |
The viewpoint changes depending on how you conceptualize the buffer. If you define it as being strictly a Base64-encoded string and nothing more, you get a different viewpoint than when you define it as a buffer that contains a properly-terminated Base64-encoded string starting at offset 0. In the first case trailing bytes are an error, while in the second case trailing bytes after a valid string are simply irrelevant (which is what I meant with "neither here nor there": they don't matter). I've been holding more or less the second view of the buffer, whereas you are closer to the first. I don't think my view contradicts what I said earlier about violating the preconditions, because the preconditions are defined for the Base64-encoded string itself and not the buffer it's embedded in. Of your four examples, I believe the first three should throw errors, whereas the fourth is probably OK (but no bytes should be output after seeing the EOF marker). Concretely, I've assumed that it's not an error to pass an oversized buffer (of "optimistic" length :), as long as the Base64 string starting at offset 0 is valid. Maybe a stricter check would be a good idea, though. I do still think that it's better to be strict than liberal in these cases. |
OK, I see what you meant now. |
I'm not sure how "invalid" input data shall be processed. Could this be clarified ?
A few samples using
base64_decode
, what's the expected output for those ?This could be extended to streaming API, but I thought I'd start with the simple block decode.
The text was updated successfully, but these errors were encountered: