Implement FromBase64 for &[u8]. #15810

SimonSapin · 2014-07-19T15:51:30Z

The algorithm was already based on bytes internally.

Also use byte literals instead of casting u8 to char for matching.

[breaking-change] The InvalidBase64Character variant of the FromBase64Error enum was renamed to InvalidBase64Byte, and contains a u8 instead of a char.

r? @alexcrichton

The algorithm was already based on bytes internally. Also use byte literals instead of casting u8 to char for matching.

sfackler · 2014-07-19T17:11:08Z

This impl used to exist and was removed when that module was overhauled. (Not that that means that it couldn't be added back.)

alexcrichton · 2014-07-20T03:02:59Z

Yes this was an explicit decision because base64 encodes to a string and decodes to a stream of bytes, and the two halves provide a clear separation of what's on each side of each boundary.

Is the impetus for this that the utf8 validation is too expensive to perform ahead of time?

SimonSapin · 2014-07-20T09:11:33Z

TL;DR: I think &[u8]::from_base64() still makes sense, and yes it is to avoid an unnecessary UTF-8 check.

I came across this in Servo when "fetching" URLs in the data: scheme. The result of percent-decoding (which applies to all schemes) is bytes. These URLs can be big (I’ve seen them used for multi-megabytes JPEG images). Although I haven’t measured it, UTF-8 validation is definitely wasteful. It’s unnecessary: the base64 decoder rejects non-alphabet bytes/characters anyway, and both variants of the base64 alphabet are subsets of ASCII.

Many network protocol and file formats designed before Unicode was considered universal use "bytes in an undetermined ASCII-compatible encoding" to represent text. This means that non-ASCII bytes are opaque (until the encoding is determined), but they’re not necessarily invalid and should be preserved.

While I wish UTF-8 was everywhere and I advocate cleanly separating bytes and Unicode to avoid Mojibake, I don’t think we should be dogmatic and say that anything text-like must be Unicode. The new b"foo" byte literals are based on the same idea.

I suppose we could have type AsciiCompatible<'a> = &'a [u8] to express this intent.

SimonSapin · 2014-07-20T09:30:31Z

I realize that "streaming" base64 might also be interested. Does it sound like something you’d want in libserialize? Should it be based on Iterator, Reader, Writer, or something else?

sfackler · 2014-07-20T16:36:26Z

I have prototypes of streaming implementations of base64 and hex that I was working on a while ago. I didn't push them upstream because they ended up with a ~2x perf hit compared to the current implementation, but that should be improve-able. I'll update them and open a PR.

SimonSapin · 2014-07-20T17:25:43Z

@sfackler What kind of stream do you use?

sfackler · 2014-07-20T18:21:25Z

It used Reader/Writer adapter types: Base64Writer encodes to base 64 and Base64Reader decodes base64.

SimonSapin · 2014-07-20T18:30:13Z

@sfackler, why these two among the four possible combinations of writers/readers and encoding/decoding?

sfackler · 2014-07-20T18:32:33Z

Those seemed like the more common cases, but there's no reason that all 4 shouldn't be implemented.

sfackler · 2014-07-20T18:35:55Z

For reference, Apache Commons-Codec provides Base64OutputStream which can both encode and decode, though it defaults to encoding, and the reverse for Base64InputStream: http://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64OutputStream.html#Base64OutputStream(java.io.OutputStream, boolean)

@alexcrichton

The algorithm was already based on bytes internally. Also use byte literals instead of casting u8 to char for matching. [breaking-change] The `InvalidBase64Character` variant of the `FromBase64Error` enum was renamed to `InvalidBase64Byte`, and contains a `u8` instead of a `char`. r? @alexcrichton

…eykril fix: assists panic when trying to edit usage inside macro When we try to make a syntax node mutable inside a macro to edit it, it seems like the edits aren't properly reflected and will cause a panic when trying to make another syntax node mutable. This PR changes `bool_to_enum` and `promote_local_to_const` to use the original syntax range instead to edit the original file instead of the macro file. I'm not sure how to do it for `inline_call` with the example I mentioned in the issue, so I've left it out for now. Fixes rust-lang#15807

Implement FromBase64 for &[u8].

56218f5

The algorithm was already based on bytes internally. Also use byte literals instead of casting u8 to char for matching.

SimonSapin mentioned this pull request Jul 19, 2014

Remove FromBase64 workaround when it’s no longer needed. servo/servo#2877

Closed

alexcrichton closed this Jul 20, 2014

alexcrichton reopened this Jul 20, 2014

bors closed this Jul 21, 2014

bors merged commit 56218f5 into rust-lang:master Jul 21, 2014

SimonSapin deleted the base64-from-bytes branch July 30, 2014 08:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement FromBase64 for &[u8]. #15810

Implement FromBase64 for &[u8]. #15810

Uh oh!

SimonSapin commented Jul 19, 2014

Uh oh!

sfackler commented Jul 19, 2014

Uh oh!

alexcrichton commented Jul 20, 2014

Uh oh!

SimonSapin commented Jul 20, 2014

Uh oh!

SimonSapin commented Jul 20, 2014

Uh oh!

sfackler commented Jul 20, 2014

Uh oh!

SimonSapin commented Jul 20, 2014

Uh oh!

sfackler commented Jul 20, 2014

Uh oh!

SimonSapin commented Jul 20, 2014

Uh oh!

sfackler commented Jul 20, 2014

Uh oh!

sfackler commented Jul 20, 2014

Uh oh!

Uh oh!

Implement FromBase64 for &[u8]. #15810

Implement FromBase64 for &[u8]. #15810

Uh oh!

Conversation

SimonSapin commented Jul 19, 2014

Uh oh!

sfackler commented Jul 19, 2014

Uh oh!

alexcrichton commented Jul 20, 2014

Uh oh!

SimonSapin commented Jul 20, 2014

Uh oh!

SimonSapin commented Jul 20, 2014

Uh oh!

sfackler commented Jul 20, 2014

Uh oh!

SimonSapin commented Jul 20, 2014

Uh oh!

sfackler commented Jul 20, 2014

Uh oh!

SimonSapin commented Jul 20, 2014

Uh oh!

sfackler commented Jul 20, 2014

Uh oh!

sfackler commented Jul 20, 2014

Uh oh!

Uh oh!