
Added hexadecimal encoding module #8287

Closed · wants to merge 6 commits

Conversation

@sfackler (Member) commented Aug 4, 2013

FromHex ignores whitespace and parses either upper or lower case hex
digits. ToHex outputs lower case hex digits with no whitespace. Unlike
ToBase64, ToHex doesn't allow you to configure the output format. I
don't feel that it's super useful in this case.
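The described behavior can be sketched in modern Rust for illustration (the PR itself targets 2013-era libextra, so these exact signatures and error types are assumptions, not the patch):

```rust
// Sketch of the semantics described above: ToHex emits lowercase digits
// with no whitespace; FromHex skips whitespace and accepts either case.
trait ToHex {
    fn to_hex(&self) -> String;
}

trait FromHex {
    fn from_hex(&self) -> Result<Vec<u8>, String>;
}

impl ToHex for [u8] {
    // Lowercase output, no configuration.
    fn to_hex(&self) -> String {
        self.iter().map(|b| format!("{:02x}", b)).collect()
    }
}

impl FromHex for str {
    fn from_hex(&self) -> Result<Vec<u8>, String> {
        let mut out = Vec::new();
        let mut pending: Option<u8> = None;
        for c in self.chars() {
            if c.is_whitespace() {
                continue; // whitespace is ignored
            }
            let nibble =
                c.to_digit(16).ok_or_else(|| format!("invalid hex digit: {}", c))? as u8;
            match pending.take() {
                Some(hi) => out.push(hi << 4 | nibble),
                None => pending = Some(nibble),
            }
        }
        if pending.is_some() {
            return Err("odd number of hex digits".to_string());
        }
        Ok(out)
    }
}

fn main() {
    assert_eq!(b"\x0f\xff".to_hex(), "0fff");
    assert_eq!("0F fF".from_hex().unwrap(), vec![0x0f, 0xff]);
}
```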


```
/// A trait for converting hexadecimal encoded values
pub trait FromHex {
    /// Converts the value of `self`, interpreted as base64 encoded data, into
```

Review comment (Member): base64 typo

@alexcrichton (Member) commented:

Overall I think this is fine to include, although we're starting to accumulate a fair number of encodings in libextra. I think Go's organization is nice, with an encoding super-package, so perhaps we want to start moving things into:

  • encoding/json
  • encoding/base64
  • encoding/hex

And then all future encodings can go in there as well. Another thing that would be nice is a unifying trait for encoding/decoding things to/from types. I'm not quite sure how that would look, but it'd be a shame if we had different traits for all the encodings. One possible route would be for formats like json/xml to implement `serialize::{Encoder, Decoder}`, and for codecs like base64/base32/hex to only convert between bytes and strings.
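One possible shape for such a bytes-to-strings codec trait, sketched in modern Rust (names here are hypothetical; nothing like this existed in libextra):

```rust
// Hypothetical unifying trait for textual byte codecs (base64/base32/hex),
// as opposed to structural serializers like json/xml.
trait ByteCodec {
    fn encode(data: &[u8]) -> String;
    fn decode(text: &str) -> Result<Vec<u8>, String>;
}

struct Hex;

impl ByteCodec for Hex {
    fn encode(data: &[u8]) -> String {
        data.iter().map(|b| format!("{:02x}", b)).collect()
    }

    fn decode(text: &str) -> Result<Vec<u8>, String> {
        if text.len() % 2 != 0 {
            return Err("odd-length hex string".to_string());
        }
        // Parse each two-character chunk as one byte.
        (0..text.len())
            .step_by(2)
            .map(|i| u8::from_str_radix(&text[i..i + 2], 16).map_err(|e| e.to_string()))
            .collect()
    }
}

fn main() {
    assert_eq!(Hex::encode(&[0xde, 0xad]), "dead");
    assert_eq!(Hex::decode("dead").unwrap(), vec![0xde, 0xad]);
}
```

The same `ByteCodec` shape would fit base64 and base32 implementations, while json/xml would stay on the separate serializer traits.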

Regardless, I think that we should get another opinion on whether this should be included in libextra (I'm in favor), and if there's any sort of reorganization of encodings it should happen on the mailing list/issue list and not here.

@sfackler (Member, Author) commented Aug 4, 2013

+1 on an encoding super-package. I think the root extra package needs to be reorganized in general.

I don't think there's a reasonable trait for base64, hex, etc. to share. What would the signature be?

@sfackler (Member, Author) commented Aug 4, 2013

I did notice some very weird performance numbers from the hex and base64 benchmarks. The `From{Base64,Hex}` benchmarks run at about 2-3x the speed of the `To{Base64,Hex}` benchmarks, even though the decoding logic is significantly more complex and both directions should be touching the same amount of memory.

```
test base64::test::from_base64 ... bench: 1000 ns/iter (+/- 349) = 204 MB/s
test base64::test::to_base64 ... bench: 2390 ns/iter (+/- 1130) = 63 MB/s
...
test hex::tests::bench_from_hex ... bench: 884 ns/iter (+/- 220) = 341 MB/s
test hex::tests::bench_to_hex ... bench: 2453 ns/iter (+/- 919) = 61 MB/s
```

@bluss (Member) commented Aug 5, 2013

Regarding the speed of `to_hex`, you can do something like the following:

```
// can't use bytes!() here
static HEXDIGITS: [u8, ..16] = [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 97, 98, 99, 100, 101, 102];

fn to_hex_C(&self) -> ~str {
    let mut v = vec::with_capacity(self.len() * 2);
    for &byte in self.iter() {
        v.push(HEXDIGITS[byte >> 4]);
        v.push(HEXDIGITS[byte & 0xf]);
    }
    unsafe {
        str::raw::from_bytes_owned(v)
    }
}
```

I tested some different options and this was 3x faster than the original using `.push_char` (which is complex). Maybe the unsafe call to `str::raw::from_bytes_owned` is overkill, but it can be used since you know you output valid UTF-8.
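For reference, the same lookup-table approach translates to modern Rust roughly as follows (the 2013 snippet above predates Rust 1.0; this version uses today's names and a safe conversion in place of `from_bytes_owned`):

```rust
// Modern sketch of the lookup-table encoder: build the output in a byte
// vector, then convert to String at the end.
const HEX_DIGITS: &[u8; 16] = b"0123456789abcdef";

fn to_hex(bytes: &[u8]) -> String {
    let mut v = Vec::with_capacity(bytes.len() * 2);
    for &byte in bytes {
        v.push(HEX_DIGITS[(byte >> 4) as usize]);
        v.push(HEX_DIGITS[(byte & 0xf) as usize]);
    }
    // Every pushed byte is an ASCII hex digit, so this UTF-8 check always
    // succeeds; `String::from_utf8_unchecked` would skip it entirely, as
    // the old unsafe from_bytes_owned call did.
    String::from_utf8(v).unwrap()
}

fn main() {
    assert_eq!(to_hex(b"\x00\xffAB"), "00ff4142");
}
```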

@huonw (Member) commented Aug 5, 2013

I don't have a compiler to check, but why can't one use `bytes!`?

@sfackler (Member, Author) commented Aug 5, 2013

`bytes!` works fine; you just have to change the type to `&'static [u8]`. Pushing the change in a couple of minutes.
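For readers on modern Rust: a byte-string literal now fills the role the old `bytes!` macro played, and it already has the `&'static [u8]` flavor mentioned here (a sketch, not part of the patch):

```rust
// The old bytes!("0123456789abcdef") produced a static byte slice; in
// modern Rust a b"..." literal coerces to &'static [u8] directly.
static HEXDIGITS: &'static [u8] = b"0123456789abcdef";

fn main() {
    assert_eq!(HEXDIGITS.len(), 16);
    assert_eq!(HEXDIGITS[0], b'0');
    assert_eq!(HEXDIGITS[10], b'a');
}
```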

@alexcrichton (Member) commented:
I was thinking about this earlier today: currently you provide a `ToHex` implementation for both `&str` and `&[u8]`. In my opinion, these encodings are strictly bytes on one side and strings on the other, and it would be best not to intermingle the two. If someone wants to hex-encode a valid `str`, I think it's clearer to have them call `as_bytes` explicitly (which is essentially a no-op).
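The "essentially a no-op" claim holds in modern Rust too, where the explicit call costs nothing at runtime (illustrative sketch):

```rust
fn main() {
    let s = "hex me";
    // `as_bytes` reinterprets the str's existing UTF-8 buffer as &[u8];
    // no copy is made, so requiring the explicit call is free for callers.
    let b: &[u8] = s.as_bytes();
    assert_eq!(b.len(), s.len());
    assert_eq!(b[0], b'h');
}
```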

@sfackler (Member, Author) commented Aug 5, 2013

Sounds reasonable to me. I only had a `ToHex` implementation for `&str` since `ToBase64` had one; I'll yank them both. Do you think the `From{Hex,Base64}` implementations for `&[u8]` should go too? I'm leaning towards yes.

@alexcrichton (Member) commented:
@sfackler: yes

@omasanori (Contributor) commented:
@alexcrichton I'd prefer to distinguish serializers (JSON, ASN.1, etc.) from codecs (Base64, quoted-printable, etc.). The names `serialize::{Encoder, Decoder}` are not so good IMHO. (Would `Serialize(r)` and `Deserialize(r)` be better?)

@alexcrichton (Member) commented:
@omasanori, that's a good point! Let's move discussion to #8310

@sfackler, this looks good to me; I would be willing to r+ it, but again someone else should probably chime in on whether we should start shoving lots of encoding formats into libextra.

The overhead of `str::push_char` is high enough to cripple the performance
of these two functions. I've switched them to build the output in a
`~[u8]` and then convert to a string later. Since we know exactly the
bytes going into the vector, we can use the unsafe version to avoid the
`is_utf8` check.

I could have riced it further with `vec::raw::get`, but it only added
~10MB/s so I didn't think it was worth it. `ToHex` is still ~30% slower
than `FromHex`, which is puzzling.

Before:

```
test base64::test::from_base64 ... bench: 1000 ns/iter (+/- 349) = 204 MB/s
test base64::test::to_base64 ... bench: 2390 ns/iter (+/- 1130) = 63 MB/s
...
test hex::tests::bench_from_hex ... bench: 884 ns/iter (+/- 220) = 341 MB/s
test hex::tests::bench_to_hex ... bench: 2453 ns/iter (+/- 919) = 61 MB/s
```

After:

```
test base64::test::from_base64 ... bench: 1271 ns/iter (+/- 600) = 160 MB/s
test base64::test::to_base64 ... bench: 759 ns/iter (+/- 286) = 198 MB/s
...
test hex::tests::bench_from_hex ... bench: 875 ns/iter (+/- 377) = 345 MB/s
test hex::tests::bench_to_hex ... bench: 593 ns/iter (+/- 240) = 254 MB/s
```
String NULL terminators are going away soon, so we may as well get rid
of this now so it doesn't rot.
Encoding should really only be done between `[u8]` and `str`. The extra
convenience implementations don't really have a place, especially since
they're so trivial.

Also improved error messages in `FromBase64`.
@sfackler (Member, Author) commented Aug 7, 2013

@brson retry?

bors added a commit that referenced this pull request Aug 7, 2013
@bors bors closed this Aug 7, 2013