Add more `read_*` methods for big/little endian integers #33

dcsommer · 2019-12-23T19:50:19Z

A straightforward feature that would be quite useful for implementing network protocols would be additional methods on Reader for reading big and little endian integers. The API could be something like:

pub fn read_be<T>(&mut self) -> Result<T, EndOfInput>
pub fn read_le<T>(&mut self) -> Result<T, EndOfInput>

Where T is some integer type. This is similar to the C++ library Cursor, from Folly and fits nicely at the level of pulling bytes safely from the buffer.

The text was updated successfully, but these errors were encountered:

oherrala · 2020-04-30T11:18:40Z

@dcsommer Borrowing this issue to advertise my crate which is extension to untrusted:

https://crates.io/crates/untrustended

It contains a trait to extend untrusted's Reader with many new methods. I'm using it for making protocol parsers and I'd like to get feedback also from others.

There's also PR #14 to implement more methods directly into untrusted, but it's been slowly cooking do to lack of time.

dcsommer · 2020-04-30T15:37:13Z

@oherrala thanks for the crate link. I noticed untrustended reads byte by byte, whereas the PR reads all the needed bytes at once, so presumably the performance would be slightly better using the PR than the untrustended crate?

oherrala · 2020-05-03T08:54:27Z

@dcsommer Yeah, you are right. I think I benchmarked the PR to be faster than untrustended's way. There's no reason not to do the same thing with untrustended and if there's need to squeeze more speed out then I think it's easy to implement.

briansmith · 2021-01-26T06:32:47Z

I think I benchmarked the PR to be faster than untrustended's way. There's no reason not to do the same thing with untrustended and if there's need to squeeze more speed out then I think it's easy to implement.

I would hope that reading byte-by-byte wouldn't matter a huge amount. However, I imagine there is some performance impact and I think we should address that. Maybe we should have something like: fn read_bytes<T>() -> Result<&T, EndOfInput> where T could be an array of bytes [u8; 1], [u8; 2], etc. Then we could write:

let x = u64::from_be_bytes(*reader.read_bytes()?);

Where that would turn into basically a bounds check followed by a MOVBE instruction.

I think the above is not too terrible. I admit it isn't as nice as:

let x: u32 = reader.read_be()?;

Although it's true that there are a lot of data formats that are just big- or little- endian encodings of bytes, I almost never write parsers for those formats. Usually some kind of unusual variable-length encoding is used in the formats I'm interested in. For example, ASN.1, QUIC, UTF-8, HPACK. Definitely we wouldn't put the decoding of all those formats into untrusted, so it seems a little wrong to put the simple endian decoding into untrusted. However, maybe the simple endian encoding is just so common that lots of projects would immediately be able to delete a bunch of code if we added it to untrusted?

Further, what I've noticed in ring regarding simple endian-encoded values is that I usually want to read the next N bytes and get an &[u8; N] and then reinterpret those bytes as an &[[u8; 4]; N/4] and then copy those values into an (appropriately-aligned) array of BigEndian<u32>. Sometimes N is a constant so these are really Rust arrays (e.g. in almost every crypto part of ring except RSA), and other times N isn't a constant (e.g. RSA in ring) so the result would be a slice &[[u8; 4]]. Very rarely am I reading/writing single endian-encoded u32/u64/etc. values.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add more `read_*` methods for big/little endian integers #33

Add more `read_*` methods for big/little endian integers #33

dcsommer commented Dec 23, 2019

oherrala commented Apr 30, 2020

dcsommer commented Apr 30, 2020

oherrala commented May 3, 2020

briansmith commented Jan 26, 2021

Add more read_* methods for big/little endian integers #33

Add more read_* methods for big/little endian integers #33

Comments

dcsommer commented Dec 23, 2019

oherrala commented Apr 30, 2020

dcsommer commented Apr 30, 2020

oherrala commented May 3, 2020

briansmith commented Jan 26, 2021

Add more `read_*` methods for big/little endian integers #33

Add more `read_*` methods for big/little endian integers #33