Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more read_* methods for big/little endian integers #33

Open
dcsommer opened this issue Dec 23, 2019 · 4 comments
Open

Add more read_* methods for big/little endian integers #33

dcsommer opened this issue Dec 23, 2019 · 4 comments

Comments

@dcsommer
Copy link

A straightforward feature that would be quite useful for implementing network protocols would be additional methods on Reader for reading big and little endian integers. The API could be something like:

pub fn read_be<T>(&mut self) -> Result<T, EndOfInput>
pub fn read_le<T>(&mut self) -> Result<T, EndOfInput>

Where T is some integer type. This is similar to the C++ library Cursor, from Folly and fits nicely at the level of pulling bytes safely from the buffer.

@oherrala
Copy link
Contributor

@dcsommer Borrowing this issue to advertise my crate which is extension to untrusted:

https://crates.io/crates/untrustended

It contains a trait to extend untrusted's Reader with many new methods. I'm using it for making protocol parsers and I'd like to get feedback also from others.

There's also PR #14 to implement more methods directly into untrusted, but it's been slowly cooking do to lack of time.

@dcsommer
Copy link
Author

@oherrala thanks for the crate link. I noticed untrustended reads byte by byte, whereas the PR reads all the needed bytes at once, so presumably the performance would be slightly better using the PR than the untrustended crate?

@oherrala
Copy link
Contributor

oherrala commented May 3, 2020

@dcsommer Yeah, you are right. I think I benchmarked the PR to be faster than untrustended's way. There's no reason not to do the same thing with untrustended and if there's need to squeeze more speed out then I think it's easy to implement.

@briansmith
Copy link
Owner

I think I benchmarked the PR to be faster than untrustended's way. There's no reason not to do the same thing with untrustended and if there's need to squeeze more speed out then I think it's easy to implement.

I would hope that reading byte-by-byte wouldn't matter a huge amount. However, I imagine there is some performance impact and I think we should address that. Maybe we should have something like: fn read_bytes<T>() -> Result<&T, EndOfInput> where T could be an array of bytes [u8; 1], [u8; 2], etc. Then we could write:

let x = u64::from_be_bytes(*reader.read_bytes()?);

Where that would turn into basically a bounds check followed by a MOVBE instruction.

I think the above is not too terrible. I admit it isn't as nice as:

let x: u32 = reader.read_be()?;

Although it's true that there are a lot of data formats that are just big- or little- endian encodings of bytes, I almost never write parsers for those formats. Usually some kind of unusual variable-length encoding is used in the formats I'm interested in. For example, ASN.1, QUIC, UTF-8, HPACK. Definitely we wouldn't put the decoding of all those formats into untrusted, so it seems a little wrong to put the simple endian decoding into untrusted. However, maybe the simple endian encoding is just so common that lots of projects would immediately be able to delete a bunch of code if we added it to untrusted?

Further, what I've noticed in ring regarding simple endian-encoded values is that I usually want to read the next N bytes and get an &[u8; N] and then reinterpret those bytes as an &[[u8; 4]; N/4] and then copy those values into an (appropriately-aligned) array of BigEndian<u32>. Sometimes N is a constant so these are really Rust arrays (e.g. in almost every crypto part of ring except RSA), and other times N isn't a constant (e.g. RSA in ring) so the result would be a slice &[[u8; 4]]. Very rarely am I reading/writing single endian-encoded u32/u64/etc. values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants