Optimize u64_from_be_bytes() #11448
match size {
  1u => unsafe { *offset(data.as_ptr(), start as int) as u64 },
I'd wrap the whole match in an unsafe block to avoid repeating unsafe in every arm. Also, Rust convention is 4-space indents.
This code may also be cleaner if you lift the initial offset(data.as_ptr(), start) to the top of the match.
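Here is a minimal sketch of how the review suggestions could look. It uses modern Rust instead of the 2014 dialect in the diff, so `ptr::read_unaligned` and `pointer::add` stand in for `offset` and `int`. The function name `u64_from_be_bytes_sketch` is hypothetical. Only sizes 1, 2, 4 and 8 get a dedicated read. The other sizes fall back to a byte loop to keep the example short.

```rust
use std::ptr;

/// Hypothetical sketch: reads `size` big-endian bytes of `data` starting at
/// `start`. Sizes 1, 2, 4 and 8 use one unaligned read each; the remaining
/// sizes fall back to a byte loop to keep the sketch short.
fn u64_from_be_bytes_sketch(data: &[u8], start: usize, size: usize) -> u64 {
    // Check bounds once, before any raw-pointer access.
    assert!(size <= 8);
    assert!(start <= data.len() && size <= data.len() - start);

    // One unsafe block around the whole match instead of one per arm.
    unsafe {
        // Base pointer computed once, above the match.
        let p = data.as_ptr().add(start);
        match size {
            0 => 0,
            1 => *p as u64,
            2 => u16::from_be(ptr::read_unaligned(p as *const u16)) as u64,
            4 => u32::from_be(ptr::read_unaligned(p as *const u32)) as u64,
            8 => u64::from_be(ptr::read_unaligned(p as *const u64)),
            _ => {
                // Sizes 3, 5, 6 and 7: accumulate one byte at a time.
                let mut val = 0u64;
                for i in 0..size {
                    val = (val << 8) | *p.add(i) as u64;
                }
                val
            }
        }
    }
}
```

The bounds are asserted before the unsafe block, so every pointer read inside it stays within `data`.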
This is pretty awesome, thanks!
Instead of reading a byte at a time in a loop, we copy the relevant bytes into a temporary eight-byte vector. We can then read the value from the temporary vector with a single u64 read. LLVM seems to optimize this remarkably well.
Timings of the new "memcpy" version:
-- memcpy --
callgrind rustc -S rust/src/test/bench/sudoku.rs
This version may be slower when size is not statically known at compile time, though.
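Here is a minimal sketch of the memcpy variant using today's standard library. It assumes `u64::from_be_bytes` as the single 8-byte read, since the 2014 patch predates that API. The name `u64_from_be_bytes_memcpy` is hypothetical.

```rust
/// Hypothetical sketch of the "memcpy" approach: copy the relevant bytes into
/// an eight-byte temporary, then decode it with a single u64 read.
fn u64_from_be_bytes_memcpy(data: &[u8], start: usize, size: usize) -> u64 {
    assert!(size <= 8);
    let src = &data[start..start + size]; // panics if out of bounds

    // Zeroed temporary. The bytes are copied to its end so that the unused
    // leading bytes become the high-order zeros of the big-endian value.
    let mut buf = [0u8; 8];
    // copy_from_slice performs a memcpy of `size` bytes.
    buf[8 - size..].copy_from_slice(src);

    // One 8-byte big-endian decode instead of a per-byte loop.
    u64::from_be_bytes(buf)
}
```

The copy length is `size`. When `size` is not a constant after inlining, the copy presumably stays a variable-length memcpy. That would fit the caveat above about sizes that are not statically known.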
Instead of reading a byte at a time in a loop, we hardcode how to read each size.
We also try to do as few reads as possible by reading the largest primitive
types possible. For example, if size is eight we do a single read of a u64
value, and if size is seven we read it as [u32|u16|u8].
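Here is a sketch of the per-size decomposition, including the [u32|u16|u8] split for seven bytes. It uses safe slice reads via `from_be_bytes` in place of the raw-pointer reads in the patch. It also includes a byte-at-a-time loop like the one being replaced, so the two can be checked against each other. All function names here (`be16`, `be32`, `be64`, `u64_from_be_bytes_hardcoded`, `u64_from_be_bytes_loop`) are hypothetical.

```rust
use std::convert::TryInto;

// Hypothetical helpers: big-endian reads of 2, 4 or 8 bytes at index `i`,
// widened to u64.
fn be16(d: &[u8], i: usize) -> u64 {
    u16::from_be_bytes([d[i], d[i + 1]]) as u64
}

fn be32(d: &[u8], i: usize) -> u64 {
    u32::from_be_bytes(d[i..i + 4].try_into().unwrap()) as u64
}

fn be64(d: &[u8], i: usize) -> u64 {
    u64::from_be_bytes(d[i..i + 8].try_into().unwrap())
}

/// Reads `size` big-endian bytes using as few, and as wide, reads as possible.
fn u64_from_be_bytes_hardcoded(data: &[u8], start: usize, size: usize) -> u64 {
    assert!(size <= 8);
    let d = &data[start..start + size]; // panics if out of bounds
    match size {
        0 => 0,
        1 => d[0] as u64,
        2 => be16(d, 0),
        3 => (be16(d, 0) << 8) | d[2] as u64,
        4 => be32(d, 0),
        5 => (be32(d, 0) << 8) | d[4] as u64,
        6 => (be32(d, 0) << 16) | be16(d, 4),
        // [u32|u16|u8]
        7 => (be32(d, 0) << 24) | (be16(d, 4) << 8) | d[6] as u64,
        8 => be64(d, 0),
        _ => unreachable!(),
    }
}

/// Byte-at-a-time baseline of the kind the commit replaces.
fn u64_from_be_bytes_loop(data: &[u8], start: usize, size: usize) -> u64 {
    assert!(size <= 8);
    data[start..start + size]
        .iter()
        .fold(0u64, |acc, &b| (acc << 8) | b as u64)
}
```

Each wider read is shifted left by the number of bits in the bytes that follow it. For size seven, the u32 moves up 24 bits and the u16 moves up 8 bits. Whether the compiler turns each `from_be_bytes` into a single load plus byte swap is an assumption worth confirming in the generated assembly.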
Timings on an Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz:
-- Before --
running 7 tests
test io::extensions::test::test_u64_from_be_bytes ... ok
test io::extensions::bench::u64_from_be_bytes_4_aligned ... bench: 386 ns/iter (+/- 5)
test io::extensions::bench::u64_from_be_bytes_4_unaligned ... bench: 387 ns/iter (+/- 2)
test io::extensions::bench::u64_from_be_bytes_7_aligned ... bench: 628 ns/iter (+/- 1)
test io::extensions::bench::u64_from_be_bytes_7_unaligned ... bench: 637 ns/iter (+/- 3)
test io::extensions::bench::u64_from_be_bytes_8_aligned ... bench: 727 ns/iter (+/- 18)
test io::extensions::bench::u64_from_be_bytes_8_unaligned ... bench: 723 ns/iter (+/- 22)
callgrind rustc -S rust/src/test/bench/sudoku.rs
u64_from_be_bytes self: 4.37%
-- After --
running 7 tests
test io::extensions::test::test_u64_from_be_bytes ... ok
test io::extensions::bench::u64_from_be_bytes_4_aligned ... bench: 162 ns/iter (+/- 7)
test io::extensions::bench::u64_from_be_bytes_4_unaligned ... bench: 164 ns/iter (+/- 7)
test io::extensions::bench::u64_from_be_bytes_7_aligned ... bench: 201 ns/iter (+/- 7)
test io::extensions::bench::u64_from_be_bytes_7_unaligned ... bench: 210 ns/iter (+/- 9)
test io::extensions::bench::u64_from_be_bytes_8_aligned ... bench: 163 ns/iter (+/- 7)
test io::extensions::bench::u64_from_be_bytes_8_unaligned ... bench: 163 ns/iter (+/- 10)
callgrind rustc -S rust/src/test/bench/sudoku.rs
u64_from_be_bytes self: 1.78%
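To check the numbers above, the speedup for each benchmark is the before time divided by the after time. The `speedup` function and `RESULTS` table below are illustrative names. The figures are the ns/iter values copied from the logs.

```rust
/// Speedup of the new code: old ns/iter divided by new ns/iter.
fn speedup(before_ns: f64, after_ns: f64) -> f64 {
    before_ns / after_ns
}

/// (benchmark suffix, before ns/iter, after ns/iter), copied from the logs above.
const RESULTS: [(&str, f64, f64); 6] = [
    ("4_aligned", 386.0, 162.0),
    ("4_unaligned", 387.0, 164.0),
    ("7_aligned", 628.0, 201.0),
    ("7_unaligned", 637.0, 210.0),
    ("8_aligned", 727.0, 163.0),
    ("8_unaligned", 723.0, 163.0),
];
```

The ratios are:

- about 2.4x for the 4-byte cases
- about 3x for 7 bytes
- roughly 4.4x to 4.5x for 8 bytes, which fits the 8-byte case becoming a single u64 read

For example, `speedup(727.0, 163.0)` is about 4.46.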