Skip to content

Optimize u64_from_be_bytes() #11448

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 10, 2014
Merged

Optimize u64_from_be_bytes() #11448

merged 2 commits into from
Jan 10, 2014

Conversation

c-a
Copy link
Contributor

@c-a c-a commented Jan 10, 2014

Instead of reading a byte at a time in a loop we hardcode how to read each size.
We also try to do as few reads as possible by reading as big primitive types as
possible. For example if size is eight we do a single read of a u64 value and
if size is seven we read it as [u32|u16|u8].

Timings on a Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz

-- Before --
running 7 tests
test io::extensions::test::test_u64_from_be_bytes ... ok
test io::extensions::bench::u64_from_be_bytes_4_aligned ... bench: 386 ns/iter (+/- 5)
test io::extensions::bench::u64_from_be_bytes_4_unaligned ... bench: 387 ns/iter (+/- 2)
test io::extensions::bench::u64_from_be_bytes_7_aligned ... bench: 628 ns/iter (+/- 1)
test io::extensions::bench::u64_from_be_bytes_7_unaligned ... bench: 637 ns/iter (+/- 3)
test io::extensions::bench::u64_from_be_bytes_8_aligned ... bench: 727 ns/iter (+/- 18)
test io::extensions::bench::u64_from_be_bytes_8_unaligned ... bench: 723 ns/iter (+/- 22)

callgrind rustc -S rust/src/test/bench/sudoku.rs
u64_from_be_bytes self: 4.37%

-- After --
running 7 tests
test io::extensions::test::test_u64_from_be_bytes ... ok
test io::extensions::bench::u64_from_be_bytes_4_aligned ... bench: 162 ns/iter (+/- 7)
test io::extensions::bench::u64_from_be_bytes_4_unaligned ... bench: 164 ns/iter (+/- 7)
test io::extensions::bench::u64_from_be_bytes_7_aligned ... bench: 201 ns/iter (+/- 7)
test io::extensions::bench::u64_from_be_bytes_7_unaligned ... bench: 210 ns/iter (+/- 9)
test io::extensions::bench::u64_from_be_bytes_8_aligned ... bench: 163 ns/iter (+/- 7)
test io::extensions::bench::u64_from_be_bytes_8_unaligned ... bench: 163 ns/iter (+/- 10)

callgrind rustc -S rust/src/test/bench/sudoku.rs
u64_from_be_bytes self: 1.78%

}

match size {
1u => unsafe { *offset(data.as_ptr(), start as int) as u64 },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd wrap the whole match in an unsafe block to avoid having to repeat this a lot. Also, Rust convention is for 4 space indents.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code may also be cleaner if you lift the initial offset(data.as_ptr(), start) to the top of the match

@alexcrichton
Copy link
Member

This is pretty awesome, thanks!

Instead of reading a byte at a time in a loop we copy the relevant bytes into
a temporary vector of size eight. We can then read the value from the temporary
vector using a single u64 read. LLVM seems to be able to optimize this
almost scarily good.
@c-a
Copy link
Contributor Author

c-a commented Jan 10, 2014

Timings of new "memcpy" version.

-- memcpy --
running 7 tests
test io::extensions::test::test_u64_from_be_bytes ... ok
test io::extensions::bench::u64_from_be_bytes_4_aligned ... bench: 165 ns/iter (+/- 7)
test io::extensions::bench::u64_from_be_bytes_4_unaligned ... bench: 168 ns/iter (+/- 7)
test io::extensions::bench::u64_from_be_bytes_7_aligned ... bench: 187 ns/iter (+/- 5)
test io::extensions::bench::u64_from_be_bytes_7_unaligned ... bench: 197 ns/iter (+/- 5)
test io::extensions::bench::u64_from_be_bytes_8_aligned ... bench: 162 ns/iter (+/- 9)
test io::extensions::bench::u64_from_be_bytes_8_unaligned ... bench: 163 ns/iter (+/- 10)

callgrind rustc -S rust/src/test/bench/sudoku.rs
u64_from_be_bytes self: 1.60%

I'ts possible that this version may be slower when size is not statically known at compile time though.

bors added a commit that referenced this pull request Jan 10, 2014
Instead of reading a byte at a time in a loop we hardcode how to read each size.
We also try to do as few reads as possible by reading as big primitive types as
possible. For example if size is eight we do a single read of a u64 value and
if size is seven we read it as [u32|u16|u8].

Timings on a Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz

-- Before --
running 7 tests
test io::extensions::test::test_u64_from_be_bytes ... ok
test io::extensions::bench::u64_from_be_bytes_4_aligned         ... bench:       386 ns/iter (+/- 5)
test io::extensions::bench::u64_from_be_bytes_4_unaligned       ... bench:       387 ns/iter (+/- 2)
test io::extensions::bench::u64_from_be_bytes_7_aligned         ... bench:       628 ns/iter (+/- 1)
test io::extensions::bench::u64_from_be_bytes_7_unaligned       ... bench:       637 ns/iter (+/- 3)
test io::extensions::bench::u64_from_be_bytes_8_aligned         ... bench:       727 ns/iter (+/- 18)
test io::extensions::bench::u64_from_be_bytes_8_unaligned       ... bench:       723 ns/iter (+/- 22)

callgrind rustc -S  rust/src/test/bench/sudoku.rs
    u64_from_be_bytes self: 4.37%

-- After --
running 7 tests
test io::extensions::test::test_u64_from_be_bytes ... ok
test io::extensions::bench::u64_from_be_bytes_4_aligned         ... bench:       162 ns/iter (+/- 7)
test io::extensions::bench::u64_from_be_bytes_4_unaligned       ... bench:       164 ns/iter (+/- 7)
test io::extensions::bench::u64_from_be_bytes_7_aligned         ... bench:       201 ns/iter (+/- 7)
test io::extensions::bench::u64_from_be_bytes_7_unaligned       ... bench:       210 ns/iter (+/- 9)
test io::extensions::bench::u64_from_be_bytes_8_aligned         ... bench:       163 ns/iter (+/- 7)
test io::extensions::bench::u64_from_be_bytes_8_unaligned       ... bench:       163 ns/iter (+/- 10)

callgrind rustc -S  rust/src/test/bench/sudoku.rs
    u64_from_be_bytes self: 1.78%
@bors bors closed this Jan 10, 2014
@bors bors merged commit 0b3311c into rust-lang:master Jan 10, 2014
flip1995 pushed a commit to flip1995/rust that referenced this pull request Sep 7, 2023
…r=blyxyas

DefaultUnionRepresentation: explain why we only warn about unions with at least 2 non-ZST fields

changelog: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants