Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix isize optimization in StableHasher for big-endian architectures #93615

Merged
merged 1 commit into from
Feb 5, 2022

Conversation

Kobzol
Copy link
Contributor

@Kobzol Kobzol commented Feb 3, 2022

This PR fixes a problem with the stable hash optimization introduced in #93432. As @michaelwoerister has found out, the original implementation wouldn't produce the same hash on little/big architectures.

r? @the8472

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Feb 3, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 3, 2022
//
// To ensure that this optimization hashes the exact same bytes on both little-endian and
// big-endian architectures, we compare the value with 0xFF before we convert the number
// into a unified representation (little-endian).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is correct but I think we could put more emphasis that the endianness conversion must be the last step because that creates platform-dependent values to get platform-independent bytes.

It would be clearer if siphasher::write were generic over [u8; N] instead of taking different primitives. Oh well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well to be fair it contains an optimized implementation for these primitives, so it's probably worth it.
Should I add something like

First, we have to compare the value (which has to be done in a platform-dependent manner) and only then can we convert the number to the little-endian format (to ensure platform-independent bytes being hashed).

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(which has to be done in a platform-dependent manner)

That's probably confusing. We're going from [platform-dependent byte-representation, platform-independent value] to [platform-independent byte-representation, platform-dependent value]. Which means all operations that depend on the value must happen before that and afterwards we could only do bit-twiddling operations.
It would be more obvious if we used to_le_bytes.

I don't mean to explain endianness, it's just about which things must be be done before and after the conversion. That's what I didn't consider during the review. 😓

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find .to_le() and .to_be() to be really confusing and always use to_le_bytes() and to_be_bytes() instead, which makes it much less likely to get things accidentally wrong (by converting twice for example).

Now that we have const generics it would probably be easy to just change SipHasher128::short_write() to SipHasher128::short_write<const LEN: usize>(&mut self, bytes: &[u8; LEN]).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good plan. Since this is a portability bug let's fix it first and then improve the design.

@the8472
Copy link
Member

the8472 commented Feb 4, 2022

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Feb 4, 2022

📌 Commit c21b8e1 has been approved by the8472

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 4, 2022
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 5, 2022
…askrgr

Rollup of 7 pull requests

Successful merges:

 - rust-lang#90132 (Stabilize `-Z instrument-coverage` as `-C instrument-coverage`)
 - rust-lang#91589 (impl `Arc::unwrap_or_clone`)
 - rust-lang#93495 (kmc-solid: Fix off-by-one error in `SystemTime::now`)
 - rust-lang#93576 (Emit more valid HTML from rustdoc)
 - rust-lang#93608 (Clean up `find_library_crate`)
 - rust-lang#93612 (doc: use U+2212 for minus sign in integer MIN/MAX text)
 - rust-lang#93615 (Fix `isize` optimization in `StableHasher` for big-endian architectures)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 2d62bd0 into rust-lang:master Feb 5, 2022
@rustbot rustbot added this to the 1.60.0 milestone Feb 5, 2022
@Kobzol Kobzol deleted the stable-hash-opt-endianness branch February 5, 2022 08:29
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 12, 2022
Use const generics in SipHasher128's short_write

This was proposed by `@michaelwoerister` [here](rust-lang#93615 (comment)).
A few comments:
1) I tried to pass `&[u8; LEN]` instead of `[u8; LEN]`. Locally, it resulted in small icount regressions (about 0.5 %). When passing by value, there were no regressions (and no improvements).
2) I wonder if we should use `to_ne_bytes()` in `SipHasher128` to keep it generic and only use `to_le_bytes()` in `StableHasher`. However, currently `SipHasher128` is only used in `StableHasher` and the `short_write` method was private, so I couldn't use it directly from `StableHasher`. Using `to_le()` in the `StableHasher` was breaking this abstraction boundary before slightly.

```rust
debug_assert!(LEN <= 8);
```
This could be done at compile time, but actually I think that now we can remove this assert altogether.

r? `@the8472`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants