Fix `isize` optimization in `StableHasher` for big-endian architectures #93615

Kobzol · 2022-02-03T10:49:50Z

This PR fixes a problem with the stable hash optimization introduced in #93432. As @michaelwoerister found, the original implementation did not produce the same hash on little-endian and big-endian architectures.

r? @the8472

the8472 · 2022-02-03T17:18:32Z

compiler/rustc_data_structures/src/stable_hasher.rs

+        //
+        // To ensure that this optimization hashes the exact same bytes on both little-endian and
+        // big-endian architectures, we compare the value with 0xFF before we convert the number
+        // into a unified representation (little-endian).


The comment is correct, but I think we could put more emphasis on the fact that the endianness conversion must be the last step, because it creates platform-dependent values in order to get platform-independent bytes.

It would be clearer if siphasher::write were generic over [u8; N] instead of taking different primitives. Oh well.

Well to be fair it contains an optimized implementation for these primitives, so it's probably worth it.
Should I add something like

First, we have to compare the value (which has to be done in a platform-dependent manner) and only then can we convert the number to the little-endian format (to ensure platform-independent bytes being hashed).

?

(which has to be done in a platform-dependent manner)

That's probably confusing. We're going from [platform-dependent byte-representation, platform-independent value] to [platform-independent byte-representation, platform-dependent value]. Which means all operations that depend on the value must happen before that and afterwards we could only do bit-twiddling operations.
It would be more obvious if we used to_le_bytes.

I don't mean to explain endianness, it's just about which things must be done before and after the conversion. That's what I didn't consider during the review. 😓
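
The ordering constraint described above can be sketched as follows. This is only an illustration, not the code from `stable_hasher.rs`: `encode_isize` and `encode_isize_wrong_order_on_be` are hypothetical helpers that return the bytes that would be fed to the hasher. The second helper uses `swap_bytes()` to simulate what `to_le()` does on a big-endian target, under the assumption that the earlier code converted before comparing.

```rust
/// Hypothetical helper: value-dependent work first, byte-order conversion last.
fn encode_isize(i: isize) -> Vec<u8> {
    // Widen to 64 bits so 32-bit and 64-bit targets agree.
    let value = i as i64 as u64;
    if value < 0xFF {
        // The comparison still sees the real, platform-independent value.
        vec![value as u8]
    } else {
        // Marker byte, then the full value as little-endian bytes.
        let mut out = vec![0xFF];
        out.extend_from_slice(&value.to_le_bytes());
        out
    }
}

/// Hypothetical helper: the wrong order, as a big-endian machine would see it.
/// On big-endian, `to_le()` is a byte swap, so the comparison below sees a
/// swapped (platform-dependent) value.
fn encode_isize_wrong_order_on_be(i: isize) -> Vec<u8> {
    let swapped = (i as i64 as u64).swap_bytes();
    if swapped < 0xFF {
        vec![swapped as u8]
    } else {
        let mut out = vec![0xFF];
        // Native byte order on a big-endian machine is big-endian.
        out.extend_from_slice(&swapped.to_be_bytes());
        out
    }
}
```

With these definitions, `encode_isize(1)` yields a single byte, while the simulated big-endian variant takes the long path for the same input and yields nine bytes, so the two hashes would differ.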

I find .to_le() and .to_be() to be really confusing and always use to_le_bytes() and to_be_bytes() instead, which makes it much less likely to get things accidentally wrong (by converting twice for example).
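
A small sketch of why `to_le_bytes()` is harder to misuse than `to_le()`. The helper names are hypothetical, and `swap_bytes()` stands in for what `to_le()` does on a big-endian target. Converting twice with `to_le()` silently undoes the conversion there, whereas `to_le_bytes()` returns an array, so a second conversion would not type-check.

```rust
/// Hypothetical helper: the result is a byte array, not an integer, so it
/// cannot accidentally be "converted" a second time.
fn stable_bytes(x: u32) -> [u8; 4] {
    x.to_le_bytes()
}

/// Hypothetical helper: simulates calling `to_le()` twice on a big-endian
/// target, where each call is a byte swap. The two swaps cancel out.
fn double_to_le_on_big_endian(x: u32) -> u32 {
    x.swap_bytes().swap_bytes()
}
```

On a little-endian machine `to_le()` is a no-op, so a double conversion goes unnoticed there as well; the array-returning form gives the same bytes on every target.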

Now that we have const generics it would probably be easy to just change SipHasher128::short_write() to SipHasher128::short_write<const LEN: usize>(&mut self, bytes: &[u8; LEN]).
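
A rough sketch of what such a signature could look like. `ByteSink` is a hypothetical stand-in that just collects bytes; it is not `SipHasher128` and does no hashing. The point is only that the typed `write_*` methods become thin wrappers that hand over `to_le_bytes()`.

```rust
/// Hypothetical stand-in for a hasher's input buffer.
#[derive(Default)]
struct ByteSink {
    buf: Vec<u8>,
}

impl ByteSink {
    /// One generic entry point for all fixed-size inputs.
    fn short_write<const LEN: usize>(&mut self, bytes: &[u8; LEN]) {
        self.buf.extend_from_slice(bytes);
    }

    // The typed methods only pick the byte order; no value-dependent
    // logic happens after this point.
    fn write_u8(&mut self, i: u8) {
        self.short_write(&i.to_le_bytes());
    }

    fn write_u32(&mut self, i: u32) {
        self.short_write(&i.to_le_bytes());
    }

    fn write_u64(&mut self, i: u64) {
        self.short_write(&i.to_le_bytes());
    }
}
```

Because the callee only ever sees bytes, the endianness decision is made exactly once, at the call site.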

That sounds like a good plan. Since this is a portability bug let's fix it first and then improve the design.

the8472 · 2022-02-04T14:07:11Z

@bors r+ rollup

bors · 2022-02-04T14:07:13Z

📌 Commit c21b8e1 has been approved by the8472

…askrgr Rollup of 7 pull requests

Successful merges:

- rust-lang#90132 (Stabilize `-Z instrument-coverage` as `-C instrument-coverage`)
- rust-lang#91589 (impl `Arc::unwrap_or_clone`)
- rust-lang#93495 (kmc-solid: Fix off-by-one error in `SystemTime::now`)
- rust-lang#93576 (Emit more valid HTML from rustdoc)
- rust-lang#93608 (Clean up `find_library_crate`)
- rust-lang#93612 (doc: use U+2212 for minus sign in integer MIN/MAX text)
- rust-lang#93615 (Fix `isize` optimization in `StableHasher` for big-endian architectures)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup

Use const generics in SipHasher128's short_write

This was proposed by `@michaelwoerister` [here](rust-lang#93615 (comment)). A few comments:

1) I tried to pass `&[u8; LEN]` instead of `[u8; LEN]`. Locally, it resulted in small icount regressions (about 0.5 %). When passing by value, there were no regressions (and no improvements).
2) I wonder if we should use `to_ne_bytes()` in `SipHasher128` to keep it generic and only use `to_le_bytes()` in `StableHasher`. However, currently `SipHasher128` is only used in `StableHasher` and the `short_write` method was private, so I couldn't use it directly from `StableHasher`. Using `to_le()` in the `StableHasher` was breaking this abstraction boundary before slightly.

```rust
debug_assert!(LEN <= 8);
```

This could be done at compile time, but actually I think that now we can remove this assert altogether.

r? `@the8472`
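
For reference, one way the `LEN <= 8` check could be moved to compile time is sketched below. `AssertShort` and the free function `short_write` are hypothetical names, not rustc code, and the sketch assumes a toolchain where `assert!` is allowed in const contexts (Rust 1.57 or later).

```rust
/// Hypothetical helper type that carries the length check as an associated constant.
struct AssertShort<const LEN: usize>;

impl<const LEN: usize> AssertShort<LEN> {
    // Evaluated once per concrete LEN; a failing assert is a compile error
    // for any instantiation that actually uses the constant.
    const OK: () = assert!(LEN <= 8, "short_write expects at most 8 bytes");
}

/// Hypothetical free-function version of `short_write`, taking the array by value.
fn short_write<const LEN: usize>(out: &mut Vec<u8>, bytes: [u8; LEN]) {
    // Referencing the constant forces the check for this LEN.
    let () = AssertShort::<LEN>::OK;
    out.extend_from_slice(&bytes);
}
```

A call such as `short_write(&mut out, [0u8; 9])` would then fail to build rather than trip a debug assertion at run time.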

Fix isize optimization in StableHasher for big-endian architectures

c21b8e1

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Feb 3, 2022

rust-highfive assigned the8472 Feb 3, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 3, 2022

the8472 reviewed Feb 3, 2022

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 4, 2022

matthiaskrgr mentioned this pull request Feb 4, 2022

Rollup of 7 pull requests #93655

Merged

bors merged commit 2d62bd0 into rust-lang:master Feb 5, 2022

rustbot added this to the 1.60.0 milestone Feb 5, 2022

Kobzol deleted the stable-hash-opt-endianness branch February 5, 2022 08:29

Kobzol mentioned this pull request Feb 5, 2022

Use const generics in SipHasher128's short_write #93671

Merged
