Hash of `Discriminant` now produces different results across CPU architectures #74215

korken89 · 2020-07-10T11:09:24Z

Hi, we just noticed that there has been changes to the inner of Discriminant, i.e. DiscriminantKind,
that causes a Hash of a discriminant to produce different results across CPU architectures.

E.g.

#[derive(Hash)]
enum SomeEnum {
    One,
    Two,
    Three,
}

When hashing this in the current stable one gets writes that look like this:

Hash for One:
[0, 0, 0, 0, 0, 0, 0, 0]

Hash for Two
[1, 0, 0, 0, 0, 0, 0, 0]

Hash for Tree
[2, 0, 0, 0, 0, 0, 0, 0]

And depending on the architecture in the current nightly one gets for armv7-unknown-linux-gnueabihf:

Hash for One:
[0, 0, 0, 0]

Hash for Two
[1, 0, 0, 0]

Hash for Tree
[2, 0, 0, 0,]

However for ex x86_64-unknown-linux-gnu it is still 8 bytes as before.

This is a problem as e.g. CRC hashes are no longer compatible between CPU architectures.
Not sure if this was intended, but it seems like a breaking change to me?

In our case we are sending structures between an embedded thumbv7em-none-eabihf and an
armv7-unknown-linux-gnueabihf system where CRCs are calculated based on #[derive(Hash)]
for said structures to be sent.

rustc --version --verbose:

rustc 1.46.0-nightly (5db778aff 2020-07-09)
binary: rustc
commit-hash: 5db778affee7c6600c8e7a177c48282dab3f6292
commit-date: 2020-07-09
host: x86_64-unknown-linux-gnu
release: 1.46.0-nightly
LLVM version: 10.0


rustc 1.44.1 (c7087fe00 2020-06-17)
binary: rustc
commit-hash: c7087fe00d2ba919df1d813c040a5d47e43b0fe7
commit-date: 2020-06-17
host: x86_64-unknown-linux-gnu
release: 1.44.1
LLVM version: 9.0

The text was updated successfully, but these errors were encountered:

kotauskas · 2020-07-10T11:35:36Z

I don't think that the Hash trait was supposed to guarantee stable results over different architectures or crate/compiler versions: the only valid way of using it is for keys in HashMap/HashSet. There are crates specifically designed for consistent hashing with a specific algorithm, but those are not part of std and operate on bytes, and gathering those bytes from a value in memory is a Serde+Bincode thing.

nbdd0121 · 2020-07-10T12:48:14Z

Discriminant has been changed from u64 to the actual integer type that represent the discriminant, to support #[repr(u128)] enum, so they hash differently.

Mark-Simulacrum · 2020-07-10T14:45:57Z

Tagging as a @rust-lang/libs regression -- I suspect that this is definitely outside the scope of "intentional coverage" of our stability policy, but we could probably change the Hash impl to always hash the ~same number of bytes. Regardless I don't think Rust guarantees the same value in the discriminant when compiling more than once (much less for different targets), so in that sense it's not really covered either.

Amanieu · 2020-07-10T14:48:24Z

Either way we should specifically mention the stability guarantees (or lack thereof) in the Hash documentation.

SimonSapin · 2020-07-10T15:52:18Z

However for ex x86_64-unknown-linux-gnu it is still 8 bytes as before.

I’m a bit confused because the issue description mentions two rustc versions. Do you get different results on different architectures with the same compiler version? Does std::mem::size_of::<SomeEnum>() also vary based on the architecture?

nbdd0121 · 2020-07-10T16:15:17Z

I’m a bit confused because the issue description mentions two rustc versions. Do you get different results on different architectures with the same compiler version? Does std::mem::size_of::<SomeEnum>() also vary based on the architecture?

@SimonSapin No, SomeEnum will always have same size as i8 because i8 is sufficient to fit. However Discriminant<SomeEnum> will be of size isize.

rust/src/librustc_middle/ty/mod.rs

Lines 2179 to 2181 in e59b08e

    
               pub fn discr_type(&self) -> attr::IntType { 
        
                   self.int.unwrap_or(attr::SignedInt(ast::IntTy::Isize)) 
        
               }

lcnr · 2020-07-15T13:11:29Z

A quick summary of the problem:
mem::Discriminant::hash now calls isize::hash instead of u64::hash on all architectures.

This breaks code which relies on Hasher::write getting called with the same value on both 64 and 32 bit systems.
This can be fixed by manually implementing write_isize and write_usize for a Hasher if it should be stable between architectures.

(I think we still don't guarantee that enums with unspecified repr have the same discriminant when compiling for a different target)

Not really part of T-compiler, so I am removing I-prioritize for now, cc @rust-lang/lang

SimonSapin · 2020-07-15T16:59:32Z

The root of the issue seems to be that <SomeEnum as std::marker::DiscriminantKind>::Discriminant is isize for an enum without a repr attribute.

size_of::<SomeEnum> already is that of the smallest integer type that can fit all discriminant values. Couldn’t DiscriminantKind::Discriminant be that type as well, rather than isize?

nbdd0121 · 2020-07-15T18:53:01Z

The root of the issue seems to be that <SomeEnum as std::marker::DiscriminantKind>::Discriminant is isize for an enum without a repr attribute.

size_of::<SomeEnum> already is that of the smallest integer type that can fit all discriminant values. Couldn’t DiscriminantKind::Discriminant be that type as well, rather than isize?

I believe size_of::<SomeEnum> being the smallest integer type is an optimisation. When we perform checks the discriminant is still treated as being represented by an isize (assigning a number non-representable by isize will cause an error).

enum Eg {
   A = 0x8000_0000, // "error: literal out of range for isize" on 32-bit platforms
}

If we make the discriminant same size as size_of::<SomeEnum>, adding a new discriminant may silently break things.

#[non_exhaustive]
enum SomeEnum {
   A = 0,
   B = 255,
   C, // Discriminant is `u8` without this, `u16` with this. So it will break hashing as well.
}

kotauskas · 2020-07-15T19:07:54Z

This breaks code which relies on Hasher::write getting called with the same value on both 64 and 32 bit systems

Still, I don't think that code should rely on that at all. We do need a mention of this consideration in the documentation of the Hash trait though.

SimonSapin · 2020-07-15T19:10:43Z

If we make the discriminant same size as size_of::<SomeEnum>, adding a new discriminant may silently break things.

That changing the source definition of a type without a repr attribute changes (breaks) some behavior seems less surprising to me than platform-dependent behavior without apparently doing anything platform-specific or directly involving platform-dependent types like isize and usize

scottmcm · 2020-07-15T23:26:28Z

I see lang was pinged here, but it's not clear to me that this is a lang issue. AFAIK Discriminant<> and Hash are libs's domain.

(Personally it seems like there's no guarantee that Hash does the same thing on different platforms, so I'd be inclined to solve this by just putting a note in the docs that Hashes across different builds can be different. I do like the point that it seems weird to a discriminant to be a [ui]size type instead of a [ui]N type, though.)

nbdd0121 · 2020-07-15T23:31:51Z

If we change the discriminant away from isize, then what integer type is going to used when typechecking the custom discriminant values? Changing this behaviour might be a breaking change.

https://doc.rust-lang.org/reference/items/enumerations.html#custom-discriminant-values-for-fieldless-enumerations:

Under the default representation, the specified discriminant is interpreted as an isize value although the compiler is allowed to use a smaller type in the actual memory layout. The size and thus acceptable values can be changed by using a primitive representation or the C representation.

SimonSapin · 2020-07-15T23:58:59Z

AFAIK Discriminant<> and Hash are libs's domain.

The "the DiscriminantKind trait is implemented for every type" part and the choice of associated type is magical as far as the standard library is concerned, though. (There is no impl block for it in source code, similar to Sized.) Arguably that’s in the lang team’s domain?

If we change the discriminant away from isize […] might be a breaking change.

We could change the DiscriminantKind::Discriminant associated type without changing how the part after = in the source definition of an enum is type-checked.

nbdd0121 · 2020-07-16T01:19:19Z

We could change the DiscriminantKind::Discriminant associated type without changing how the part after = in the source definition of an enum is type-checked.

What about enums with only 1 variant?

enum Enum {
  A = 256
}

std::mem::size_of::<Enum>() would be 0, but we don't have an integer type that can only represent 256.

Similar argument applies to enum with tuple/struct variants. For things like Option<T>, what would be the proper size of its discriminant? 1? But Option<u32> is 4 bytes larger than u32, while Option<&mut T> is the same size as &mut T.

Do we need to have another set of logic to compute the smallest integer type necessary to hold the discriminant, which would not be essentially the same as the size of the enum after layout optimisation? My stance is no, we'd better keep it simple and leave it as isize.

SimonSapin · 2020-07-16T08:04:09Z

We talk about "the discriminant" but that can refer to at least four different things that are not necessarily the same:

The part after = in the source definition of a variant, and its fallback when omitted (start at zero, increment by one). Expressions in that position are type-checked as the integer type used in #[repr(…)] for the enum if any, isize otherwise. (enum SomeEnum { Foo = 1_u32 } is a type error.)
The numerical value that you get when converting a field-less enum value to an integer type with the as operator. The "source" integer type here is less relevant since we’re converting anyway.
The return type of std::mem::discriminant() type is deliberately opaque. As of Rust 1.44 it doesn’t even provide an ordering. Some aspects like its size_of and Hash behavior are observable, though not guaranteed to be stable across compiler versions.
The memory representation of an enum may or may not involve a tag (using terminology from https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html). Option<&mut T> for example doesn’t have a tag.

When it exists, 2. is defined to be the same as 1. Other than that each of these four things may be more or less related to the others for ease of implementation but they don’t have to.

Enums without a repr attribute have number of memory layout optimization including not having a tag at all for single-variant enums, and making the tag smaller than isize in some cases even though 1. is type-checked as isize:

enum SomeEnum { A, B = 255 }
enum OtherEnum { A, B = 256 }
enum Single { B = 256 }
fn main() {
    dbg!(std::mem::size_of::<SomeEnum>());
    dbg!(std::mem::size_of::<OtherEnum>());
    dbg!(std::mem::size_of::<Single>());
}

(Playground)

[src/main.rs:5] std::mem::size_of::<SomeEnum>() = 1
[src/main.rs:6] std::mem::size_of::<OtherEnum>() = 2
[src/main.rs:7] std::mem::size_of::<Single>() = 0

These optimizations could go further and for example make OtherEnum a occupy a single byte by separating the values of 1. and 4., but at the cost of making as conversions require a lookup table. (It’s sort of what happens for Single, with a trivial one-entry "table".)

The above also shows that the compiler already has logic for finding the smallest integer type that fits a set of values (although it’s not the only thing that determines the memory representation). DiscriminantKind::Discriminant could reuse that logic.

nbdd0121 · 2020-07-16T14:54:56Z

We could do it, but I am still not convinced that we should do it. As you mentioned, 1. 2. and 4. are different things already, and changing DiscriminantKind::Discriminant away from isize would make 3. no longer same as 1. and 2., while 3. would still not align with 4., therefore making it even more confusing than the behaviour today.

After all, we provide no guarantee that the memory layout optimisations will happen (except for #[rustc_nonnull_optimization_guaranteed]) or the discriminant or the hash would be stable. By making it de facto stable might make people actually depend on that, restricting what we could do in the future.

SimonSapin · 2020-07-16T17:21:59Z

Trying to reduce de-facto stability of unspecified behavior does not seem to me like a good reason by itself to artificially introduce/preserve target-dependence.

Consistency between 1. and 3. for users is a more reasonable argument to me, although mem::Discriminant is opaque so that consistency is only observable indirectly through unspecified aspects like size_of and Hash.

(I wish the default type for 1. was a target-independent one like i32, or better yet "somehow" not having a single default type in the first place. But pre-1.0 was the time to change that.)

nikomatsakis · 2020-07-16T22:04:24Z

Hmm, so:

It seems clear that the Hash trait in general is not meant to produce portable hashes when hash_isize is used, so the question is really whether DiscriminantKind (and enums, more generally) are meant to have hashes that are portable across target architectures? I don't think we explicitly discussed that but it seems quite likely we would not have guaranteed it, if we had.

It also seems like a reasonable workaround exists. As @lcnr said, users can write a hasher that routes hash_isize to hash_i64 (and hash_usize, of course), and thus explicitly make their hashes independent from the architecture.

If we did however want to make hashing enums independent from the target architecture (at least, those that don't explicitly use isize and usize), I guess the proposal is that the DiscriminantKind for an enum would not be isize but rather the "smallest integer that works"? This may or may not be easy to do, implementation wise. I guess I don't have a strong opinion about it, but it makes me a bit nervous, since it introduces a new dependency where figuring out the DiscriminantKind for an enum now requires evaluating the discriminant values -- this is a new opportunity for a cycle. I guess I'd lean against it, myself, all things being equal. (@lcnr may have more detailed thoughts.)

SimonSapin · 2020-07-17T19:18:04Z

I guess the proposal is that the DiscriminantKind for an enum would not be isize but rather the "smallest integer that works"? This may or may not be easy to do, implementation wise.

This logic is already implemented in the compiler. When the default memory representation of an enum uses a tag, the tag has the representation of the smallest integer type that works.

nikomatsakis · 2020-07-17T20:37:29Z

@SimonSapin yes, that logic is implemented, but it is implemented as part of the Layout code, and I am concerned that now we would have to execute that code in order to figure out the DiscriminantKind::<SomeEnumType>, whereas now we don't. At least that's my guess, I've not dug into the code and I might be misremembering things. Perhaps there's already a dependency.

scottmcm · 2020-10-12T22:34:45Z

this is a new opportunity for a cycle

Is this worse in some way than the [u16; sizeof(Self)] cycles for structs?

The error on using sizeof(Self) in a variant initializer says "computing layout of Foo [...] requires const-evaluating + checking Foo::Bar::{{constant}}#0", which implies to me that this wouldn't actually be a new dependency.

That makes me curious about things like

pub enum Foo {
    Bar = 12345,
    Baz = 1 + Foo::Bar as isize,
}

But the MIR for Baz's initializer there already avoids making a mention of Foo, just summoning the value somehow

_3 = CheckedAdd(const 12345, const 0_isize);

So I guess that potential issue has already been avoided.

The "the DiscriminantKind trait is implemented for every type" part and the choice of associated type is magical as far as the standard library is concerned, though.

Hmm, I guess I wasn't thinking about the DiscriminantKind trait, which describes itself as "unlikely to ever be stabilized". The mem::Discriminant<_> type itself could, hypothetically, cast to i64 in its Hash implementation if it wanted, as a way to resolve this issue. (Which would even be fine for a u128 discriminant, because it's just Hash. Not that that would necessarily be good -- I'd certainly like enum { A, B, C } to only have to hash one byte.)

Right now I think the only stable way to get the value out is with as _____, which truncates/extends to the requested type, so lang might be able to avoid making a guarantee here. (Until someone wants to offer an API like Discriminant<T>: Into<T::Discriminant>, but that's not here.)

This issue is now old enough that it's a de-facto decision that the breakage isn't important enough to just do the i64::wrapping_from(discriminant) change. So I don't know what the correct scope of a change would be here, if any. Given that [T]: Hash appears to be platform-dependent, I do think it's not unreasonable for Discriminant: Hash to be platform-dependent as well -- as has already been said, a specific Hasher can choose to extend (or truncate) [ui]size as part of hashing if it wants.

Meta: It'd also be a shame to end up with folk wisdom of "you should #[repr(u8)] your enums because it makes the derive(Hash) implementation faster".

Mark-Simulacrum · 2020-11-06T15:58:23Z

Untagging as a regression, since this landed in 1.45.

m-ou-se · 2021-10-27T18:10:52Z

cc @rust-lang/libs-api

joshtriplett · 2021-11-10T19:23:50Z

We discussed this in today's @rust-lang/libs meeting, and we feel like the only remaining action here is to document this behavior.

m-ou-se · 2021-11-10T19:24:38Z

(Any proposals for providing more guarantees about this can be proposed and discussed separately.)

Dependence on endianness and type sizes was reported for enum discriminants in rust-lang#74215 but it is a more general issue since for example the default implementation of `Hasher::write_usize` uses native endianness. Additionally the implementations of library types are occasionally changed as their internal fields change or hashing gets optimized.

Document non-guarantees for Hash Dependence on endianness and type sizes was reported for enum discriminants in rust-lang#74215 but it is a more general issue since for example the default implementation of `Hasher::write_usize` uses native endianness. Additionally the implementations of library types are occasionally changed as their internal fields change or hashing gets optimized. ## Question Should this go on the module level documentation instead since it also concerns `Hasher` to some extent and not just `Hash`? resolves rust-lang#74215

korken89 added the C-bug Category: This is a bug. label Jul 10, 2020

Mark-Simulacrum added regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 10, 2020

rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jul 10, 2020

Mark-Simulacrum added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 10, 2020

lcnr removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jul 15, 2020

tgnottingham mentioned this issue Oct 12, 2020

Unreachable code optimization failure when matching on Rust enum #77812

Closed

pnkfelix changed the title ~~Breaking change/regression in nightly? Hash of Discriminant now produces different results across CPU architectures~~ Hash of Discriminant now produces different results across CPU architectures Nov 6, 2020

Mark-Simulacrum added C-enhancement Category: An issue proposing an enhancement or a PR with one. and removed C-bug Category: This is a bug. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Nov 6, 2020

Mark-Simulacrum mentioned this issue Oct 25, 2021

Regression: derive(Hash) no longer constant since 1.45, depends on bit-width and endianness #90230

Closed

infinity0 mentioned this issue Oct 25, 2021

test_cratesio_hash fails on systems that are not little-endian 64-bit rust-lang/cargo#10004

Closed

Mark-Simulacrum added the regression-from-stable-to-stable Performance or correctness regression from one stable version to another. label Oct 25, 2021

rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Oct 25, 2021

hkratz mentioned this issue Oct 26, 2021

Fix test: hash value depends on endianness and bitness. rust-lang/cargo#10011

Merged

m-ou-se added the P-medium Medium priority label Oct 27, 2021

m-ou-se removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Oct 27, 2021

m-ou-se added E-help-wanted Call for participation: Help is requested to fix this issue. and removed regression-from-stable-to-stable Performance or correctness regression from one stable version to another. labels Nov 10, 2021

the8472 mentioned this issue Nov 18, 2021

Document non-guarantees for Hash #90995

Merged

bors closed this as completed in 43279b2 Nov 27, 2021

Hash of Discriminant now produces different results across CPU architectures #74215

Hash of Discriminant now produces different results across CPU architectures #74215

Comments

korken89 commented Jul 10, 2020

kotauskas commented Jul 10, 2020

Uh oh!

nbdd0121 commented Jul 10, 2020

Uh oh!

Mark-Simulacrum commented Jul 10, 2020

Uh oh!

Amanieu commented Jul 10, 2020

Uh oh!

SimonSapin commented Jul 10, 2020

Uh oh!

nbdd0121 commented Jul 10, 2020

Uh oh!

lcnr commented Jul 15, 2020

Uh oh!

SimonSapin commented Jul 15, 2020

Uh oh!

nbdd0121 commented Jul 15, 2020

Uh oh!

kotauskas commented Jul 15, 2020

Uh oh!

SimonSapin commented Jul 15, 2020

Uh oh!

scottmcm commented Jul 15, 2020

Uh oh!

nbdd0121 commented Jul 15, 2020

Uh oh!

SimonSapin commented Jul 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nbdd0121 commented Jul 16, 2020

Uh oh!

SimonSapin commented Jul 16, 2020

Uh oh!

nbdd0121 commented Jul 16, 2020

Uh oh!

SimonSapin commented Jul 16, 2020

Uh oh!

nikomatsakis commented Jul 16, 2020

Uh oh!

SimonSapin commented Jul 17, 2020

Uh oh!

nikomatsakis commented Jul 17, 2020

Uh oh!

scottmcm commented Oct 12, 2020

Uh oh!

Mark-Simulacrum commented Nov 6, 2020

Uh oh!

m-ou-se commented Oct 27, 2021

Uh oh!

joshtriplett commented Nov 10, 2021

Uh oh!

m-ou-se commented Nov 10, 2021

Uh oh!

Hash of `Discriminant` now produces different results across CPU architectures #74215

Hash of `Discriminant` now produces different results across CPU architectures #74215

SimonSapin commented Jul 15, 2020 •

edited

Loading