`Hash` of `Discriminant` now produces different results across CPU architectures #74215
Comments
I don't think that the |
|
Tagging as a @rust-lang/libs regression -- I suspect that this is definitely outside the scope of "intentional coverage" of our stability policy, but we could probably change the Hash impl to always hash the ~same number of bytes. Regardless I don't think Rust guarantees the same value in the discriminant when compiling more than once (much less for different targets), so in that sense it's not really covered either. |
Either way we should specifically mention the stability guarantees (or lack thereof) in the |
I’m a bit confused because the issue description mentions two rustc versions. Do you get different results on different architectures with the same compiler version? Does |
@SimonSapin No, see rust/src/librustc_middle/ty/mod.rs, lines 2179 to 2181 at e59b08e.
|
A quick summary of the problem: This breaks code which relies on (I think we still don't guarantee that enums with unspecified repr have the same discriminant when compiling for a different target) Not really part of |
The root of the issue seems to be that
|
I believe:

```rust
enum Eg {
    A = 0x8000_0000, // "error: literal out of range for isize" on 32-bit platforms
}
```

If we make the discriminant the same size as the enum's layout, then:

```rust
#[non_exhaustive]
enum SomeEnum {
    A = 0,
    B = 255,
    C, // Discriminant is `u8` without this, `u16` with this. So it will break hashing as well.
}
```
|
Still, I don't think that code should rely on that at all. We do need a mention of this consideration in the documentation of the |
That changing the source definition of a type without a |
I see lang was pinged here, but it's not clear to me that this is a lang issue. AFAIK (Personally it seems like there's no guarantee that |
If we change the discriminant away from
|
The "the
We could change the |
What about enums with only 1 variant?

```rust
enum Enum {
    A = 256
}
```

A similar argument applies to enums with tuple/struct variants. For things like these: do we need another set of logic to compute the smallest integer type necessary to hold the discriminant, which would not be essentially the same as the size of the enum after layout optimisation? My stance is no, we'd better keep it simple and leave it as `isize`. |
We talk about "the discriminant" but that can refer to at least four different things that are not necessarily the same:

1. The value specified (or implied) in the source, as in `B = 255`
2. The value obtained by casting with `as`
3. The value observed through `std::mem::discriminant` (and `DiscriminantKind`)
4. The bits stored in memory to tell variants apart (the tag)

When it exists, 2. is defined to be the same as 1. Other than that, each of these four things may be more or less related to the others for ease of implementation, but they don't have to be. Enums without a `#[repr]` attribute already have their tag size chosen from the set of values:

```rust
enum SomeEnum { A, B = 255 }
enum OtherEnum { A, B = 256 }
enum Single { B = 256 }
fn main() {
    dbg!(std::mem::size_of::<SomeEnum>());
    dbg!(std::mem::size_of::<OtherEnum>());
    dbg!(std::mem::size_of::<Single>());
}
```

```
[src/main.rs:5] std::mem::size_of::<SomeEnum>() = 1
[src/main.rs:6] std::mem::size_of::<OtherEnum>() = 2
[src/main.rs:7] std::mem::size_of::<Single>() = 0
```

These optimizations could go further still. The above also shows that the compiler already has logic for finding the smallest integer type that fits a set of values (although it's not the only thing that determines the memory representation). |
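The "smallest integer type that fits a set of values" idea can be sketched in ordinary Rust. This is an editor's illustration with a hypothetical `smallest_unsigned_bits` helper, not rustc's actual layout code, and it ignores signed/negative discriminants:

```rust
/// Illustrative helper: the narrowest unsigned width (in bits) that can hold
/// every value in `values`. A sketch of the idea only, not rustc's layout code.
fn smallest_unsigned_bits(values: &[u128]) -> u32 {
    let max = values.iter().copied().max().unwrap_or(0);
    [8u32, 16, 32, 64, 128]
        .into_iter()
        .find(|&bits| bits == 128 || max < (1u128 << bits))
        .unwrap()
}

fn main() {
    // Matches the sizes observed above: {0, 255} fits in one byte,
    // while {0, 256} needs two.
    assert_eq!(smallest_unsigned_bits(&[0, 255]), 8);
    assert_eq!(smallest_unsigned_bits(&[0, 256]), 16);
    assert_eq!(smallest_unsigned_bits(&[256]), 16);
    println!("ok");
}
```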
We could do it, but I am still not convinced that we should do it. As you mentioned, 1. 2. and 4. are different things already, and changing After all, we provide no guarantee that the memory layout optimisations will happen (except for |
Trying to reduce de-facto stability of unspecified behavior does not seem to me like a good reason by itself to artificially introduce/preserve target-dependence. Consistency between 1. and 3. for users is a more reasonable argument to me, although (I wish the default type for 1. was a target-independent one like |
Hmm, so: it seems clear that the current behaviour is allowed by the stability policy. It also seems like a reasonable workaround exists. As @lcnr said, users can write a hasher that routes the platform-width integer writes through a fixed-width method. If we did however want to make hashing enums independent from the target architecture (at least, those that don't explicitly use a platform-dependent representation), we would need logic to compute the smallest integer type for the discriminant. |
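The hasher workaround mentioned here can be sketched as follows. `PortableHasher` and `CollectBytes` are hypothetical names for illustration; a complete version would also need to override the remaining `write_*` methods, whose default implementations use native byte order:

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical wrapper sketching the workaround: forward the platform-width
/// integer writes through a fixed 64-bit, little-endian path so the bytes fed
/// to the inner hasher are target-independent. (Illustration only: the other
/// `write_u*`/`write_i*` defaults still use native byte order.)
struct PortableHasher<H: Hasher>(H);

impl<H: Hasher> Hasher for PortableHasher<H> {
    fn finish(&self) -> u64 {
        self.0.finish()
    }
    fn write(&mut self, bytes: &[u8]) {
        self.0.write(bytes);
    }
    fn write_u64(&mut self, i: u64) {
        // Single fixed-width, fixed-endianness choke point.
        self.0.write(&i.to_le_bytes());
    }
    fn write_i64(&mut self, i: i64) {
        self.write_u64(i as u64);
    }
    fn write_usize(&mut self, i: usize) {
        self.write_u64(i as u64); // always 8 bytes, regardless of target
    }
    fn write_isize(&mut self, i: isize) {
        self.write_i64(i as i64); // discriminants of default-repr enums land here
    }
}

/// Toy inner hasher that just records the bytes it is given.
#[derive(Default)]
struct CollectBytes(Vec<u8>);

impl Hasher for CollectBytes {
    fn finish(&self) -> u64 { 0 }
    fn write(&mut self, bytes: &[u8]) { self.0.extend_from_slice(bytes); }
}

fn main() {
    let mut h = PortableHasher(CollectBytes::default());
    std::mem::discriminant(&Some(1u8)).hash(&mut h);
    // The discriminant now contributes 8 little-endian bytes on every
    // target, instead of a pointer-width write.
    assert_eq!((h.0).0.len(), 8);
    println!("ok");
}
```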
This logic is already implemented in the compiler. When the default memory representation of an enum uses a tag, the tag has the representation of the smallest integer type that works. |
@SimonSapin yes, that logic is implemented, but it is implemented as part of the |
Is this worse in some way than the existing behaviour? The error on an out-of-range literal already depends on the target. That makes me curious about things like

```rust
pub enum Foo {
    Bar = 12345,
    Baz = 1 + Foo::Bar as isize,
}
```

But the MIR for this evaluates the expression with a concrete discriminant type, so I guess that potential issue has already been avoided.
Hmm, I guess I wasn't thinking about the hashing aspect. Right now I think the only stable way to get the value out is with an `as` cast (on a fieldless enum).

This issue is now old enough that it's a de-facto decision that the breakage isn't important enough to act on.

Meta: it'd also be a shame to end up with folk wisdom of "you should |
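For reference, the `as` route on a fieldless enum looks like this (a minimal illustration; `E` is a made-up example type):

```rust
enum E {
    A = 1,
    B = 300, // forces the source discriminant past the u8 range
}

fn main() {
    // `as` casts are the stable way to read a fieldless enum's discriminant
    // value; the result is target-independent here because we pick an
    // explicit 64-bit destination type.
    assert_eq!(E::A as i64, 1);
    assert_eq!(E::B as i64, 300);
    println!("ok");
}
```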
Untagging as a regression, since this landed in 1.45. |
cc @rust-lang/libs-api |
We discussed this in today's @rust-lang/libs meeting, and we feel like the only remaining action here is to document this behavior. |
(Any proposals for providing more guarantees about this can be proposed and discussed separately.) |
Document non-guarantees for Hash

Dependence on endianness and type sizes was reported for enum discriminants in rust-lang#74215, but it is a more general issue, since for example the default implementation of `Hasher::write_usize` uses native endianness. Additionally, the implementations of library types are occasionally changed as their internal fields change or hashing gets optimized.

## Question

Should this go on the module level documentation instead, since it also concerns `Hasher` to some extent and not just `Hash`?

resolves rust-lang#74215
Hi, we just noticed that there have been changes to the inner type of `Discriminant` (i.e. `DiscriminantKind`) that cause a `Hash` of a discriminant to produce different results across CPU architectures. E.g.:

When hashing this in the current stable one gets writes that look like this:

And depending on the architecture, in the current nightly one gets for `armv7-unknown-linux-gnueabihf`:

However, for e.g. `x86_64-unknown-linux-gnu` it is still 8 bytes as before.

This is a problem, as e.g. CRC hashes are no longer compatible between CPU architectures. Not sure if this was intended, but it seems like a breaking change to me?

In our case we are sending structures between an embedded `thumbv7em-none-eabihf` and an `armv7-unknown-linux-gnueabihf` system, where CRCs are calculated based on `#[derive(Hash)]` for said structures to be sent.
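To make the reported effect concrete, here is a minimal sketch using a hypothetical byte-collecting hasher (`CollectBytes` and `Message` are invented for illustration): for a default-repr enum, the number of bytes a discriminant hash writes equals the target's pointer width, which is why byte-oriented checksums like CRCs stop matching across targets.

```rust
use std::hash::{Hash, Hasher};

/// Toy hasher that records the raw bytes written to it, making the
/// target-dependent write width visible.
#[derive(Default)]
struct CollectBytes(Vec<u8>);

impl Hasher for CollectBytes {
    fn finish(&self) -> u64 { 0 }
    fn write(&mut self, bytes: &[u8]) { self.0.extend_from_slice(bytes); }
}

enum Message {
    Ping,
    Pong,
}

fn main() {
    let mut h = CollectBytes::default();
    std::mem::discriminant(&Message::Ping).hash(&mut h);
    // 8 bytes on a 64-bit target, 4 on a 32-bit one such as
    // armv7-unknown-linux-gnueabihf.
    assert_eq!(h.0.len(), std::mem::size_of::<isize>());
    println!("{} bytes written", h.0.len());
}
```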
`rustc --version --verbose`: