Invalid Hash implementation #142

orlp · 2023-10-02T11:49:47Z

Currently the Hash implementation just assumes that the FloatCore type has a mantissa/exponent width like f64. This is not necessarily true, one could have a valid custom FloatCore type that does not fit these constraints.

Furthermore, the hash implementation is rather slow, because it reconstructs the type from integer_decode (https://rust.godbolt.org/z/hTTqnaso6). That's 10-15 cycles every float for what should be a no-op - this can be more expensive than the hash function itself!

I propose we add a StdFloat sealed trait that is implemented for f32 and f64 only, and use that as the bound for the Hash implementation. Then the hash implementation can use .to_bits().

Unfortunately, this is a breaking change. However I must reiterate that the current implementation is invalid for the FloatCore trait bound, it's possible to make a perfectly sane type implementing FloatCore which would violate the Eq/Hash parity when used in OrderedFloat due to no fault of the original type.

An alternative solution is to feed the sign, mantissa and exponent returned by f.integer_decode() separately into the hasher. However this would make the hash implementation slower still.

The text was updated successfully, but these errors were encountered:

mbrubeck · 2023-10-02T15:19:43Z

Currently the Hash implementation just assumes that the FloatCore type has a mantissa/exponent width like f64. This is not necessarily true, one could have a valid custom FloatCore type that does not fit these constraints.

Hmm. The FloatCore::integer_decode docs require that it “returns the mantissa, base 2 exponent, and sign as integers, respectively. The original number can be recovered by sign * mantissa * 2 ^ exponent.”

We throw away the top few bits of the mantissa and exponent, which could be used in some custom implementations. This may increase collisions, but should not cause the Hash implementation to be incorrect.

Can you give an example of a correct FloatCore implementation that leads to an invalid Hash implementation?

I propose we add a StdFloat sealed trait that is implemented for f32 and f64 only, and use that as the bound for the Hash implementation. Then the hash implementation can use .to_bits().

Or if necessary we could implement Hash directly just for OrderedFloat<f32>, OrderedFloat<f64>, NotNan<f32>, and NotNan<f64>.

Furthermore, the hash implementation is rather slow

Ouch, this is unfortunate. I wish we had specialization so we could easily fix this without a breaking change...

mbrubeck · 2023-10-02T15:37:40Z

Can you give an example of a correct FloatCore implementation that leads to an invalid Hash implementation?

Ah, I guess one possibility is that the mantissa is not normalized, so that the number 2 (for example) could be represented in multiple ways like (1, 1, 1) and (2, 0, 1) and (4, -1, 1).

mbrubeck · 2023-10-02T19:00:09Z

I believe the generic implementation could be fixed by adding a normalization step, but this would make it even slower. In the absence of specialization, I am leaning toward your proposal of implementing Hash only for the primitive types, using to_bits. Feel free to submit a pull request, or I can do it when I have time.

I think the large majority of users only care about the primitive types, so performance in this common case is probably more important than support for very rare use cases. If there is demand for Hash for custom float types, we can add a way to opt in to that later.

orlp · 2023-10-02T19:28:45Z

We throw away the top few bits of the mantissa and exponent, which could be used in some custom implementations. This may increase collisions, but should not cause the Hash implementation to be incorrect.

You are right, extra collisions technically don't make it incorrect. Although I do such consider systematic collisions problematic. So my initial statement was overstated.

Ah, I guess one possibility is that the mantissa is not normalized, so that the number 2 (for example) could be represented in multiple ways like (1, 1, 1) and (2, 0, 1) and (4, -1, 1).

That said, you are right, this once again does make it technically incorrect.

Ouch, this is unfortunate. I wish we had specialization so we could easily fix this without a breaking change...

So do I... so do I...

Or if necessary we could implement Hash directly just for OrderedFloat<f32>, OrderedFloat<f64>, NotNan<f32>, and NotNan<f64>.

My biggest concern with this is that it breaks generic code of users in a way that's more annoying to fix. At least with a StdFloat trait the code can be made generic again relatively easy by adding a StdFloat bound.

mbrubeck · 2023-12-06T22:40:39Z

My current plan is to adopt your suggested fix, starting with ordered-float 5.0.0.

I’m waiting a bit to fix this in case there are other breaking changes we can bundle in the same release.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Invalid Hash implementation #142

Invalid Hash implementation #142

orlp commented Oct 2, 2023 •

edited

Loading

mbrubeck commented Oct 2, 2023 •

edited

Loading

mbrubeck commented Oct 2, 2023 •

edited

Loading

mbrubeck commented Oct 2, 2023 •

edited

Loading

orlp commented Oct 2, 2023

mbrubeck commented Dec 6, 2023

Invalid Hash implementation #142

Invalid Hash implementation #142

Comments

orlp commented Oct 2, 2023 • edited Loading

mbrubeck commented Oct 2, 2023 • edited Loading

mbrubeck commented Oct 2, 2023 • edited Loading

mbrubeck commented Oct 2, 2023 • edited Loading

orlp commented Oct 2, 2023

mbrubeck commented Dec 6, 2023

orlp commented Oct 2, 2023 •

edited

Loading

mbrubeck commented Oct 2, 2023 •

edited

Loading

mbrubeck commented Oct 2, 2023 •

edited

Loading

mbrubeck commented Oct 2, 2023 •

edited

Loading