Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Equal floats and ints have different hashes #2818

Closed
oprypin opened this issue Jun 12, 2016 · 3 comments
Closed

Equal floats and ints have different hashes #2818

oprypin opened this issue Jun 12, 2016 · 3 comments
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib

Comments

@oprypin
Copy link
Member

oprypin commented Jun 12, 2016

First, the immediate problem caused by this fact:

{0 => :a, 0.0 => :b, 1 => :c, 1.0 => :d}
# => {0 => :b, 1 => :c, 1.0 => :d}

Because:

0 == 0.0  &&  1 == 1.0
# => true
{0.hash, 0.0.hash}
# => {0, 0}
{1.hash, 1.0.hash}
# => {1, 4607182418800017408}

(there don't seem to be any other such "hash collisions" other than 0 but my search wasn't exhaustive, and this is still dangerous).

In case this wasn't clear, the problem is the definition of hashes is violated – equal values must have equal hashes.

I see a few general approaches to solving this:

  1. Like Ruby: Array#uniq considers each element eql? method to check if exists #2809 (comment)
    Introduce a "hash equality" concept and explicitly mark Ints and Floats to be not equal in the hashing context.
    (The example above would have 4 pairs in the end)
  2. Variation on the above:
    Make different types never be equal in the hashing context. How to determine whether types are "same" or "different"? Again, a few suggestions:
    1. It's the exact same type
    2. The types have the same hash method (same type or inherited).
  3. Like Python: Array#uniq considers each element eql? method to check if exists #2809 (comment)
    Set up the hashing functions in a way that Ints and Floats have the same hash.
    (The example above would have 2 pairs in the end)
    Theoretically we could take their algorithm but not sure about licensing.
    To elaborate what Python does:
>>> hash(234), hash(234.0), hash(Decimal('234.0'))
(234, 234, 234)
>>> hash(0.5), hash(Fraction(1, 2)), hash(Decimal('0.5'))
(1152921504606846976, 1152921504606846976, 1152921504606846976)
>>> {234: 1, 234.0: 2, Decimal('234.0'): 3}
{234: 3}
@jhass jhass added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib labels Jun 12, 2016
@david50407
Copy link
Contributor

+1 for ruby-like solution. But it's fine now, C# and Java also make the hash of 0 and 0.0 the same.

IMO, Crystal gets everything to be objects, so the hash should be different when these two items having different type even their data is the same in the memory at least, so 0's hash will never equals to 0.0's, cause their types are not the same at all.

@ozra
Copy link
Contributor

ozra commented Jun 21, 2016

Without diving in to the problem, at a glance I'd say it's not a problem with hashing (duplicates are expected - it's not meant to be perfect hashing, just fast and distribute over buckets and similar uses).
The problem with the hash-map must be because 0 == 0.0 - so this instead puts some more fuel on: [edited...] #567 (comment) (don't allow float <=> int promotions) implementation of comparison between ints and floats. But this was a shot from the hip... Just comparing floats to "should be the same" floats itself is inexact business...

@akzhan
Copy link
Contributor

akzhan commented Nov 30, 2017

I think that it should be closed because #5276 has been merged. Python way chosen.

@oprypin oprypin closed this as completed Nov 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib
Projects
None yet
Development

No branches or pull requests

5 participants