Predictable hash collisions for int64/uint64 values #10097

Closed
mjoud opened this issue Dec 26, 2018 · 1 comment

Comments

mjoud commented Dec 26, 2018

Related to #6136. The hash function for int64/uint64 casts the value to a uint32, which makes it easy to create hash collisions: any two values that differ by a multiple of 2^32 (4294967296) hash identically. This could potentially be exploited, since the behaviour is completely predictable.
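To make the collision concrete, here is a minimal sketch of a truncating hash. `truncatingHash` is a hypothetical stand-in written for illustration, not the actual stdlib code; it only assumes that the hash keeps the low 32 bits of the value, as described above.

```nim
# Hypothetical stand-in for the truncating hash described in this issue;
# this is NOT the stdlib implementation.
proc truncatingHash(x: int64): uint32 =
  # Keep only the low 32 bits, discarding the upper half of the value.
  uint32(x and 0xFFFF_FFFF'i64)

const step = 4294967296'i64  # 2^32

# Every multiple of 2^32 has all-zero low bits, so all of them hash to 0.
for k in 1'i64 .. 5'i64:
  doAssert truncatingHash(k * step) == 0'u32

# Values that differ in their low 32 bits still hash differently.
doAssert truncatingHash(1) != truncatingHash(2)
```

With a hash like this, every key of the form k * 2^32 lands in the same bucket, so inserts and lookups degrade to a linear probe over all colliding keys.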

Example

import sets, times

template bench(msg: string, body: untyped) =
  var t = cpuTime()
  echo msg
  body
  echo cpuTime() - t

const limit = 100000
let uint32range: int64 = uint32.high.int64 + 1  # number of possible uint32 values


block:
  var s1 = initSet[int64]()
  bench("set of numbers"):
    var hc = uint32range
    for i in 0'i64 ..< limit:
      s1.incl i
      hc += uint32range

  bench("membership testing"):
    hc = uint32range
    for i in 0'i64 ..< limit:
      doAssert i in s1
      hc += uint32range

block:
  var s2 = initSet[int64]()
  bench("hash collision numbers"):
    var hc = uint32range
    for i in 0'i64 ..< limit:
      s2.incl hc
      hc += uint32range

  bench("membership testing"):
    hc = uint32range
    for i in 0'i64 ..< limit:
      doAssert hc in s2
      hc += uint32range

Current Output

set of numbers
0.010961
membership testing
0.000144
hash collision numbers
8.10046
membership testing
4.060960999999999

Expected Output

The same timings for any set of values.

Possible Solution

As a workaround, the upper and lower 32 bits of a 64-bit value could be hashed separately. In the long run, Nim needs a safer and faster hash function (discussed in #6136). https://github.com/rurban/smhasher is a good resource. I don't use Rust, but its hash implementation hashes all bytes of an integer individually.
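A rough sketch of the workaround might look like the following. `splitHash` is a hypothetical name, and the code assumes a 64-bit target (so each 32-bit half fits in an `int`). It feeds both halves through the `!&` and `!$` mixing operators from the `hashes` module, so the upper 32 bits are no longer thrown away.

```nim
import hashes, sequtils

# Hypothetical helper: mix the low and high 32-bit halves separately.
# Assumes a 64-bit platform so that each half fits in an int.
proc splitHash(x: int64): Hash =
  let u = cast[uint64](x)
  var h: Hash = 0
  h = h !& int(u and 0xFFFF_FFFF'u64)  # lower 32 bits
  h = h !& int(u shr 32)               # upper 32 bits
  result = !$h

const step = 4294967296'i64  # 2^32

# Multiples of 2^32 collided under truncation; here they get distinct hashes.
var seen: seq[Hash] = @[]
for k in 1'i64 .. 1000'i64:
  seen.add splitHash(k * step)

doAssert deduplicate(seen).len == seen.len
```

This only removes the trivial collision pattern from the example. It is still an unkeyed, predictable function, so it does not address the longer-term concern about hash quality.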

Hash sets and tables should also be seeded with a random value (per process, or even per set/table).
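As an illustration of per-process seeding, the sketch below wraps the key in a hypothetical `SeededKey` type whose `hash` starts from a random seed chosen at startup. The names `SeededKey` and `processSeed` are invented for this example, `initHashSet` assumes Nim 0.20 or later, and the `random` module is not a cryptographic source, so treat this as a demonstration of the idea rather than a hardened design.

```nim
import hashes, random, sets

randomize()                       # seed the PRNG from the current time
let processSeed = rand(int.high)  # hypothetical per-process seed

type SeededKey = object
  value: int64

# Custom hash: start from the seed instead of 0, then mix both halves.
# Assumes a 64-bit platform so that each 32-bit half fits in an int.
proc hash(k: SeededKey): Hash =
  let u = cast[uint64](k.value)
  var h: Hash = processSeed
  h = h !& int(u and 0xFFFF_FFFF'u64)
  h = h !& int(u shr 32)
  result = !$h

const step = 4294967296'i64  # 2^32

var s = initHashSet[SeededKey]()
for k in 1'i64 .. 1000'i64:
  s.incl SeededKey(value: k * step)

doAssert s.len == 1000
doAssert SeededKey(value: 5 * step) in s
```

Because the seed changes between runs, an attacker can no longer precompute a fixed list of colliding keys, although a simple mixing function like this one may still leak enough structure to be attacked; a keyed hash designed for that purpose would be the sturdier choice.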

Additional Information

Araq (Member) commented Dec 27, 2018

> This could potentially be exploited since the behaviour is completely predictable.

Hashing can always "potentially be exploited", especially since weaknesses may simply still be unknown. The solution is to use BTrees or similar.
