Skip to content

Predictable hash collisions for int64/uint64 values #10097

Closed
@mjoud

Description

@mjoud

Related to #6136. The hash function for int64/uint64 cast the value to a uint32 which means that it is easy to create hash collision for values that are multiples of 232 (4294967296). This could potentially be exploited since the behaviour is completely predictable.

Example

import sets, times

template bench(msg: string, body: untyped) =
  var t = cpuTime()
  echo msg
  body
  echo cpuTime() - t

const limit = 100000
let uint32range: int64 = uint32.high.int64 + 1  # number of possible uint32 values


block:
  var s1 = initSet[int64]()
  bench("set of numbers"):
    var hc = uint32range
    for i in 0'i64 ..< limit:
      s1.incl i
      hc += uint32range

  bench("membership testing"):
    hc = uint32range
    for i in 0'i64 ..< limit:
      doAssert i in s1
      hc += uint32range

block:
  var s2 = initSet[int64]()
  bench("hash collision numbers"):
    var hc = uint32range
    for i in 0'i64 ..< limit:
      s2.incl hc
      hc += uint32range

  bench("membership testing"):
    hc = uint32range
    for i in 0'i64 ..< limit:
      doAssert hc in s2
      hc += uint32range

Current Output

set of numbers
0.010961
membership testing
0.000144
hash collision numbers
8.10046
membership testing
4.060960999999999

Expected Output

The same timings for any set of values.

Possible Solution

As a workaround, 64-bit values could be hashed for the upper and lower 32-bits separately. In the long run, Nim needs a safer and faster hash function (discussed in #6136). https://github.com/rurban/smhasher is a good resource. I don't use rust, but the hash implementation in rust hashes all bytes in integers individually.

Hash sets and tables should also be seeded with a random value (per process, or even per set/table).

Additional Information

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions