Closed
Description
Related to #6136. The hash function for int64
/uint64
cast the value to a uint32
which means that it is easy to create hash collision for values that are multiples of 232 (4294967296). This could potentially be exploited since the behaviour is completely predictable.
Example
import sets, times
template bench(msg: string, body: untyped) =
var t = cpuTime()
echo msg
body
echo cpuTime() - t
const limit = 100000
let uint32range: int64 = uint32.high.int64 + 1 # number of possible uint32 values
block:
var s1 = initSet[int64]()
bench("set of numbers"):
var hc = uint32range
for i in 0'i64 ..< limit:
s1.incl i
hc += uint32range
bench("membership testing"):
hc = uint32range
for i in 0'i64 ..< limit:
doAssert i in s1
hc += uint32range
block:
var s2 = initSet[int64]()
bench("hash collision numbers"):
var hc = uint32range
for i in 0'i64 ..< limit:
s2.incl hc
hc += uint32range
bench("membership testing"):
hc = uint32range
for i in 0'i64 ..< limit:
doAssert hc in s2
hc += uint32range
Current Output
set of numbers
0.010961
membership testing
0.000144
hash collision numbers
8.10046
membership testing
4.060960999999999
Expected Output
The same timings for any set of values.
Possible Solution
As a workaround, 64-bit values could be hashed for the upper and lower 32-bits separately. In the long run, Nim needs a safer and faster hash function (discussed in #6136). https://github.com/rurban/smhasher is a good resource. I don't use rust, but the hash implementation in rust hashes all bytes in integers individually.
Hash sets and tables should also be seeded with a random value (per process, or even per set/table).
Additional Information
Metadata
Metadata
Assignees
Labels
No labels