faster hashing #11203

Merged
merged 9 commits into from
May 21, 2019

narimiran
Member

  • multibyte hashing for:
    • string and string slices
    • cstring
    • string, ignoring case
    • string, ignoring style
    • openArray of byte or char

@@ -136,6 +139,17 @@ proc hash*[T: Ordinal](x: T): Hash {.inline.} =
  ## Efficient hashing of other ordinal types (e.g. enums).
  result = ord(x)

template multibyteHashImpl(result: Hash, x: typed, start, stop: int) =
Member Author

Here are some results on my machine (i7-970) for some basic tests I did. These are average times, but there is still variance so single-digit percent difference should be taken as zero.

| test | old algo (ms) | new algo (ms) | difference |
| --- | --- | --- | --- |
| 10_000 English words | 0.731 | 0.750 | +2.6% |
| 20_000 English words | 1.517 | 1.563 | +3.0% |
| 50_000 English words | 4.559 | 4.678 | +2.6% |
| 20_000 random 4-char strings | 0.497 | 0.503 | +1.2% |
| 20_000 random 8-char strings | 0.775 | 1.077 | +38.9% ?? |
| 20_000 random 9-char strings | 0.867 | 0.614 | -29.1% |
| 20_000 random 12-char strings | 0.952 | 0.687 | -27.8% |
| 20_000 random 16-char strings | 1.162 | 0.592 | -49.0% |
| 20_000 random 24-char strings | 1.504 | 0.600 | -60.1% |
| 20_000 random 32-char strings | 1.860 | 0.656 | -64.7% |
| 20_000 random 64-char strings | 3.274 | 0.851 | -74.0% |

As the length of the string/array/seq increases, so does the speed-up from the new algorithm.

The only exception is length of 8, which is unexpectedly slow (the first while loop is called once, the second one is called zero times). Increasing the length to 9 (first loop once, second once) has a speed-up which would be expected for length of 8 too.

I've tried several different ways to write the first loop, but all gave similar results.

@mratsim do you maybe know what is going on here? Any tips?
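
To make the "first loop once, second loop zero times" reasoning concrete, here is a minimal sketch that only counts iterations. It assumes a 64-bit target (so `IntSize` is 8, hard-coded here as the hypothetical constant `WordSize`) and assumes the second loop walks the leftover bytes one at a time, as the comment above describes. `loopCounts` is an illustrative helper, not code from this PR.

```nim
const WordSize = 8  # assumption: 64-bit target, so sizeof(int) == 8

proc loopCounts(length: int): tuple[words, tail: int] =
  ## Counts iterations of the word-sized loop and of the byte-wise tail loop
  ## for an input of `length` bytes, using the loop condition from the diff.
  let stop = length - 1
  var i = 0
  while i <= stop + 1 - WordSize:   # first loop: one whole word per step
    inc result.words
    inc i, WordSize
  while i <= stop:                  # second loop: leftover bytes
    inc result.tail
    inc i

for n in [4, 8, 9, 16]:
  echo n, " bytes -> ", loopCounts(n)
```

Under these assumptions a length of 8 takes one word step and no tail steps, while a length of 9 takes one of each, which is why the two were expected to perform similarly.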

Collaborator

Can you share your benchmarking script?

Member Author

> Can you share your benchmarking script?

It was just a quick, simple loop: hash a word, add the hash to a set, do something with that set (untimed) just in case, repeat that dozens of times, and report the average:

import hashes, sets, times, sequtils, math

proc hashTesting(wordlist: seq[string], f: proc (x: string): Hash, name: string): float =
  let
    amount = 20_000
    repetitions = 48
  var
    collisions = newSeq[int](repetitions)
    highest = newSeq[int](repetitions)
    times = newSeq[float](repetitions)

  for i in 0 ..< repetitions:
    let words = wordList[amount*i ..< amount*i+amount]
    var s = initHashSet[int](sets.rightSize(amount))
    var t = cpuTime()
    for w in words:
      s.incl f(w)
    t = cpuTime() - t
    times[i] = t
    collisions[i] = amount - s.len
    highest[i] = s.toSeq.max
  echo "--"
  echo name
  echo highest.min
  echo collisions.sum / collisions.len
  echo "max: ", times.max * 1000.0
  echo "min: ", times.min * 1000.0
  return times.sum / times.len.float * 1000.0

where f: proc (x: string): Hash is either the old one or the new one (or several others that I've tested).

Member

timotheecour commented Jun 25, 2019

> The only exception is length of 8, which is unexpectedly slow

This is explained here: #11581 (and it's not just length 8).

mratsim self-assigned this May 9, 2019
mratsim previously requested changes May 9, 2019

Collaborator

mratsim left a comment

Beyond the perf that I will check, I found a couple issues in the current and new hashes:

template multibyteHashImpl(result: Hash, x: typed, start, stop: int) =
  var i = start
  while i <= stop+1 - IntSize:
    let n = cast[ptr int](unsafeAddr x[i])[]
Collaborator

Can we have a fallback for hashing strings in the VM, and a test?

Collaborator

Also, `while i < stop - IntSize:` is cleaner.

Member Author

> Also while i < stop - IntSize: is cleaner

You almost had me there ;)

With `i = 0`, your condition is true only when `stop >= 9` (length of 10); mine is true when `stop >= 7` (length of 8).
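
The off-by-one can be checked directly. The sketch below assumes `IntSize` is 8 (written as the hypothetical constant `WordSize`) and evaluates both loop conditions at `i = 0`; the two proc names are illustrative only.

```nim
const WordSize = 8  # assumption: 64-bit target, so IntSize == 8

proc prCondition(i, stop: int): bool =
  ## Loop condition used in the diff under review.
  i <= stop + 1 - WordSize

proc suggestedCondition(i, stop: int): bool =
  ## Loop condition proposed in the review comment.
  i < stop - WordSize

for stop in 6 .. 9:
  echo "stop = ", stop,
       "  PR: ", prCondition(0, stop),
       "  suggested: ", suggestedCondition(0, stop)
```

With these definitions, an 8-byte input (`stop = 7`) enters the word-sized loop only under the PR's condition; the suggested condition would push it to the byte-wise path.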

Collaborator

Argh off-by-one :/

lowerString = newStringOfCap(len(x))
i = 0
while i <= high(x):
  lowerString.addLowercaseChar(x[i])
Collaborator

I think this crashes if x = ""

while i < x.len: wouldn't crash

A test case for empty strings would be nice.

Member

No, `high` for empty strings produces -1, not 0.
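
A small sketch of why the loop is safe for empty input: since `high("")` is -1, the condition `i <= high(x)` is false on the first check and the body never indexes the string. The helper `lowered` is hypothetical and uses `toLowerAscii` from `strutils` as a stand-in for the `addLowercaseChar` helper seen in the diff.

```nim
import strutils

proc lowered(x: string): string =
  ## Hypothetical helper mirroring the loop shape under review.
  result = newStringOfCap(x.len)
  var i = 0
  while i <= high(x):        # high("") == -1, so the body never runs for ""
    result.add(toLowerAscii(x[i]))
    inc i

echo high("")
echo lowered("Nim")
echo lowered("").len
```

An explicit empty-string test like the assertions one would write around `lowered("")` is still useful as a guard against later refactors.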

lowerString = newStringOfCap(remainingLength)
i = sPos
while i <= ePos:
  lowerString.addLowercaseChar(sBuf[i])
Collaborator

If the string is empty and we pass sPos = ePos = 0, this probably crashes.
A test case for this would be good as well.

(The original code probably suffers from the same issue.)

@@ -282,8 +280,12 @@ proc hash*[T: tuple](x: T): Hash =

proc hash*[A](x: openArray[A]): Hash =
  ## Efficient hashing of arrays and sequences.
  for it in items(x): result = result !& hash(it)
  result = !$result
  when A is char|byte:
Collaborator

  • Do we add uint8 as well?

  • Do we add other integers? And floats?

Member

uint8 is an alias for byte.
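
As a quick illustration of that point, the sketch below shows that a `when A is char|byte` branch is also selected for `uint8` elements. `takesFastPath` is a hypothetical helper written for this example; it only reports which branch is chosen and does no hashing.

```nim
proc takesFastPath[A](x: openArray[A]): bool =
  ## Hypothetical helper: reports which `when` branch is chosen
  ## for the element type A.
  when A is char|byte:
    result = true
  else:
    result = false

echo byte is uint8                   # the alias itself
echo takesFastPath([1'u8, 2'u8])     # uint8 elements
echo takesFastPath(['a', 'b'])       # char elements
echo takesFastPath([1, 2])           # int elements
```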

Collaborator

mratsim left a comment

Since high("") returns -1, this shouldn't crash. I'd still like some empty-string tests, though, to guard against unlucky refactors.

I'll check the perf later today.

mratsim self-requested a review May 9, 2019 14:26

Collaborator

mratsim left a comment

.

mratsim self-requested a review May 9, 2019 14:28
mratsim dismissed their stale review May 9, 2019 14:30

Addressed

Collaborator

mratsim commented May 10, 2019

I can't reproduce your outlier on my machine: for 20000 8-char random strings the new implementation is about 25% faster. Script here: https://gist.github.com/mratsim/e5a2d1d74adc2763ab7b080a7c40ef1e.

So for me we only need a fallback path for the VM.

narimiran
Member Author

> I can't reproduce your outlier on my machine, for 20000 8-char random string the new implementation is about 25% faster

Thanks for testing it! And now with your version of the script I also get the speed improvements for the length of 8.

> So for me we only need a fallback path for the VM.

What do you recommend? What would that fallback look like? Use the old version for the VM?

Collaborator

mratsim commented May 10, 2019

`when nimvm:` old version sounds good. Probably with a warning that hashes at compile time differ from hashes at runtime (it's already true for ref objects anyway).
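
For readers unfamiliar with the construct, a minimal sketch of how a `when nimvm` split behaves. It only reports which branch runs and does not reproduce the hashing code from this PR; `hashPath` is a hypothetical helper.

```nim
proc hashPath(): string =
  ## Hypothetical helper: reports which branch of `when nimvm` was taken.
  when nimvm:
    result = "VM fallback"        # taken when evaluated at compile time
  else:
    result = "runtime fast path"  # taken in the compiled executable

const atCompileTime = hashPath()  # forced through the VM
let atRuntime = hashPath()        # evaluated by the compiled program

echo atCompileTime
echo atRuntime
```

If the two branches compute different hash values, anything hashed at compile time and looked up at runtime will disagree, which is the reason for the suggested warning.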

* use optimized version for all ints
* add more tests
* make it work in VM
* put warnings about differences between CT and runtime
* minor style tweaks
narimiran force-pushed the faster-hashes-redux branch 4 times, most recently from db04f15 to d886d35, May 12, 2019 08:40
Member Author

narimiran commented May 15, 2019

There is a showstopper breakage: now that VM hashes are different from runtime hashes, you cannot read from a const Table anymore :(
EDIT: Fixed in the next commit.

Btw, here are some more benchmark results, now that I made a version that works on all integers. This is hashing of 100_000 containers with X elements in each.

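int8

| elements | old algo | new algo | old/new |
| --- | --- | --- | --- |
| 3 | 3.01 | 2.99 | 1.00x |
| 4 | 3.07 | 3.19 | 0.94x |
| 7 | 3.85 | 3.89 | 0.77x |
| 8 | 4.05 | 2.56 | 1.56x |
| 12 | 4.88 | 3.70 | 1.08x |
| 16 | 5.98 | 3.07 | 1.62x |
| 31 | 9.63 | 4.95 | 1.81x |
| 32 | 9.64 | 3.60 | 2.50x |
| 63 | 17.36 | 6.44 | 2.65x |
| 64 | 17.32 | 5.33 | 3.20x |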

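int16

| elements | old algo | new algo | old/new |
| --- | --- | --- | --- |
| 3 | 2.76 | 3.11 | 0.64x |
| 4 | 3.02 | 2.61 | 1.14x |
| 7 | 3.76 | 3.29 | 0.91x |
| 8 | 3.87 | 2.83 | 1.06x |
| 12 | 4.90 | 3.28 | 1.21x |
| 16 | 6.01 | 3.59 | 1.67x |
| 31 | 9.66 | 5.43 | 1.65x |
| 32 | 9.89 | 5.11 | 1.76x |
| 63 | 17.57 | 8.42 | 2.02x |
| 64 | 17.77 | 8.02 | 2.12x |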

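int32

| elements | old algo | new algo | old/new |
| --- | --- | --- | --- |
| 3 | 2.91 | 3.15 | 0.63x |
| 4 | 3.29 | 2.98 | 1.00x |
| 7 | 4.26 | 3.61 | 1.10x |
| 8 | 4.34 | 3.63 | 1.10x |
| 12 | 5.40 | 4.57 | 1.09x |
| 16 | 7.16 | 5.24 | 1.33x |
| 31 | 11.07 | 8.11 | 1.35x |
| 32 | 11.15 | 8.15 | 1.35x |
| 63 | 19.41 | 13.5 | 1.40x |
| 64 | 19.51 | 13.5 | 1.40x |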

juancarlospaco
Collaborator

Have you seen Meow?
https://mollyrocket.com/meowhash
There's an implementation in C.
