hashtable local performance breakdowns #536

Closed
ThomasWaldmann opened this issue Jan 9, 2016 · 14 comments
@ThomasWaldmann
Member

seems like some circumstance leads to local performance breakdowns depending on the amount N (and kind?) of input and the hashtable size T, while performance is usually good otherwise.

e.g.:

T = 262147 (initial size, grows when load>0.75)
H1.merge(H2) times:
N=100000 16ms
N=200000 108ms
N=240000 18385ms

Having a smaller max load factor seems to help a bit, but it only appears to make the problem less likely / less intense(?), it does not make it go away. Also, we should not go too low with the max load factor, to avoid wasting memory / running into out-of-memory conditions.

I am wondering what causes this and what we can do about it.

The hash function takes the upper 32 bits of the sha256 and computes it modulo T to get the bucket index. T was even a prime in one of my experiments.

Source: borg/_hashindex.c
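
For illustration, a minimal sketch of that bucket-index computation (names and byte handling are assumptions for the sketch; the real code is in borg/_hashindex.c):

```c
#include <stdint.h>
#include <string.h>

/* Sketch: take the first 4 bytes of the sha256 key (its upper 32 bits,
 * byte order left aside) and reduce modulo the number of buckets T.
 * Hypothetical names, not the actual _hashindex.c code. */
static uint32_t bucket_index(const unsigned char *sha256_key, uint32_t num_buckets)
{
    uint32_t upper;
    memcpy(&upper, sha256_key, sizeof(upper));
    return upper % num_buckets;    /* T = num_buckets, e.g. 262147 */
}
```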

@ThomasWaldmann
Member Author

http://planetmath.org/goodhashtableprimes is interesting, esp. property 3:

1. each number in the list is prime
2. each number is slightly less than twice the size of the previous
3. each number is as far as possible from the nearest two powers of two
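
For illustration, growing could step through a precomputed table of such primes. A sketch, using only the first few entries of the list from that page (not how _hashindex.c currently sizes its table):

```c
#include <stddef.h>
#include <stdint.h>

/* First few entries of the "good primes" list from the linked page;
 * the full table there continues much further. Sketch only. */
static const uint32_t good_primes[] = { 53, 97, 193, 389, 769, /* ... */ };

static uint32_t next_table_size(uint32_t current)
{
    for (size_t i = 0; i < sizeof(good_primes) / sizeof(good_primes[0]); i++)
        if (good_primes[i] > current)
            return good_primes[i];
    return current * 2 + 1;    /* fallback once the sketch table is exhausted */
}
```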

@ThomasWaldmann
Member Author

https://github.com/fragglet/c-algorithms includes src/hash-table.c/h, which implements separate chaining with linked lists (a minimal sketch of that layout follows below).

good:

  • little memory needed for the hashtable itself (N * pointersize), so we can afford to grow early (the author grows the table at load > 0.33!)

bad:

  • lots of malloc(), lots of tiny objects
  • harder/slower to (un)serialize for disk storage
  • no easy mmap approach (in case we ever want to retry that)
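
A minimal sketch of such a separately-chained layout, mainly to illustrate the pointer-per-bucket table cost and the per-entry malloc (hypothetical names and layout; not the actual c-algorithms API):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: the table itself is just N pointers, but every entry is a
 * separately malloc'd node. */
typedef struct chain_node {
    unsigned char key[32];         /* e.g. a sha256 chunk id */
    uint32_t value;
    struct chain_node *next;
} chain_node;

typedef struct {
    chain_node **buckets;          /* N * pointersize for the table itself */
    uint32_t num_buckets;
} chain_table;

static int chain_insert(chain_table *t, const unsigned char *key, uint32_t value)
{
    uint32_t idx;
    memcpy(&idx, key, sizeof(idx));
    idx %= t->num_buckets;

    chain_node *node = malloc(sizeof(*node));    /* one malloc per entry */
    if (!node)
        return -1;
    memcpy(node->key, key, sizeof(node->key));
    node->value = value;
    node->next = t->buckets[idx];                /* prepend to the chain */
    t->buckets[idx] = node;
    return 0;
}
```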

@ThomasWaldmann
Member Author

currently, it seems to me that all the "magic" rules and mechanisms are just there to cope with:

  • bad hash functions that don't distribute well (does not apply, as we have a cryptographic hash to start from)
  • "open addressing" collision handling using linear probing with an increment > 1 (does not apply, we use 1; see the probing sketch below)

So the options for us seem to be:

  • expose load factors, so people with lots of memory can use a low load factor for high speed, and people with tight memory can use a high one and sacrifice speed. The reasonable max load factor range seems to be 0.3 .. 0.8.
  • use a different hashtable implementation with separate chaining
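
For reference, a minimal sketch of open-addressing lookup with linear probing (increment 1) and tombstones, as described above; the names and entry layout are assumptions, not the actual _hashindex.c code:

```c
#include <stdint.h>
#include <string.h>

#define KEY_SIZE 32

typedef struct {
    unsigned char key[KEY_SIZE];
    uint32_t value;
    uint8_t state;                 /* 0 = empty, 1 = used, 2 = deleted (tombstone) */
} bucket;

/* Sketch: probe linearly with increment 1, wrapping around; stop at an
 * empty bucket (key not present) or at a matching used bucket. */
static bucket *probe_lookup(bucket *table, uint32_t num_buckets,
                            const unsigned char *key)
{
    uint32_t start;
    memcpy(&start, key, sizeof(start));
    uint32_t idx = start % num_buckets;

    for (uint32_t i = 0; i < num_buckets; i++) {
        bucket *b = &table[(idx + i) % num_buckets];
        if (b->state == 0)
            return NULL;                                /* hit empty: not present */
        if (b->state == 1 && memcmp(b->key, key, KEY_SIZE) == 0)
            return b;                                   /* found */
        /* tombstone or key mismatch: keep probing */
    }
    return NULL;                                        /* no empty bucket found */
}
```

Note that a lookup only stops at an empty bucket or a match, never at a tombstone, which is why long runs of used/deleted buckets translate directly into long probe lengths.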

@infectormp
Contributor

This may be off topic, but maybe we can use jemalloc to improve performance?

@ThomasWaldmann
Member Author

I didn't try jemalloc yet, but the _hashindex.c code only calls malloc/calloc when the hashtable is at the max load factor and needs to increase its size. So it is relatively infrequent (i.e. not like calling malloc per hashtable entry).

@ThomasWaldmann
Member Author

See #1429 for a promising approach.

@ThomasWaldmann ThomasWaldmann added this to the 1.1rc1 milestone Aug 15, 2016
@ThomasWaldmann ThomasWaldmann modified the milestones: 1.1.0b3, 1.1rc1 Sep 16, 2016
@ThomasWaldmann ThomasWaldmann modified the milestones: 1.1 - near future goals, 1.1.0b3 Nov 28, 2016
@ThomasWaldmann
Member Author

I moved this to a later milestone, so it does not block beta3.
If we can't get this in by the rc phase, it'll have to wait until 1.2.

@enkore
Contributor

enkore commented Mar 2, 2017

I wrote some instrumentation code and ran a couple of operations; even in trivial cases with small tables I see probe lengths of 40-100. For a larger table (around a million chunks) with a resize trigger of 32 probed slots, I see eight resizes and a maximum probe length of 341 (which means that for a single lookup almost 16 KB of memory was scanned), and that was for a simple create op that only inserted a couple of chunks.

These don't always go away when refilling the table (hence deciding "Oh, this probe length was long! Let's rebuild the table!" does not work: it rebuilds the table CONSTANTLY). Only changing the table size slightly (num_buckets + 1) seems to avoid this.
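
For context, instrumentation along these lines might look roughly like the sketch below (hypothetical names and entry layout, matching the earlier probing sketch; not the code that was actually run):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define KEY_SIZE 32

typedef struct {
    unsigned char key[KEY_SIZE];
    uint32_t value;
    uint8_t state;                 /* 0 = empty, 1 = used, 2 = tombstone */
} bucket;

static uint32_t max_probe_len = 0;    /* global high-water mark */

/* Sketch: same linear probing as above, but count how many buckets a
 * single lookup touches and report a new maximum. */
static bucket *probe_lookup_instrumented(bucket *table, uint32_t num_buckets,
                                         const unsigned char *key)
{
    uint32_t start, probes = 0;
    memcpy(&start, key, sizeof(start));
    uint32_t idx = start % num_buckets;

    bucket *result = NULL;
    for (uint32_t i = 0; i < num_buckets; i++) {
        bucket *b = &table[(idx + i) % num_buckets];
        probes++;
        if (b->state == 0)
            break;                               /* empty bucket ends the probe */
        if (b->state == 1 && memcmp(b->key, key, KEY_SIZE) == 0) {
            result = b;
            break;
        }
    }
    if (probes > max_probe_len) {
        max_probe_len = probes;
        fprintf(stderr, "new max probe length: %u\n", probes);
    }
    return result;
}
```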

@ThomasWaldmann
Member Author

@enkore you measured the current hashtable implementation from master? It would be interesting to do the same for robin hood hashing. Somewhat strange: @rciorba's last benchmarks didn't show a big advantage for "get" ops, but in theory RH should not have such long probe lengths.
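
For comparison, the core idea of Robin Hood hashing is that an entry being inserted steals the slot of a resident entry that sits closer to its own home bucket, which bounds the variance of probe lengths. A minimal sketch of that insertion rule, reusing the bucket layout from the earlier probing sketch (hypothetical names; not @rciorba's implementation):

```c
#include <stdint.h>
#include <string.h>

#define KEY_SIZE 32

typedef struct {
    unsigned char key[KEY_SIZE];
    uint32_t value;
    uint8_t state;                 /* 0 = empty, 1 = used, 2 = tombstone */
} bucket;

/* Sketch of Robin Hood insertion: if the entry being inserted has probed
 * further from its home slot than the entry currently occupying the
 * bucket, they swap and the displaced entry continues probing.
 * Assumes the table is not full and the key is not already present. */
static void rh_insert(bucket *table, uint32_t num_buckets,
                      const unsigned char *key, uint32_t value)
{
    unsigned char cur_key[KEY_SIZE];
    uint32_t cur_value = value;
    memcpy(cur_key, key, KEY_SIZE);

    uint32_t home;
    memcpy(&home, cur_key, sizeof(home));
    uint32_t idx = home % num_buckets;
    uint32_t dist = 0;                          /* how far we have probed */

    for (;;) {
        bucket *b = &table[idx];
        if (b->state != 1) {                    /* empty or tombstone: take it */
            memcpy(b->key, cur_key, KEY_SIZE);
            b->value = cur_value;
            b->state = 1;
            return;
        }
        /* distance of the resident entry from its own home slot */
        uint32_t b_home;
        memcpy(&b_home, b->key, sizeof(b_home));
        uint32_t b_dist = (idx + num_buckets - b_home % num_buckets) % num_buckets;
        if (b_dist < dist) {                    /* resident is "richer": swap */
            unsigned char tmp_key[KEY_SIZE];
            uint32_t tmp_value = b->value;
            memcpy(tmp_key, b->key, KEY_SIZE);
            memcpy(b->key, cur_key, KEY_SIZE);
            b->value = cur_value;
            memcpy(cur_key, tmp_key, KEY_SIZE);
            cur_value = tmp_value;
            dist = b_dist;
        }
        idx = (idx + 1) % num_buckets;
        dist++;
    }
}
```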

@rciorba
Contributor

rciorba commented Mar 3, 2017

@ThomasWaldmann One mitigating factor is that test_chunk_indexer_c_getitem never has to deal with deletions, therefore no tombstones are touched in that benchmark. Hence the smaller benefit. If you look at the setitem_after_deletion benchmark (which deletes 1/5 of the keys and then benchmarks updating the values of all remaining keys) you'll see a big improvement for RH.

@ThomasWaldmann
Member Author

btw, one major performance issue related to the hashtable filling up (completely) with tombstones (deleted entries) was recently fixed: a new constant MAX_EFF_LOAD 0.93 was introduced (it considers used + deleted entries, not just used entries like MAX_LOAD 0.75 does).

maybe this fixes some of the hashtable and merge performance issues seen in practice.
while merging H2 into H1 doesn't copy over tombstones from H2 (thus eradicating them in the result), if we start from an existing hashtable for H1, the tombstones of H1 were still there.

the above-mentioned fix will rebuild the hashtable once MAX_EFF_LOAD is exceeded (see the sketch below).
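
A minimal sketch of that rebuild trigger, assuming hypothetical struct and field names (only the two constants come from the fix described above):

```c
#include <stdint.h>

#define MAX_LOAD     0.75   /* fraction of used entries that triggers growing */
#define MAX_EFF_LOAD 0.93   /* fraction of used + deleted entries that triggers a rebuild */

typedef struct {
    uint32_t num_buckets;
    uint32_t num_used;      /* live entries */
    uint32_t num_deleted;   /* tombstones */
} table_stats;

typedef enum { KEEP, GROW, REBUILD } resize_action;

/* Sketch: decide after an insert/delete whether to grow the table or to
 * rebuild it at the same size just to drop the tombstones. */
static resize_action check_load(const table_stats *t)
{
    double used = (double)t->num_used / t->num_buckets;
    double eff  = (double)(t->num_used + t->num_deleted) / t->num_buckets;

    if (used > MAX_LOAD)
        return GROW;        /* bigger table; rehashing drops tombstones too */
    if (eff > MAX_EFF_LOAD)
        return REBUILD;     /* same size, just get rid of tombstones */
    return KEEP;
}
```

Rebuilding at the same size is enough here because rehashing never copies tombstones, so probes terminate at empty buckets again afterwards.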

@ThomasWaldmann
Member Author

I am closing this for now. As mentioned in the previous comment, one major issue (that may have been the cause of these sudden perf breakdowns) was fixed.

If something else shows up, we'll open a fresh ticket.
