hashtable local performance breakdowns #536
Comments
http://planetmath.org/goodhashtableprimes interesting, esp. property 3:
https://github.com/fragglet/c-algorithms includes src/hash-table.c/.h implementing separate chaining with linked lists (see the sketch below). good:
bad:
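To make that concrete, here is a minimal separate-chaining sketch. It is illustrative only, not the fragglet/c-algorithms code; all names are made up. It just shows the typical trade-offs of this layout: one allocation per entry and pointer chasing along the chain on lookup.

```c
/* Minimal separate-chaining sketch (illustrative only, not the
 * fragglet/c-algorithms implementation; all names are made up). */
#include <stdint.h>
#include <stdlib.h>

typedef struct entry {
    uint32_t key;
    uint32_t value;
    struct entry *next;            /* collision chain */
} entry_t;

typedef struct {
    entry_t **buckets;
    size_t num_buckets;
} chain_table_t;

static entry_t *chain_lookup(chain_table_t *t, uint32_t key)
{
    /* one pointer dereference per chain element -> poor cache locality */
    for (entry_t *e = t->buckets[key % t->num_buckets]; e; e = e->next)
        if (e->key == key)
            return e;
    return NULL;
}

static int chain_insert(chain_table_t *t, uint32_t key, uint32_t value)
{
    entry_t *e = malloc(sizeof *e);   /* one allocation per inserted entry */
    if (!e)
        return -1;
    e->key = key;
    e->value = value;
    size_t i = key % t->num_buckets;
    e->next = t->buckets[i];          /* push onto the bucket's chain */
    t->buckets[i] = e;                /* (duplicates not handled: sketch only) */
    return 0;
}
```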
currently, it seems to me that all the "magic" rules and mechanisms are just there to cope with:
So the options for us seem to be:
May be off topic, but maybe we can use jemalloc to improve performance?
I didn't try jemalloc yet, but the _hashindex.c code only calls malloc/calloc if the hashtable is at the max load factor and needs to increase its size. So it is relatively infrequent (e.g. not like calling malloc per hashtable entry).
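A hedged sketch of that pattern (the structure and names are made up, not the actual _hashindex.c code; the 0.75 threshold matches the MAX_LOAD constant mentioned later in this thread): the only allocation site is the grow path, which is taken only when the load factor threshold is crossed.

```c
/* Illustrative sketch only (not the real _hashindex.c): allocation happens
 * only when the load factor threshold is crossed, not per inserted entry. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LOAD 0.75
#define EMPTY UINT32_MAX

typedef struct {
    uint32_t *keys;        /* EMPTY marks a free bucket */
    size_t num_buckets;
    size_t num_used;
} index_t;

static void put_slot(uint32_t *keys, size_t num_buckets, uint32_t key)
{
    size_t i = key % num_buckets;
    while (keys[i] != EMPTY)           /* linear probing for a free bucket */
        i = (i + 1) % num_buckets;
    keys[i] = key;
}

/* Grow the table once MAX_LOAD is exceeded; this is the only malloc site. */
static int maybe_grow(index_t *idx)
{
    if ((double)idx->num_used / idx->num_buckets < MAX_LOAD)
        return 0;                      /* common case: no allocation at all */

    size_t new_buckets = idx->num_buckets * 2;
    uint32_t *new_keys = malloc(new_buckets * sizeof *new_keys);
    if (!new_keys)
        return -1;
    memset(new_keys, 0xff, new_buckets * sizeof *new_keys);  /* all EMPTY */

    for (size_t i = 0; i < idx->num_buckets; i++)     /* rehash old entries */
        if (idx->keys[i] != EMPTY)
            put_slot(new_keys, new_buckets, idx->keys[i]);

    free(idx->keys);
    idx->keys = new_keys;
    idx->num_buckets = new_buckets;
    return 0;
}
```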
See #1429 for a promising approach.
I moved this to a later milestone, so it does not block beta3.
I wrote some instrumentation code and ran a couple of operations; even in trivial cases with small tables I see probe lengths of 40-100. For a larger table (around a million chunks) with a resize trigger of 32 probed slots, I see eight resizes and a maximum probe length of 341 (which means that for a single lookup almost 16 KB of memory were scanned), and that was for a simple create op that only inserted a couple of chunks. These long probes don't always go away with refilling the table, so deciding "oh, this probe length was long, let's rebuild the table!" does not work: it rebuilds the table CONSTANTLY. Only changing the table size slightly (num_buckets + 1) seems to avoid this.
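Roughly the kind of instrumentation meant here, as a sketch (hypothetical names, not the actual instrumentation code): count the probe sequence length of each lookup and keep a running maximum.

```c
/* Sketch of probe-length instrumentation for a linear-probing lookup
 * (hypothetical names, not the actual instrumentation code).
 * Assumes the table always has at least one empty bucket. */
#include <stddef.h>
#include <stdint.h>

static size_t max_probe_len;   /* running statistic updated on every lookup */

static int probed_lookup(const uint32_t *keys, size_t num_buckets,
                         uint32_t key, uint32_t empty_marker)
{
    size_t i = key % num_buckets;
    size_t probes = 1;

    while (keys[i] != key && keys[i] != empty_marker) {
        i = (i + 1) % num_buckets;     /* each extra probe scans more memory */
        probes++;
    }
    if (probes > max_probe_len)
        max_probe_len = probes;        /* a 341-slot probe shows up here */

    return keys[i] == key;             /* 1 = found, 0 = hit an empty slot */
}
```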
@ThomasWaldmann One mitigating factor is that test_chunk_indexer_c_getitem never has to deal with deletions, so no tombstones are touched in that benchmark; hence the smaller benefit. If you look at the setitem_after_deletion benchmark (which deletes 1/5 of the keys and then benchmarks updating the values of all remaining keys), you'll see a big improvement for RH.
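For context, "RH" presumably refers to Robin Hood hashing: on insert, an incoming key may evict a resident entry that sits closer to its own home bucket, which keeps probe lengths short and uniform. A generic sketch of the insert step (not the implementation discussed above; all names are illustrative, duplicates and a completely full table are not handled):

```c
/* Generic Robin Hood insertion sketch (illustrative only).
 * The incoming entry "robs" any resident entry that is closer to its home
 * bucket, so displacements stay small and evenly distributed. */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t key;
    uint32_t value;
    int used;
} rh_slot_t;

static size_t displacement(size_t home, size_t pos, size_t num_buckets)
{
    return (pos + num_buckets - home) % num_buckets;
}

static void rh_insert(rh_slot_t *slots, size_t num_buckets,
                      uint32_t key, uint32_t value)
{
    size_t pos = key % num_buckets;
    size_t dist = 0;   /* how far the incoming entry is from its home bucket */

    for (;;) {
        if (!slots[pos].used) {                  /* free slot: place entry */
            slots[pos] = (rh_slot_t){ key, value, 1 };
            return;
        }
        size_t resident_home = slots[pos].key % num_buckets;
        size_t resident_dist = displacement(resident_home, pos, num_buckets);
        if (resident_dist < dist) {              /* rob the "richer" resident */
            rh_slot_t evicted = slots[pos];
            slots[pos] = (rh_slot_t){ key, value, 1 };
            key = evicted.key;                   /* keep inserting the evictee */
            value = evicted.value;
            dist = resident_dist;
        }
        pos = (pos + 1) % num_buckets;
        dist++;
    }
}
```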
btw, one major performance issue related to the hashtable filling up (completely) with tombstones (deleted entries) was recently fixed: another constant, MAX_EFF_LOAD 0.93, was introduced, which considers used + deleted entries, not just used entries like MAX_LOAD 0.75 does. The fix rebuilds the hashtable once MAX_EFF_LOAD is exceeded. Maybe this solves some of the hashtable and merge performance issues seen in practice.
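A minimal sketch of the difference between the two thresholds (the constant names and values are taken from the comment above; everything else is illustrative):

```c
/* Illustrative sketch: MAX_LOAD looks only at used buckets, while
 * MAX_EFF_LOAD also counts tombstones (deleted entries), so a table that
 * slowly fills up with tombstones still triggers a rebuild. */
#include <stddef.h>

#define MAX_LOAD     0.75
#define MAX_EFF_LOAD 0.93

static int needs_grow(size_t num_used, size_t num_buckets)
{
    return (double)num_used / num_buckets > MAX_LOAD;
}

static int needs_rebuild(size_t num_used, size_t num_deleted,
                         size_t num_buckets)
{
    return (double)(num_used + num_deleted) / num_buckets > MAX_EFF_LOAD;
}
```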
I am closing this for now. As mentioned in the previous comment, one major issue (which may have been the cause of these sudden perf breakdowns) was fixed. If something else shows up, we'll open a fresh ticket.
Seems like some circumstance leads to local performance breakdowns depending on the amount N (and kind?) of input and the hashtable size T, while performance is usually good.
e.g.:
Having a smaller max load factor seems to help a bit, but it also seems to only make the problem less likely / less intense(?); it does not make it go away. Also, we should not go too low with the max load factor, so that we don't waste memory / run into out-of-memory conditions.
I am wondering what causes this and what we can do about it.
The hash function takes the upper 32 bits of the SHA-256 and computes it modulo T to get the bucket index. T was even a prime in one of my experiments.
Source: borg/_hashindex.c
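Roughly, the bucket index computation described above looks like this (an illustrative sketch, not a verbatim copy of _hashindex.c; exact byte order and which 32 bits are taken are simplified):

```c
/* Illustrative sketch of the bucket index computation described above
 * (not a verbatim copy of borg/_hashindex.c). */
#include <stdint.h>
#include <string.h>

static uint32_t bucket_index(const unsigned char *sha256_key,
                             uint32_t num_buckets)
{
    uint32_t bits;
    memcpy(&bits, sha256_key, sizeof bits);  /* 32 bits taken from the id */
    return bits % num_buckets;               /* T = num_buckets */
}
```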