
WIP: Robin Hood hashing #1429

Closed
wants to merge 50 commits into from

Conversation

@rciorba (Contributor) commented Aug 2, 2016

This changes the collision handling strategy in _hashindex.c to
Robin Hood hashing. Incidentally, this should be mutually compatible
with the old version.

This hasn't been properly tested yet, except by the unit tests, but
I'm sharing this early to get some feedback. Some testing and a
before-and-after performance measurement will follow.

Also, please note this temporarily disables a couple of tests that
had hard-coded hashes which no longer match.

rciorba added 2 commits August 2, 2016 21:11
This changes the collision handling strategy in _hashindex.c to
robin-hood hashing. Incidentally this should be mutually compatible
with the old version.
This hasn't been properly tested yet, except by the unittests, but
I'm sharing this early to get some feedback. Some testing and a
before-and-after performance measurement will follow.
The 2 failing tests seem to have a hardcoded hash that no longer
matches the repo hash once the robin-hood collision handling has
been implemented.

registry.RemoteRepositoryCheckTestCase setUp will fail if any
selftest fails, thus preventing a bunch of tests from running.

Obviously this commit is not intended to be merged, but it's here
to allow the rest of the tests to run, to show that the proposed
change doesn't break borg's tests in fundamental ways.
@@ -68,7 +68,7 @@ static int hash_sizes[] = {
 };

 #define HASH_MIN_LOAD .25
-#define HASH_MAX_LOAD .75 /* don't go higher than 0.75, otherwise performance severely suffers! */
+#define HASH_MAX_LOAD 0.95 /* don't go higher than 0.75, otherwise performance severely suffers! */
@PlasmaPower (Contributor) commented Aug 2, 2016

Why is this changed? Also, make sure to update the comment.

Minor nitpicking, don't put a leading zero if the definition above doesn't have it.

@rciorba (Contributor, Author) replied:

> Why is this changed? Also, make sure to update the comment.

The point of Robin Hood hashing is to minimize the worst case for collisions by spreading the pain across all addresses. This should allow high load factors in the hash table without performance degrading much. Also, I should add that the idea for this change isn't mine; @ThomasWaldmann suggested it as something interesting to do at the EuroPython sprints.
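
For illustration, a minimal sketch of that insertion rule (illustrative names, not the PR's actual _hashindex.c code; assumes the table always keeps at least one empty bucket):

/* Sketch of the Robin Hood insert rule. Illustrative names only. */
typedef struct { int used; unsigned key; unsigned value; } Bucket;

/* displacement: how far bucket idx is from an entry's ideal bucket */
static int displacement(int ideal, int idx, int num_buckets) {
    return (idx - ideal + num_buckets) % num_buckets;
}

static void rh_insert(Bucket *tab, int num_buckets, unsigned key, unsigned value) {
    int idx = key % num_buckets;   /* ideal bucket */
    int offset = 0;                /* our displacement so far */
    while (tab[idx].used) {
        int other = displacement(tab[idx].key % num_buckets, idx, num_buckets);
        if (other < offset) {      /* resident is "richer": steal its bucket */
            Bucket evicted = tab[idx];
            tab[idx].key = key;
            tab[idx].value = value;
            key = evicted.key;
            value = evicted.value;
            offset = other;        /* keep probing on behalf of the evicted entry */
        }
        idx = (idx + 1) % num_buckets;
        offset++;
    }
    tab[idx].used = 1;
    tab[idx].key = key;
    tab[idx].value = value;
}

No entry ends up much further from its ideal bucket than any other, which is what keeps worst-case probe lengths short even at high load factors.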

I intentionally didn't update the comments; I'll do that once I've run some benchmarks to find an appropriate value.

> Minor nitpicking, don't put a leading zero if the definition above doesn't have it.

Will do. BTW, the code style for C in this project isn't 100% clear to me, so if there are any other style no-nos in my PR, please let me know.

Member commented:

Thanks for working on this! :)
If you change HASH_MAX_LOAD, do a full text search for it, there is another place depending on its value.

@rciorba (Contributor, Author) replied:

> If you change HASH_MAX_LOAD, do a full text search for it, there is another place depending on its value.

Had a look. All I can see is the comment next to the value and docs/internals.rst.
I'll update both once I identify a good value for this constant. Let me know if there are any places I've missed.

Member replied:

Search for 1.35 in the source.

@rciorba (Contributor, Author) commented Aug 2, 2016

Thanks for the feedback so far. I'll follow up once I get some performance numbers in as well.

        /* we have a collision */
        other_offset = distance(
            index, idx, hashindex_index(index, bucket_ptr));
        if ( other_offset < offset) {
Member commented:

nitpick: no blank after (

@ThomasWaldmann (Member) commented:

Really looking forward to the effects of this changeset. The paper sounded very promising about performance/efficiency.

@ThomasWaldmann (Member) commented:

Related: #536

@ThomasWaldmann (Member) commented:

@rciorba can you do a separate fixup commit with fixes for all the feedback you got?

@rciorba (Contributor, Author) commented Aug 4, 2016

Sure thing. I'll have some time later today.

@@ -111,7 +111,7 @@ hashindex_index(HashIndex *index, const void *key)
static int
hashindex_lookup(HashIndex *index, const void *key)
@rciorba (Contributor, Author) commented:

This function could also be optimized. Currently, for the worst-case scenario (not finding a key in the index), we scan buckets until we find an empty one. At high fill ratios this might get close to O(N). However, if we track the maximum offset in the entire hashmap, we could bail after at most max_offset iterations.

As the PR currently stands, we could just load an old hashmap and start operating on it with the new collision handling code, and it would just work; hashmaps created by this code would also still be usable by older borg versions. Changing hashindex_lookup, however, would require us to convert the hashmap explicitly and also change the HashHeader to track this max offset. That would be a bigger deal because it would impact backwards compatibility, so some planning needs to go into this.

One potential idea would be to use the MAGIC string in the header to also encode a version. For example, if we turn BORG_IDX into BORG_I plus 2 bytes for versioning, we could determine whether this version of the index is fully robin-hood compliant and, if not, convert it on load from disk.

@ThomasWaldmann I'd like to hear your thoughts on this.
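
For illustration, a hypothetical sketch of that magic-plus-version check (the field layout and names are illustrative, not the actual HashHeader):

#include <string.h>

/* Hypothetical sketch: shorten the magic to "BORG_I" and use the last
 * two bytes as a version number. Illustrative layout only. */
typedef struct {
    char magic[8];  /* "BORG_IDX" (legacy) or "BORG_I" + 2 version bytes */
    /* ... num_entries, num_buckets, key_size, value_size ... */
} SketchHeader;

static int header_version(const SketchHeader *hdr) {
    if (memcmp(hdr->magic, "BORG_IDX", 8) == 0)
        return 0;   /* legacy index: convert to robin-hood layout on load */
    /* checked second, since "BORG_IDX" also begins with "BORG_I" */
    if (memcmp(hdr->magic, "BORG_I", 6) == 0)
        return ((unsigned char)hdr->magic[6] << 8)
             | (unsigned char)hdr->magic[7];  /* versioned, robin-hood aware */
    return -1;      /* unknown format */
}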

@rciorba (Contributor, Author) commented Aug 4, 2016

> we could bail after at most max_offset iterations.

Actually, if the offset of the key we find in a bucket is smaller than the number of buckets we've looked at, we can bail. There's no way the next bucket will contain our desired key, since it would have been swapped on insert.
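
A minimal sketch of that early exit, reusing the Bucket type and displacement() helper from the earlier sketch (illustrative names, not the PR's actual hashindex_lookup):

static int rh_lookup(Bucket *tab, int num_buckets, unsigned key) {
    int idx = key % num_buckets;
    int offset = 0;                /* number of buckets we've looked at */
    while (tab[idx].used) {
        if (tab[idx].key == key)
            return idx;            /* hit */
        if (displacement(tab[idx].key % num_buckets, idx, num_buckets) < offset)
            return -1;             /* resident is closer to home than we would
                                      be, so our key can't be further along */
        idx = (idx + 1) % num_buckets;
        offset++;
    }
    return -1;                     /* empty bucket: key is absent */
}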

Member replied:

Scanning a big part of the hashindex sounds evil. I guess at a 95% fill rate we would always have to scan about 20% of all buckets.

Maybe that was the real reason for the perf breakdown I was sometimes seeing?

Maybe first keep the code compatible until it is accepted / merged, but we'll keep the idea for later.

@rciorba (Contributor, Author) replied:

Well, one way to keep it compatible and still speed up hashindex_lookup is to always reinsert all items when loading from disk (no more expensive than a resize). I'll do some measurements of performance with and without this implemented.
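
A sketch of that reinsert-on-load idea, reusing Bucket and rh_insert() from the earlier sketch (illustrative names):

#include <string.h>

/* Re-home every entry after loading a table from disk, so the in-memory
 * table satisfies the robin-hood invariant even if the on-disk copy was
 * written by an older borg. Cost is comparable to one resize pass. */
static void rh_rehome_all(const Bucket *old_tab, Bucket *new_tab, int num_buckets) {
    memset(new_tab, 0, num_buckets * sizeof(Bucket));
    for (int i = 0; i < num_buckets; i++)
        if (old_tab[i].used)
            rh_insert(new_tab, num_buckets, old_tab[i].key, old_tab[i].value);
}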

 * correctly track offset when stealing the location of existing bucket
   on collision handling
 * extract function to swap memory locations
 * avoid malloc/free on each call to hashindex_set
 * fixed many typos
@ThomasWaldmann modified the milestones: 1.1rc1, 1.1.0b2 Aug 27, 2016
@ThomasWaldmann (Member) commented Jan 28, 2017

@enkore the backshift deletion does not need tombstones, so it would solve that potential issue.

There are two cases of somewhat more aggressive HT use:

  • when borg decides that a file has not changed (mtime/inode/size same), it takes the chunk id list from the files cache and verifies that the chunks are all still present in the repo (by looking them up in the hashindex, which is assumed to be correct). So a potentially long list of hash ids gets looked up in a burst with no I/O. Since for many backups most files don't change, this is the usual case.
  • when the chunkindex gets resynced, hashtables get merged

@rciorba (Contributor, Author) commented Jan 30, 2017

@enkore The current HT implementation of hashindex_lookup, when it finds a key, moves said key over the first tombstone it encountered during the scan. This means the impact of tombstones is not that big.

A potential worst-case scenario: deleting a large number of keys would, for a while, make every get also perform a memory swap.
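
For readers following along, a minimal self-contained sketch of that move-over-the-first-tombstone behaviour (illustrative names, not the actual _hashindex.c code):

enum { EMPTY, USED, DELETED };  /* DELETED buckets are the tombstones */
typedef struct { int state; unsigned key; unsigned value; } TBucket;

static int ts_lookup(TBucket *tab, int num_buckets, unsigned key) {
    int idx = key % num_buckets;
    int first_deleted = -1;
    while (tab[idx].state != EMPTY) {
        if (tab[idx].state == DELETED) {
            if (first_deleted < 0)
                first_deleted = idx;       /* remember first tombstone on the path */
        } else if (tab[idx].key == key) {
            if (first_deleted >= 0) {      /* move the hit over the tombstone so
                                              future lookups take a shorter path */
                tab[first_deleted] = tab[idx];
                tab[idx].state = DELETED;  /* old slot becomes a tombstone, so
                                              the bucket chain stays unbroken */
                return first_deleted;
            }
            return idx;
        }
        idx = (idx + 1) % num_buckets;
    }
    return -1;
}

Note that the swap leaves a tombstone at the old slot, which is exactly the point @enkore raises below.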

Also, while thinking about an answer to your comment I realized my other PR did something stupid with tombstones. So thanks for your comment and here's a PR to fix that issue: #2116

@enkore (Contributor) commented Jan 30, 2017

> @enkore The current HT implementation of hashindex_lookup, when it finds a key, moves said key over the first tombstone it encountered during the scan. This means the impact of tombstones is not that big.

But doesn't that leave another tombstone in the place where the key was found (since further items with the same key could be located beyond the found item, the bucket chain can't be broken)?

@rciorba (Contributor, Author) commented Jan 30, 2017

@enkore You're right, that does leave another tombstone, but it's hard to reason about their actual impact. The only really bad case I can imagine is having the hash at a 75% fill rate but having the 25% free space be mostly tombstones. And I'm not sure that's likely to happen, since inserting keys would replace the tombstones with real values.

My reasoning about this:

  • say we're at 75% fill, no tombstones,
  • delete half the keys: 37.5% fill, 37.5% tombstones (performance is no worse than at a 75% fill rate, after the swaps have settled down),
  • insert back up all the way to 75%: X% tombstones, since the inserts will have replaced a bunch of them.

How about we add simple tracking of tombstones on an experimental 'telemetry' branch, so we get some actual numbers (basically find out what X is)?

Also, an eager delete could be implemented for the current master branch as well. I'll have a crack at it tonight/tomorrow if time allows.

@enkore (Contributor) commented Jan 30, 2017

Yes, my reasoning so far has always been in the direction of tombstones creating very long bucket chains that make look-ups slow, since that would correlate well with the "worst case" issues observed.

> How about we add simple tracking of tombstones on an experimental 'telemetry' branch, so we get some actual numbers (basically find out what X is)?

Sounds like a good idea. See also #1429 (comment) and #1429 (comment)

Perhaps that would be somewhat efficient at combating the supposed problem as well (resize / refill table if found_bucket - supposed_bucket > some n).

@ThomasWaldmann (Member) commented:

A rebase would be useful so this can be practically tested.

@pczarn commented Jan 31, 2017

Having tombstones in hash maps with Robin Hood hashing makes even less sense than in other hash maps.

I suggest a load factor between 80% and 90%. The difference between 90% and 93% may seem small, but having 7% free space instead of 10% is a big deal. A similar effect slows down hard drive file systems that are almost full. For a discussion about decreasing load factor, see rust-lang/rust#38003

You may want to remove the use of distance from hashindex_lookup, since that conditional break optimizes cases where the lookup fails, and you expect a hit rate of 100% in your application. Keep in mind that hashindex_lookup is used in hashindex_set. (In Rust's hashmap, we don't restart the search like in this PR. Whenever a new element is inserted, we continue at the offset from the failed initial lookup. However, I think the difference is tiny.)

Are keys influenced by foreign input? I'm not sure what the purpose of this implementation is.

    while(!BUCKET_IS_EMPTY(index, idx) && !BUCKET_IS_DELETED(index, idx)) {
        idx = (idx + 1) % index->num_buckets;
        /* we have a collision */
@pczarn commented:

This loop skips over some number of entries, then continues by displacing other entries. Our inserted entry will take the bucket of the first displaced entry. Instead, you can forward-shift all buckets from the first displaced entry to the end of the chunk using memmove. By increasing the displacement of all these buckets by 1, you keep the invariant of Robin Hood hashing, which relies on comparing displacements (other_offset < offset), not on their absolute values.

Tell me if something in my explanation is unclear. I'm using different names. My word for "distance" is "displacement".
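
A minimal sketch of the forward shift described here, reusing the Bucket type from the earlier sketch (assumes the occupied run does not wrap past the end of the table; wraparound handling is omitted for brevity):

#include <string.h>

/* Shift the whole occupied run forward by one bucket with memmove, then
 * drop the new entry into the freed slot. Every displaced entry's
 * displacement grows by exactly 1, preserving the robin-hood invariant.
 * Illustrative names only; idx is the bucket of the first displaced
 * entry, found by the usual probe. */
static void rh_insert_shift(Bucket *tab, int idx, unsigned key, unsigned value) {
    int end = idx;
    while (tab[end].used)
        end++;                     /* find the empty bucket ending the run */
    memmove(&tab[idx + 1], &tab[idx], (end - idx) * sizeof(Bucket));
    /* tab[idx] still reads as used; overwrite it with the new entry */
    tab[idx].used = 1;
    tab[idx].key = key;
    tab[idx].value = value;
}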

@rciorba (Contributor, Author) replied:

So basically take the block (the entire contiguous section of buckets up to the first empty bucket) and memmove it forward one address, then insert at the ideal location. Oh my, yes, such an elegant solution! Thanks!

> not on their absolute values.

Not sure what you mean by absolute value. The offsets are the relative distance between the ideal bucket a key would be in and the bucket it is in now, so I think it's the same as what you call "displacement".

> Tell me if something in my explanation is unclear. I'm using different names. My word for "distance" is "displacement".

Indeed, displacement is a better name for it.

Thanks for the feedback!

@rciorba (Contributor, Author) commented Jan 31, 2017

Thanks for the feedback!

> Having tombstones in hash maps with Robin Hood hashing makes even less sense than in other hash maps.

There are no tombstones in this implementation. This PR also became a place to discuss the problems of the current implementation, hence the discussion about them.

> I suggest a load factor between 80% and 90%. The difference between 90% and 93% may seem small, but having 7% free space instead of 10% is a big deal. A similar effect slows down hard drives. For a discussion about decreasing load factor, see rust-lang/rust#38003

Thanks. I'll check those out. I did benchmark at different load levels, so the performance impact should be visible there as well.

https://cdn.rawgit.com/rciorba/e3eded290e91a361f74eefdbc84416bb/raw/076524eb4a68697deafb7c1217113e65b526ebff/results.html

> You may want to remove the use of distance from hashindex_lookup, since that conditional break optimizes cases where the lookup fails, and you expect a hit rate of 100% in your application.

I'll benchmark without the distance check as well, but the periodic run of distance (every 16 or every 8 buckets) seems to have a small enough impact to be worth it (rh8 and rh16 in the linked graphs).

> Keep in mind that hashindex_lookup is used in hashindex_set. (In Rust's hashmap, we don't restart the search like in this PR. Whenever a new element is inserted, we continue at the offset from the failed initial lookup. However, I think the difference is tiny.)

This PR adds a pointer param so we can "return" not just the found index but also the address at which to start the search in case of a miss, so in that respect it should behave the same as Rust's hashmap.
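
A hypothetical sketch of that shape, reusing Bucket and displacement() from the earlier sketches (not the PR's exact signature or code):

/* Lookup that also reports, via *start_idx, where an insert should resume
 * probing after a miss, so the set operation need not rescan from the
 * ideal bucket. Illustrative names only. */
static int rh_lookup2(Bucket *tab, int num_buckets, unsigned key, int *start_idx) {
    int idx = key % num_buckets;
    int offset = 0;
    while (tab[idx].used) {
        if (tab[idx].key == key)
            return idx;            /* hit */
        if (displacement(tab[idx].key % num_buckets, idx, num_buckets) < offset)
            break;                 /* miss: a "richer" resident means our key
                                      cannot appear further along the chain */
        idx = (idx + 1) % num_buckets;
        offset++;
    }
    *start_idx = idx;              /* insert can start probing here */
    return -1;
}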

> Are keys influenced by foreign input?

I don't understand your question, could you please elaborate?

> I'm not sure what the purpose of this implementation is.

I believe the end goal was to use less memory by allowing larger fill rates. Implementing RH was mentioned at a EuroPython sprint as something for volunteers to pick up, and it sounded interesting, but this is my first contact with the code base. Since I don't know the project that well, I'm not sure what the usage patterns of the hashmap are. I was hoping the maintainers of borg could help guide me with that, hence the multiple variations of the implementation being benchmarked to see what the tradeoffs are for different use cases (like the distance short-cutting in lookup and its impact on getitem with no misses versus getitem with misses).

@enkore (Contributor) commented Jan 31, 2017

> Are keys influenced by foreign input? I'm not sure what the purpose of this implementation is.

Keys are the first few bits of a (usually keyed) cryptographic hash of input data. In the unkeyed case keys can thus be influenced to some extent; in the keyed case (used whenever the data is stored encrypted) this isn't possible at all.

I'd say that for all intents and purposes the keys can be seen as uniformly distributed values across the range.

@rciorba (Contributor, Author) commented Feb 1, 2017

@enkore That makes sense. Thank you!

@rciorba (Contributor, Author) commented Feb 27, 2017

Closing this PR and starting a fresh one since this one has grown to massive proportions and many of the intermediate changesets are no longer relevant to the final form.

@rciorba closed this Feb 27, 2017
@rciorba mentioned this pull request Feb 27, 2017