Add faster HashMap implementation #5271

lth · 2019-04-30T20:41:45Z

This adds a new hash table implementation that is generally more memory-friendly and faster than HashMap or std::unordered_map. It replaces the data structure behind the global lock table, as well as the tracked_keys data structure.

On a single-threaded workload where GetForUpdate + Put(assume_tracked) is called in batches of 100k keys:
std::unordered_map: 265691.8 / s
HashMapRB: 298957.6 / s

12.5% improvement

facebook-github-bot

@lth has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

adamretter · 2019-05-01T06:24:40Z

@lth I also just saw this announcement of F14, a new open-source HashMap implementation from Facebook. Perhaps it is of interest: https://code.fb.com/developer-tools/f14/

lth · 2019-05-01T17:22:55Z

@adamretter I spoke to @siying about this, since my first thought was just to use F14 as well, but it seems like it's hard to include into rocksdb without adding a lot of folly dependencies.

siying · 2019-05-01T20:52:33Z

@adamretter we have discussed depending on folly many times, but so far it's still too complicated. Several factors in my mind:

Platform support. The community has ported RocksDB to platforms like Power, FreeBSD, Solaris, etc., while Folly has no long-term support for them.
Ease of build. Right now RocksDB has no hard dependencies to build. If you have Linux, FreeBSD, etc., you can just grab the code and run make or cmake and it is done (if you need a specific compression library, you can optionally install it). If we rely on folly, either the user has to choose at build time whether to rely on it or not, or we treat folly as a hard dependency. Both ways make it harder for users to build and run RocksDB.
Licensing. RocksDB is dual-licensed under GPLv2 and Apache, but folly is Apache only. This will complicate users' consideration of adopting RocksDB or products built on RocksDB. Of course, we can work with our lawyers to try to re-license folly, so this is a relatively minor consideration.

So the decision so far is that we aren't going to depend on folly for now just because of this feature, and we may periodically revisit this decision.

adamretter · 2019-05-01T22:18:22Z

@siying Totally understand... and all are very good reasons! Thanks for the explanation :-)

facebook-github-bot · 2019-05-07T22:40:46Z

@lth has updated the pull request. Re-import the pull request

ltamasi

This is one cool hash table.

ltamasi · 2019-05-10T16:13:50Z

util/hash_map.h

+ public:
+  using key_type = K;
+  using mapped_type = V;
+  using value_type = std::pair<K, V>;


We should probably match the standard associative containers here and use std::pair<const K, V>.
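For reference, a minimal check of what the standard container exposes. The alias name StdMap is made up for this sketch; the point is only that the key half of value_type is const, which stops callers from changing a key in place through an iterator and leaving the element in the wrong slot.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <utility>

// StdMap is an illustrative alias, not a name from the PR.
using StdMap = std::unordered_map<std::string, int>;

// The key half of value_type is const, the mapped half is not.
static_assert(std::is_const<StdMap::value_type::first_type>::value,
              "key type inside value_type is const");
static_assert(!std::is_const<StdMap::value_type::second_type>::value,
              "mapped type inside value_type is mutable");

// Spelled out: value_type is std::pair<const Key, T>.
static_assert(std::is_same<StdMap::value_type,
                           std::pair<const std::string, int>>::value,
              "value_type is pair<const Key, T>");
```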

ltamasi · 2019-05-10T16:27:17Z

util/hash_map.h

+// the 'hole'.
+//
+template <typename K, typename V, class Hash = std::hash<K>>
+class HashMapRB {


How about calling it HashMapRobinHood? RB immediately made me think of red-black trees.

ltamasi · 2019-05-10T16:32:08Z

util/hash_map.h

+// Robinhood hashing is used, where metadata about the distance between the
+// current slot and the desired slot is kept. On collisions during inserts, if
+// the occupying item's distance is smaller than the inserted item's distance,
+// then the inserted item takes over the slot, and the occupying item is


This means that the rules for invalidating iterators are different from those for std::unordered_map. We should make sure none of the code we're switching over to the new implementation relies on std::unordered_map's behavior.
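To make the concern concrete, here is a toy fixed-capacity Robin Hood table. It is a sketch written for this thread, not the code in util/hash_map.h: ToyRobinHood, its identity hash, and the missing resize/erase are all simplifying assumptions. It shows that an insert can move an element that was already in the table.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical toy table: fixed capacity, no resize, no erase.
struct ToyRobinHood {
  static constexpr std::size_t kCapacity = 8;  // power of two
  static constexpr std::size_t kMask = kCapacity - 1;

  struct Slot {
    bool used = false;
    int key = 0;
    int value = 0;
    std::size_t dist = 0;  // distance from the key's desired slot
  };

  Slot slots[kCapacity];
  std::size_t count = 0;

  // Identity hash keeps the example deterministic.
  static std::size_t desired(int key) {
    return static_cast<std::size_t>(key) & kMask;
  }

  void insert(int key, int value) {
    assert(count < kCapacity);  // the toy never grows
    Slot cur;
    cur.used = true;
    cur.key = key;
    cur.value = value;
    std::size_t pos = desired(key);
    while (true) {
      Slot& s = slots[pos];
      if (!s.used) {  // empty slot: place the item we are carrying
        s = cur;
        ++count;
        return;
      }
      if (s.key == cur.key) {  // key already present: overwrite
        s.value = cur.value;
        return;
      }
      // The occupant is closer to its desired slot than the item we carry,
      // so the carried item takes the slot and the occupant moves on.
      if (s.dist < cur.dist) std::swap(s, cur);
      pos = (pos + 1) & kMask;
      ++cur.dist;
    }
  }

  // Returns the slot index holding `key`, or kCapacity if it is absent.
  std::size_t index_of(int key) const {
    std::size_t pos = desired(key);
    for (std::size_t d = 0; d < kCapacity; ++d) {
      const Slot& s = slots[pos];
      if (!s.used || s.dist < d) return kCapacity;
      if (s.key == key) return pos;
      pos = (pos + 1) & kMask;
    }
    return kCapacity;
  }
};
```

With keys 0 and 1 already in slots 0 and 1, inserting key 8 (desired slot 0) takes slot 1 and pushes key 1 to slot 2. An iterator that stored index 1 would now refer to key 8, whereas std::unordered_map only invalidates iterators on rehash and never invalidates references to surviving elements.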

ltamasi · 2019-05-10T16:46:06Z

util/hash_map.h

+    return ((1 << 7) | (offset << 3) | hashbits);
+  }
+
+  static constexpr uint8_t inc_dist(uint8_t x) { return x + (1 << 3); }


We could add an assert here to make sure we don't overflow the 4-bit field (and similarly add an assertion for underflow in dec_dist below).
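A sketch of what those assertions might look like. The bit layout (bit 7 as an occupied flag, bits 3 to 6 as the distance, bits 0 to 2 as hash bits) is inferred from the quoted line and the "4-bit field" remark; make_info and the two constants are names invented here, and the functions are plain inline rather than constexpr so the asserts also compile under C++11.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of the info byte: [occupied:1][distance:4][hashbits:3].
constexpr uint8_t kDistShift = 3;  // hypothetical constant
constexpr uint8_t kDistMax = 0xF;  // largest value a 4-bit field can hold

// make_info is a made-up name for the helper quoted in the diff.
inline uint8_t make_info(uint8_t offset, uint8_t hashbits) {
  assert(offset <= kDistMax);
  assert(hashbits < (1 << kDistShift));
  return static_cast<uint8_t>((1 << 7) | (offset << kDistShift) | hashbits);
}

inline uint8_t get_dist(uint8_t x) {
  return static_cast<uint8_t>((x >> kDistShift) & kDistMax);
}

inline uint8_t inc_dist(uint8_t x) {
  assert(get_dist(x) < kDistMax);  // would carry into the occupied bit
  return static_cast<uint8_t>(x + (1 << kDistShift));
}

inline uint8_t dec_dist(uint8_t x) {
  assert(get_dist(x) > 0);  // would borrow from the occupied bit
  return static_cast<uint8_t>(x - (1 << kDistShift));
}
```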

ltamasi · 2019-05-10T16:51:43Z

util/hash_map.h

+  typedef iterator_impl<const HashMapRB, true> const_iterator;
+
+  // -- Iterator Operations
+  iterator begin() { return iterator(this, 0); }


We could consider adding cbegin/cend and empty as well to mimic the standard unordered_map.

ltamasi · 2019-05-10T17:02:12Z

util/hash_map.h

+      destroy();
+
+      memcpy(this, &other, sizeof(*this));
+      other.values_ = nullptr;


We should probably clear the other fields as well to bring the moved-from object to a valid empty state (same with the move ctor below). Or even call init(1 << 4) on it; that might be even better.
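One way to get there without memcpy, sketched on a stand-in class. Buffer and its members are hypothetical and much simpler than HashMapRB; the pattern is to steal the fields one by one and then reset every field of the source, not just the pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owner of a heap array, standing in for the hash table.
class Buffer {
 public:
  explicit Buffer(std::size_t capacity)
      : values_(new int[capacity]()), size_(0), capacity_(capacity) {}
  ~Buffer() { delete[] values_; }

  Buffer(const Buffer&) = delete;
  Buffer& operator=(const Buffer&) = delete;

  Buffer(Buffer&& other) noexcept
      : values_(other.values_),
        size_(other.size_),
        capacity_(other.capacity_) {
    other.reset();  // leave the source empty but valid
  }

  Buffer& operator=(Buffer&& other) noexcept {
    if (this != &other) {
      delete[] values_;
      values_ = other.values_;
      size_ = other.size_;
      capacity_ = other.capacity_;
      other.reset();
    }
    return *this;
  }

  // Returns false instead of writing when there is no room.
  bool push_back(int v) {
    if (size_ == capacity_) return false;
    values_[size_++] = v;
    return true;
  }

  std::size_t size() const { return size_; }
  std::size_t capacity() const { return capacity_; }
  bool empty() const { return size_ == 0; }

 private:
  // Clears every field, so size() and capacity() agree with values_.
  void reset() noexcept {
    values_ = nullptr;
    size_ = 0;
    capacity_ = 0;
  }

  int* values_;
  std::size_t size_;
  std::size_t capacity_;
};
```

Resetting to a zero-capacity state is the cheap option; re-initializing the source with a small table, as suggested above, costs an allocation but keeps it immediately usable for inserts.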

ltamasi · 2019-05-10T17:54:40Z

util/hash_map.h

+    // Rehash until we get a short distances. This could loop infinitely if we
+    // have a bad hash function.
+    while (true) {
+      pos = h & mask_;


Minor but this line seems superfluous considering pos is reinitialized to h & mask_ in the for loop below.

ltamasi · 2019-05-10T18:04:52Z

util/hash_map.h

+  }
+
+  ROCKSDB_FORCE_INLINE iterator find(const K& key) {
+    const_iterator it = const_cast<typename std::add_const<


I think one way around the constness problems here would be to move the actual find logic to a private helper that would return only an index, and then have two thin find wrappers around it (one const method that returns a const_iterator, and one non-const method that returns an iterator).
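A rough sketch of that shape, using a hypothetical linear-scan SmallMap instead of the real table. Only the structure matters: one const helper that returns an index, and two thin find overloads that wrap the index in the right iterator type, so no const_cast is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical container; the lookup is a linear scan to keep it short.
class SmallMap {
 public:
  using value_type = std::pair<int, int>;
  using iterator = std::vector<value_type>::iterator;
  using const_iterator = std::vector<value_type>::const_iterator;

  void insert(int key, int value) {
    const std::size_t i = find_index(key);
    if (i == items_.size()) {
      items_.emplace_back(key, value);
    } else {
      items_[i].second = value;
    }
  }

  // Thin wrappers: each only converts the index into its iterator type.
  iterator find(int key) {
    return items_.begin() + static_cast<std::ptrdiff_t>(find_index(key));
  }
  const_iterator find(int key) const {
    return items_.cbegin() + static_cast<std::ptrdiff_t>(find_index(key));
  }

  iterator end() { return items_.end(); }
  const_iterator end() const { return items_.cend(); }

 private:
  // Shared lookup logic; returns items_.size() when the key is absent.
  std::size_t find_index(int key) const {
    for (std::size_t i = 0; i < items_.size(); ++i) {
      if (items_[i].first == key) return i;
    }
    return items_.size();
  }

  std::vector<value_type> items_;
};
```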

ltamasi · 2019-05-10T18:20:07Z

util/hash_map.h

+      assert(((pos + get_dist(info_[it.index_])) & mask_) == it.index_);
+
+      auto find_it = find(it->first);
+      assert(find_it != end());


assert(find_it == it) ?

facebook-github-bot · 2020-06-15T17:24:25Z

@lth has updated the pull request. Re-import the pull request

mrambacher · 2020-06-16T13:01:11Z

@siying @adamretter Is there a reason not to use Folly if it is available? Is there a reason not to introduce a compile-time flag that uses the Folly implementation if it is there and the RobinHood otherwise? Wouldn't this be similar to what is done with things like ROCKSDB_JEMALLOC and other flags?

I understand it would add another dimension to the ever-growing testing matrix and potentially complicate something like the Java distribution, but it seems like it would be nice to be able to take advantage of the Folly features where and when they are available.
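A minimal sketch of what such a switch could look like. EXAMPLE_USE_FOLLY_F14 and FastMap are hypothetical names, not existing RocksDB flags or types, and std::unordered_map stands in for the fallback so the block compiles on its own; in this PR the fallback would be the new Robin Hood table.

```cpp
#include <cassert>
#include <string>

// EXAMPLE_USE_FOLLY_F14 is a hypothetical macro, not an existing RocksDB flag.
#ifdef EXAMPLE_USE_FOLLY_F14
#include <folly/container/F14Map.h>
template <typename K, typename V>
using FastMap = folly::F14FastMap<K, V>;
#else
#include <unordered_map>
// Stand-in fallback so the sketch builds without folly.
template <typename K, typename V>
using FastMap = std::unordered_map<K, V>;
#endif

// Callers are written against the alias and only use the common subset
// of the map interface (find, end, operator[]).
inline int lookup_or_zero(const FastMap<std::string, int>& m,
                          const std::string& key) {
  auto it = m.find(key);
  return it == m.end() ? 0 : it->second;
}
```

The alias keeps call sites unchanged, but both build configurations would still need test coverage, which is the testing-matrix cost mentioned above.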

adamretter · 2020-06-16T16:11:40Z

@mrambacher sounds reasonable to me, as long as Siying's concerns are met

facebook-github-bot · 2023-12-02T08:51:59Z

Hi @lth!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

lth requested review from maysamyabandeh and siying April 30, 2019 20:41

facebook-github-bot added the CLA Signed label Apr 30, 2019

lth force-pushed the hashmap branch from 15249c4 to dfc3e8e Compare April 30, 2019 21:10

facebook-github-bot reviewed Apr 30, 2019

lth force-pushed the hashmap branch from dfc3e8e to 2d901dd Compare May 7, 2019 22:40

ltamasi self-requested a review May 9, 2019 18:26

ltamasi reviewed May 10, 2019

maysamyabandeh removed their request for review January 3, 2020 21:33

init

84122fd

lth force-pushed the hashmap branch from 2d901dd to 84122fd Compare June 15, 2020 17:24
