Skip to content

Hash Containers

Felix Gündling edited this page Oct 19, 2019 · 1 revision

Cista provides a custom, serializable hash map and hash set implementation. The hash map implementation was chosen in a way that allows for in the application. Not just for serialization.

Design Choice

The implementation in the C++ standard library is slow (mainly because of the interface required by the standard):

Google's "Swiss Table" data structure performs very well in all benchmarks. For those who are interested, these presentations of Matt Kulukundis are very enlightening if you want to understand how it works:

Therefore, Cista's hash table is based on the Swiss Table idea. Thus, Cista's hash data structures do not provide pointer/iterator stability.

However, to keep it simple, Cista's data structure does not have support for the following features that are included in Google's Abseil CPP library:

  • SSE lookup: Cista uses a group size of 8 bytes instead of 16 bytes. Interestingly, this does not impair the performance very much: benchmark results Cista vs. Abseil hash map.
  • Extra sanitizer support (marking / unmarking memory regions)
  • Allocator support
  • The interface roughly follows the interface of std::unordered_map<K, V> but it is by all means not complete.

Feel free to add support for any of those features (GitHub pull request).

Continuous Verification

The correctness of the implementation was tested by integrating the hash map into real world applications and by using fuzzing through Clang's LibFuzzer. The code performs random insertions and deletions and checks that the hash map contents match those of a std::unordered_map.

Integration with Cista's Hashing and Equality Frameworks

Consider a std::map<std::string, int> where we want to find() a character sequence we have as a std::string_view. In this case, a std::string will be constructed from the std::string_view.

However, this is not necessary at all: for find(), only hashing and equality checks are required. Since all string-like types are equality comparable and yield the same hash, it is possible to find() a AnyStringLikeType in a cista::hash_map<AnyStringLikeType, X>. This works through the is_hash_equivalent indicator of cista::hashing and variable type interface of cista::equal_to<T>::operator(T1).

This way, it is possible to save a copy on each find call.