-
Notifications
You must be signed in to change notification settings - Fork 121
Hash Containers
Cista provides a custom, serializable hash map and hash set implementation. The hash map implementation was chosen in a way that allows for in the application. Not just for serialization.
The implementation in the C++ standard library is slow (mainly because of the interface required by the standard):
Google's "Swiss Table" data structure performs very well in all benchmarks. For those who are interested, these presentations of Matt Kulukundis are very enlightening if you want to understand how it works:
- CppCon 2017: "Designing a Fast, Efficient, Cache-friendly Hash Table, Step by Step": https://www.youtube.com/watch?v=ncHmEUmJZf4
- CppCon 2019: "Abseil's Open Source Hashtables: 2 Years In": https://www.youtube.com/watch?v=JZE3_0qvrMg
Therefore, Cista's hash table is based on the Swiss Table idea. Thus, Cista's hash data structures do not provide pointer/iterator stability.
However, to keep it simple, Cista's data structure does not have support for the following features that are included in Google's Abseil CPP library:
- SSE lookup: Cista uses a group size of 8 bytes instead of 16 bytes. Interestingly, this does not impair the performance very much: benchmark results Cista vs. Abseil hash map.
- Extra sanitizer support (marking / unmarking memory regions)
- Allocator support
- The interface roughly follows the interface of
std::unordered_map<K, V>
but it is by all means not complete.
Feel free to add support for any of those features (GitHub pull request).
The correctness of the implementation was tested by integrating the hash map into real world applications and by using fuzzing through Clang's LibFuzzer. The code performs random insertions and deletions and checks that the hash map contents match those of a std::unordered_map
.
Consider a std::map<std::string, int>
where we want to find()
a character sequence we have as a std::string_view
. In this case, a std::string
will be constructed from the std::string_view
.
However, this is not necessary at all: for find()
, only hashing and equality checks are required. Since all string-like types are equality comparable and yield the same hash, it is possible to find()
a AnyStringLikeType
in a cista::hash_map<AnyStringLikeType, X>
. This works through the is_hash_equivalent
indicator of cista::hashing
and variable type interface of cista::equal_to<T>::operator(T1)
.
This way, it is possible to save a copy on each find
call.