bmqt::MessageGUID: use mxm bit mixer as a fast hash #348
Conversation
Where does this new hash algorithm come from?

@hallfox we use the mx3 bit mixer on the GUID buffer, more info here: https://github.com/jonmaiga/mx3?tab=readme-ov-file#mx3mix I also plan to paste some links in the code, if we decide to keep this change. Typically, bit mixers are used as the last step of computing a hash in the general case, but for our case it's more than enough to use one on its own. On top of the mixer, we combine hashes similarly to the boost implementation (https://www.boost.org/doc/libs/1_55_0/doc/html/hash/reference.html#boost.hash_combine), but we use a 64-bit constant: https://github.com/search?q=0x517cc1b727220a95&type=code
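For readers unfamiliar with the pattern, here is a minimal sketch of that boost-style combine step, using the 64-bit constant from the search link above (the function name is illustrative; the same code appears in the benchmark snippet later in this thread):

```cpp
#include <cstdint>

// Minimal sketch of the boost-style hash_combine described above, with a
// 64-bit constant in place of boost's 32-bit golden ratio 0x9e3779b9.
inline std::uint64_t combine(std::uint64_t lhs, std::uint64_t rhs)
{
    lhs ^= rhs + 0x517cc1b727220a95ULL + (lhs << 6) + (lhs >> 2);
    return lhs;
}
```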
lgtm. The results look great!
Note: I'm fine with leaving the utilities to benchmark and compare these in place for now. We can always come back and remove anything we don't want later.
Comparison to xxHash

To benchmark, modify the following source file: https://github.com/Cyan4973/xxHash/blob/release/tests/bench/main.c

Add these functions:

```c
size_t mix(size_t x)
{
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 56;
    x *= 0x94d049bb133111ebULL;
    return x;
}

size_t combine(size_t lhs, size_t rhs)
{
    lhs ^= rhs + 0x517cc1b727220a95 + (lhs << 6) + (lhs >> 2);
    return lhs;
}

size_t mxm(const void* src)
{
    const size_t* start = (const size_t*)src;
    return combine(mix(start[0]), mix(start[1]));
}

size_t mxm_wrapper(const void* src, size_t srcSize, void* dst, size_t dstCapacity, void* customPayload)
{
    (void)srcSize; (void)dst; (void)dstCapacity; (void)customPayload;
    return mxm(src);
}
```

And modify this code block:

```c
#ifndef HARDWARE_SUPPORT
#  define NB_HASHES 5
#else
#  define NB_HASHES 5
#endif

Bench_Entry const hashCandidates[NB_HASHES] = {
    { "mxm"   , mxm_wrapper },
    { "xxh3"  , xxh3_wrapper },
    { "XXH32" , XXH32_wrapper },
    { "XXH64" , XXH64_wrapper },
    { "XXH128", XXH128_wrapper },
#ifdef HARDWARE_SUPPORT
    /* list here codecs which require specific hardware support, such as SSE4.1, PCLMUL, AVX2, etc. */
#endif
};
```

The results were posted as an image (not preserved here).
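A note on the wrapper above: `srcSize` is ignored because a `bmqt::MessageGUID` is always 16 bytes, read as two `size_t` words. That fixed size is exactly the prior information that general-purpose hashes like `bslh::DefaultHashAlgorithm` cannot exploit.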
This PR replaces our `djb2` hash implementation (from 2016) with a more efficient bit mixer. The new hash generator (it inherits the name `custom`) has fewer collisions and also works ~2 times faster than the previous one (named `legacy(djb2)`).
Algos overview

- `baseline` - hash with near-ideal performance but awful hash quality. Basically it just returns the `xor` of the data, so it touches every byte with the minimum number of operations. We list this hash as a reference point, to see how far we are from the ideal scenario (a sketch of this baseline appears after this list).
- `default` - `bslh::DefaultHashAlgorithm`, a general-purpose hashing algorithm from BDE. It doesn't use the prior information about our data size, so it's slow compared to the custom ones. It shows no collisions in the scope of our tests.
- `legacy(djb2)` - the previously used custom hash.
- `mxm` - a small 64-bit mixer. It doesn't guarantee the best hash quality among bit mixers, but it works very fast. The hash quality is still more than enough for our data, since we don't detect collisions in the scope of the tests. More info and comparison to other mixers: https://jonkagstrom.com/bit-mixer-construction/index.html Note that in that comparison it's a local optimum: the best quality among small-sized bit mixers.
- `mx3` - a big 64-bit mixer. It has a better hash quality than `mxm`, but also has more instructions, so it's 2 times slower than `mxm` on our data size. Still, it's faster than the `default` and `legacy(djb2)`. More info:
  - https://github.com/jonmaiga/mx3?tab=readme-ov-file#mx3mix
  - https://jonkagstrom.com/mx3/index.html
  - https://jonkagstrom.com/mx3/mx3_rev2.html
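As referenced in the `baseline` item above, here is a hedged sketch of what such a reference hash can look like (the function name and the 16-byte input size are assumptions based on this PR; the actual test utilities live in the diff):

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical sketch of the `baseline` reference hash: xor the GUID's two
// 8-byte words. Near-ideal speed (one load per word, one xor), poor quality.
inline std::uint64_t baselineHash(const void* guid)
{
    std::uint64_t words[2];
    std::memcpy(words, guid, sizeof(words));  // 16-byte GUID assumed
    return words[0] ^ words[1];
}
```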
Collisions

Distributions

Several different ways to generate GUIDs were tested. The most important ones are `bmqp_1` and `bmqp_N`; they show how we generate GUIDs in the application. However, some other distributions were added to better understand whether the hash function is doing fine.

Results

10M samples. Different columns are different distributions:

Note a few distributions where our legacy hash fails completely:
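For context on how such counts can be gathered, here is a minimal collision-counting sketch (names are assumptions; the PR's diff contains the actual test utilities):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Count how many samples hash to an already-seen value. `hashFn` is any of
// the candidate hashes; each element of `guids` is a 16-byte GUID buffer.
std::size_t countCollisions(const std::vector<std::array<std::uint8_t, 16> >& guids,
                            std::uint64_t (*hashFn)(const void*))
{
    std::unordered_set<std::uint64_t> seen;
    std::size_t                       collisions = 0;
    for (const auto& g : guids) {
        if (!seen.insert(hashFn(g.data())).second) {
            ++collisions;
        }
    }
    return collisions;
}
```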
Avalanche

Ideally, a good hash function should flip ~50% of the resulting hash bits if you change one of the data bits. This probability should not depend on the position of the bit in the input data.

It's common to build flip probability tables to show the avalanche effect of a given algo. This table is two-dimensional, and the value `table[i][j]` shows the probability of the `j`-th bit flipping if we change only the `i`-th bit in the input data.

I tested the avalanche effect on the `djb2` and `mx3` implementations and prepared visualizations of the flip probability tables. The bright green color means that the flip probability is close to 0.5 (which is good); the red color means it is close to either 0.0 or 1.0 (which are equally bad). I also rescaled the visualization to make it easier to understand.

djb2 (bad avalanche effect):

mxm (not ideal, but still a good avalanche effect):

mx3rev2 (near-perfect avalanche effect):
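A flip probability table of this kind can be estimated with a sketch like the following (illustrative only; the sampling approach and names are assumptions, shown here for a 64-bit input mixer):

```cpp
#include <array>
#include <cstdint>
#include <random>
#include <vector>

// Estimate table[i][j]: the probability that output bit j flips when input
// bit i is flipped, averaged over random 64-bit inputs.
std::vector<std::array<double, 64> > flipTable(std::uint64_t (*mixFn)(std::uint64_t),
                                               int samples = 10000)
{
    std::vector<std::array<double, 64> > table(64);  // zero-initialized rows
    std::mt19937_64 rng(42);
    for (int s = 0; s < samples; ++s) {
        const std::uint64_t x = rng();
        const std::uint64_t h = mixFn(x);
        for (int i = 0; i < 64; ++i) {
            const std::uint64_t diff = h ^ mixFn(x ^ (1ULL << i));
            for (int j = 0; j < 64; ++j) {
                table[i][j] += (diff >> j) & 1u;
            }
        }
    }
    for (auto& row : table) {
        for (double& p : row) {
            p /= samples;  // counts -> probabilities; ~0.5 everywhere is ideal
        }
    }
    return table;
}
```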
Standalone performance
Performance of the proposed algorithms in an isolated benchmark.
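For reference, a standalone timing loop for hashes of this shape can be sketched as follows (illustrative; the real benchmark utilities are in the diff, and names here are assumptions):

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Time hashing `n` pseudo-GUIDs (16 contiguous bytes each) and report
// nanoseconds per hash. `hashFn` is one of the candidate hashes.
double nsPerHash(std::uint64_t (*hashFn)(const void*), std::size_t n = 1000000)
{
    std::vector<std::uint64_t> data(2 * n);
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] = i * 0x9e3779b97f4a7c15ULL;  // arbitrary fill pattern
    }

    volatile std::uint64_t sink = 0;  // keeps the compiler from eliding the loop
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        sink = sink + hashFn(&data[2 * i]);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / n;
}
```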
Mac M2
Debug
Release
GNU/Linux host 3GHz amd64
Debug
Release
Note that `djb2` is actually slower now than the `default` hash on Release.
hash on Release.GNU/Linux VM 2.3GHz amd64
Debug
Release
Note that `djb2` is actually slower now than the `default` hash on Release here too.

Cluster performance testing
Performance of the BlazingMQ cluster using bmqbrkr.tsk/bmqtool.tsk built with the corresponding hashing function. The cluster has a fixed 3-node topology, with the same leader node on every test and with the same connections from clients; all other parameters are also the same except the binaries used.
Produce rate 110k msgs/s, 3 minutes
djb2: 922.9 ms median latency
mxm: 3.2 ms median latency
Overall, the implementation using the mxm hash is able to keep up with message bursts better.
Notes

See the comments in the files diff for more details.

As a result, we should be able to insert `bmqt::MessageGUID` into hash tables across BlazingMQ faster.
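As a hedged illustration of that end goal, a hash table keyed by 16-byte GUIDs with an mxm-style hasher could look like this (`Guid` and `MxmGuidHash` are stand-ins invented for this sketch; the real `bmqt::MessageGUID` integration is in the diff):

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>

using Guid = std::array<std::uint8_t, 16>;  // stand-in for bmqt::MessageGUID

struct MxmGuidHash {
    static std::uint64_t mix(std::uint64_t x)
    {
        x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 56;
        x *= 0x94d049bb133111ebULL;
        return x;
    }
    std::size_t operator()(const Guid& g) const
    {
        std::uint64_t w[2];
        std::memcpy(w, g.data(), sizeof(w));
        std::uint64_t h = mix(w[0]);
        // boost-style combine with the 64-bit constant used in this PR
        h ^= mix(w[1]) + 0x517cc1b727220a95ULL + (h << 6) + (h >> 2);
        return static_cast<std::size_t>(h);
    }
};

int main()
{
    std::unordered_map<Guid, std::string, MxmGuidHash> pending;
    pending[Guid{}] = "example";  // lookups now pay the cheaper mxm hash cost
    return 0;
}
```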