
bmqt::MessageGUID: use mxm bit mixer as a fast hash #348

Merged (3 commits) on Jul 31, 2024

Conversation

@678098 (Collaborator) commented Jul 2, 2024

This PR replaces our djb2 hash implementation (from 2016) with a more efficient bit mixer.

The new hash generator (it inherits the name custom) produces fewer collisions and also runs about 2 times faster than the previous one (named legacy(djb2)).

Algos overview

baseline - a hash with near-ideal performance but awful hash quality. It simply returns the xor of the data, so it touches every byte with the minimum number of operations. We list this hash as a reference point, to see how far we are from the ideal scenario.
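As an illustration, such a baseline could look like this (a hypothetical sketch, not the exact test code):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the "baseline" reference hash: xor the two
 * 64-bit halves of the 16-byte GUID.  It touches every byte with the
 * minimum number of operations, but does no mixing, so quality is bad:
 * e.g. any GUID whose two halves are equal hashes to 0. */
static uint64_t baseline_hash(const unsigned char guid[16])
{
    uint64_t lo, hi;
    memcpy(&lo, guid, 8);      /* memcpy avoids alignment/aliasing issues */
    memcpy(&hi, guid + 8, 8);
    return lo ^ hi;
}
```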

default - bslh::DefaultHashAlgorithm, a general-purpose hashing algorithm from BDE. It doesn't use prior information about our data size, so it's slow compared to the custom ones. Shows no collisions in the scope of our tests.

legacy(djb2) - previously used custom hash.

mxm - a small 64-bit mixer. It doesn't guarantee the best hash quality among bit mixers, but it is very fast. The hash quality is still more than enough for our data, since we detect no collisions in the scope of the tests.
More info and comparison to other mixers: https://jonkagstrom.com/bit-mixer-construction/index.html
Note that, per that comparison, it's a local optimum: the best quality among small-sized bit mixers.
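Concretely, this family of mixers is just a multiply / xor-shift / multiply pass over a 64-bit word. A minimal sketch (the constants are the ones used in the benchmark snippet later in this thread; the in-tree implementation may differ):

```c
#include <stdint.h>

/* Sketch of a mul-xorshift-mul ("mxm"-style) 64-bit mixer.  Constants
 * are taken from the benchmark snippet in this thread; the production
 * implementation may differ.  Maps 0 to 0, and small input changes to
 * large, spread-out output changes. */
static uint64_t mxm_mix(uint64_t x)
{
    x *= 0xbf58476d1ce4e5b9ULL;  /* multiply: spread bits upward */
    x ^= x >> 56;                /* xor-shift: feed high bits back down */
    x *= 0x94d049bb133111ebULL;  /* multiply: spread again */
    return x;
}
```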

mx3 - a big 64-bit mixer. It has better hash quality than mxm, but executes more instructions, so it's about 2 times slower than mxm on our data size. Even so, it's faster than default and legacy(djb2).
More info:
https://github.com/jonmaiga/mx3?tab=readme-ov-file#mx3mix
https://jonkagstrom.com/mx3/index.html
https://jonkagstrom.com/mx3/mx3_rev2.html

Collisions

Distributions

Several different ways to generate GUIDs were tested. The most important ones are bmqp_1 and bmqp_N, since they show how we generate GUIDs in the application. Some other distributions were also added to better understand whether the hash function behaves well.

Distribution <bmqp_1>:
One bmqp::MessageGUIDGenerator to generate all GUIDs
Sample: 4C4B4000000004649D34911A24C74DEA

Distribution <bmqp_N>:
Multiple different bmqp::MessageGUIDGenerator-s to generate all GUIDs
Sample: 47A12000000003C162BFFBAA8D84E12C

Distribution <rand>:
Init every uint8_t of GUID as 'rand() % 256':
uint8_t[0 .. 15] <- rand() % 256
Sample: 678A30EDD4E25654C71ADD05CEA1005F

Distribution <4counters>:
Init every uint32_t block of GUID as 'counter':
uint32_t[0..3] <- counter, after: counter++
Sample: 404B4C00404B4C00404B4C00404B4C00

Distribution <4quarters>:
Init every int32_t block of GUID as the same 'rand()' value:
val <- rand(), int32_t[0..3] <- val
Sample: FE3C9271FE3C9271FE3C9271FE3C9271

Distribution <2halves>:
Init the first half of GUID as 'rand() % 256' for every uint8_t, then
copy this memory chunk to the second half
Sample: AFB928A19AE59057AFB928A19AE59057

Distribution <counter>:
Init the uint32_t block of GUID as 'counter', set all other to 0:
uint32_t[0] <- counter++, uint32_t[1..3] <- 0
Sample: 404B4C00000000000000000000000000
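As a sketch, two of the simpler distributions above could be generated like this (hypothetical code, not the actual test driver):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketches of two of the synthetic distributions above;
 * names and exact byte layout are illustrative, not the actual test code. */

/* <4counters>: all four uint32_t blocks hold the same counter value,
 * then the counter is incremented. */
static void guid_4counters(unsigned char guid[16], uint32_t *counter)
{
    uint32_t v = (*counter)++;
    for (int i = 0; i < 4; ++i)
        memcpy(guid + 4 * i, &v, sizeof v);
}

/* <counter>: only the first uint32_t block is a counter, the rest is 0. */
static void guid_counter(unsigned char guid[16], uint32_t *counter)
{
    uint32_t v = (*counter)++;
    memset(guid, 0, 16);
    memcpy(guid, &v, sizeof v);
}
```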

Results

10M (10,000,000) samples per run.
Different columns correspond to different distributions.
Note the distributions where our legacy hash fails completely:

./src/groups/bmq/bmqp/bmqp_messageguidgenerator.t.cpp CASE -10
=================================================================================
       ideal |      1 |      0 |    0 |   9999999 |   9999999 | 9999999 |       0
     default |      0 |      0 |    0 |         0 |         0 |       0 |       0
legacy(djb2) |    413 |   6016 |    0 |   9713662 |   3883193 |       0 | 9713662
         mxm |      0 |      0 |    0 |         0 |         0 |       0 |       0
     mx3rev2 |      0 |      0 |    0 |         0 |         0 |       0 |       0
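A collision count like the one above can be obtained by hashing every sample, sorting the hashes, and counting adjacent duplicates. A minimal sketch (hypothetical helper; the actual driver is the test case referenced above):

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of collision counting: sort the hash values and
 * count adjacent duplicates.  (The actual test driver lives in
 * bmqp_messageguidgenerator.t.cpp, CASE -10.) */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);  /* avoids overflow of plain subtraction */
}

/* Returns the number of samples that collide with an earlier sample. */
static size_t count_collisions(uint64_t *hashes, size_t n)
{
    size_t collisions = 0;
    qsort(hashes, n, sizeof *hashes, cmp_u64);
    for (size_t i = 1; i < n; ++i)
        collisions += (hashes[i] == hashes[i - 1]);
    return collisions;
}
```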

Avalanche

Ideally, a good hash function should flip ~50% of the output hash bits when you change a single bit of the input data. This probability should not depend on the position of the bit in the input data.

It's common to build flip probability tables to show the avalanche effect of a given algorithm. This table is two-dimensional, and the value table[i][j] gives the probability that the j-th output bit flips if we change only the i-th bit of the input data.

I tested the avalanche effect on the djb2, mxm, and mx3 implementations and prepared visualizations of their flip probability tables. Bright green means the flip probability is close to 0.5 (which is good); red means it is close to either 0.0 or 1.0 (which are equally bad). I also rescaled the visualization to make it easier to read.

djb2 (bad avalanche effect):
avalanche_djb2

mxm (not ideal, but still good avalanche effect)
avalanche_mxm

mx3rev2 (near-perfect avalanche effect):
avalanche_mx3
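A flip-probability table like the ones visualized above can be estimated empirically: toggle one input bit at a time and record which output bits change. A minimal sketch (hypothetical code; the xor baseline stands in for the hash under test):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of building a flip-probability table.  table[i][j]
 * estimates the probability that output bit j flips when input bit i is
 * toggled.  For a good hash, every entry should be close to 0.5.  The
 * hash below is a stand-in (the xor "baseline"); substitute mxm/mx3. */

enum { INPUT_BITS = 128, OUTPUT_BITS = 64, TRIALS = 2000 };

static uint64_t hash_under_test(const unsigned char data[16])
{
    uint64_t lo, hi;
    memcpy(&lo, data, 8);
    memcpy(&hi, data + 8, 8);
    return lo ^ hi;  /* stand-in: swap in the real hash here */
}

static void flip_table(double table[INPUT_BITS][OUTPUT_BITS])
{
    for (int i = 0; i < INPUT_BITS; ++i) {
        int flips[OUTPUT_BITS] = {0};
        for (int t = 0; t < TRIALS; ++t) {
            unsigned char buf[16];
            for (int b = 0; b < 16; ++b)
                buf[b] = (unsigned char)(rand() & 0xFF);
            uint64_t h1 = hash_under_test(buf);
            buf[i / 8] ^= (unsigned char)(1u << (i % 8)); /* toggle bit i */
            uint64_t h2 = hash_under_test(buf);
            uint64_t diff = h1 ^ h2;   /* bits that flipped */
            for (int j = 0; j < OUTPUT_BITS; ++j)
                flips[j] += (int)((diff >> j) & 1u);
        }
        for (int j = 0; j < OUTPUT_BITS; ++j)
            table[i][j] = (double)flips[j] / TRIALS;
    }
}
```

For the xor stand-in, each input bit deterministically flips exactly one output bit, which is precisely the red (probability 0.0 or 1.0) pattern a bad avalanche chart shows.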

Standalone performance

Performance of the proposed algorithms in an isolated benchmark.

Mac M2

Debug

./src/groups/bmq/bmqp/bmqp_messageguidgenerator.t.cpp CASE -9
        Name |     Iters | Total time (ns) | Per hash (ns) | Hash rate (1/sec)
==============================================================================
    baseline | 100000000 |     819,400,000 |             8 |       122,040,517
     default | 100000000 |   4,456,562,666 |            44 |        22,438,818
legacy(djb2) | 100000000 |   3,050,444,792 |            30 |        32,782,104
         mx3 | 100000000 |   1,767,919,917 |            17 |        56,563,648
         mxm | 100000000 |   1,373,043,000 |            13 |        72,830,931

Release

        Name |     Iters | Total time (ns) | Per hash (ns) | Hash rate (1/sec)
==============================================================================
    baseline | 100000000 |      47,482,167 |             0 |     2,106,053,837
     default | 100000000 |     548,949,417 |             5 |       182,166,146
legacy(djb2) | 100000000 |     334,482,458 |             3 |       298,969,340
         mx3 | 100000000 |     181,780,875 |             1 |       550,112,876
         mxm | 100000000 |     102,296,750 |             1 |       977,548,162

GNU/Linux host 3GHz amd64

Debug

        Name |     Iters | Total time (ns) | Per hash (ns) | Hash rate (1/sec)
==============================================================================
    baseline | 100000000 |   2,918,770,493 |            29 |       34,261,001
     default | 100000000 |   6,838,175,060 |            68 |       14,623,784
legacy(djb2) | 100000000 |   6,040,766,851 |            60 |       16,554,189
         mx3 | 100000000 |   4,481,604,629 |            44 |       22,313,436
         mxm | 100000000 |   3,823,790,445 |            38 |       26,152,060

Release

        Name |     Iters | Total time (ns) | Per hash (ns) | Hash rate (1/sec)
==============================================================================
    baseline | 100000000 |      42,213,067 |             0 |     2,368,934,718
     default | 100000000 |     234,166,914 |             2 |       427,045,812
legacy(djb2) | 100000000 |     862,918,148 |             8 |       115,885,846
         mx3 | 100000000 |     418,625,713 |             4 |       238,876,869
         mxm | 100000000 |     202,597,088 |             2 |       493,590,510

Note that djb2 is actually slower now than the default hash on Release.

GNU/Linux VM 2.3GHz amd64

Debug

        Name |     Iters | Total time (ns) | Per hash (ns) | Hash rate (1/sec)
==============================================================================
    baseline | 100000000 |   3,098,477,609 |            30 |        32,273,914
     default | 100000000 |   6,826,186,922 |            68 |        14,649,466
legacy(djb2) | 100000000 |   6,369,171,947 |            63 |        15,700,628
         mx3 | 100000000 |   4,364,924,673 |            43 |        22,909,902
         mxm | 100000000 |   3,805,946,936 |            38 |        26,274,670

Release

        Name |     Iters | Total time (ns) | Per hash (ns) | Hash rate (1/sec)
==============================================================================
    baseline | 100000000 |      71,615,506 |             0 |     1,396,345,646
     default | 100000000 |     607,944,888 |             6 |       164,488,594
legacy(djb2) | 100000000 |     827,554,504 |             8 |       120,837,962
         mx3 | 100000000 |     407,979,098 |             4 |       245,110,596
         mxm | 100000000 |     230,836,655 |             2 |       433,206,762

Note that djb2 is actually slower now than the default hash on Release here too.

Cluster performance testing

Performance of the BlazingMQ cluster using bmqbrkr.tsk/bmqtool.tsk built with the corresponding hashing function. The cluster has a fixed 3-node topology, with the same leader node on every test and the same connections from clients; all other parameters are also identical, except the binaries used.

Produce rate 110k msgs/s, 3 minutes
djb2: 922.9 ms median latency
mxm: 3.2 ms median latency

Overall, the implementation using the mxm hash keeps up with message bursts better.

Notes

See the comments in the files diff for more details.

As a result, we should be able to insert bmqt::MessageGUID into hash tables across BlazingMQ faster.

@678098 678098 requested a review from a team as a code owner July 2, 2024 03:24
@678098 678098 force-pushed the 240630_messageguid_hasher branch 14 times, most recently from 2ea23f3 to 37a4777 Compare July 5, 2024 17:10
@hallfox (Collaborator) commented Jul 5, 2024

Where does this new hash algorithm come from?

@678098 (Collaborator, Author) commented Jul 5, 2024

> Where does this new hash algorithm come from?

@hallfox we use mx3 bit mixer on GUID buffer, more info here:

https://github.com/jonmaiga/mx3?tab=readme-ov-file#mx3mix
https://jonkagstrom.com/mx3/index.html
https://jonkagstrom.com/mx3/mx3_rev2.html

I also plan to add these links in the code, if we decide to keep this change.

Typically, bit mixers are used as the last step of computing a hash in the general case, but for our data it's more than enough to use one on its own.

On top of the mixer, we combine hashes similarly to the boost implementation, but with a uint64 constant instead of a uint32 one. I think it's widely used:

https://www.boost.org/doc/libs/1_55_0/doc/html/hash/reference.html#boost.hash_combine

https://github.com/search?q=0x517cc1b727220a95&type=code

@678098 678098 force-pushed the 240630_messageguid_hasher branch 4 times, most recently from 8cdfdc1 to 2e4e283 Compare July 11, 2024 19:04
@678098 678098 force-pushed the 240630_messageguid_hasher branch 5 times, most recently from 42b52e5 to e536189 Compare July 26, 2024 17:56
@678098 678098 changed the title from "[WIP]bmqt::MessageGUID: better hash generation" to "bmqt::MessageGUID: better hash generation" Jul 26, 2024
@678098 678098 requested a review from chrisbeard July 29, 2024 13:45
@678098 678098 force-pushed the 240630_messageguid_hasher branch from 5b1630f to f482c99 Compare July 29, 2024 13:47
@678098 678098 force-pushed the 240630_messageguid_hasher branch from f482c99 to 9f6784f Compare July 29, 2024 13:51
Signed-off-by: Evgeny Malygin <emalygin@bloomberg.net>
@678098 678098 force-pushed the 240630_messageguid_hasher branch from 9f6784f to ae20034 Compare July 29, 2024 14:05
@678098 678098 changed the title from "bmqt::MessageGUID: better hash generation" to "bmqt::MessageGUID: use mxm bit mixer as a fast hash" Jul 29, 2024
@chrisbeard (Contributor) left a comment


lgtm. The results look great!

Note: I'm fine with leaving the utilities to benchmark and compare these in place for now. We can always come back and remove anything we don't want later.

@678098 678098 merged commit 536a3be into bloomberg:main Jul 31, 2024
29 checks passed
@678098 678098 deleted the 240630_messageguid_hasher branch July 31, 2024 18:42
@678098 (Collaborator, Author) commented Aug 1, 2024

Comparison to xxHash, using the xxHash bench framework: https://github.com/Cyan4973/xxHash/tree/release/tests/bench

To benchmark, modify the following source file:
https://github.com/Cyan4973/xxHash/blob/release/tests/bench/hashes.h

Add these functions:

size_t mix(size_t x)
{
    /* mul-xorshift-mul mixer over one 64-bit word (assumes 64-bit size_t) */
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 56;
    x *= 0x94d049bb133111ebULL;
    return x;
}

size_t combine(size_t lhs, size_t rhs)
{
    /* boost::hash_combine-style combiner with a 64-bit constant */
    lhs ^= rhs + 0x517cc1b727220a95ULL + (lhs << 6) + (lhs >> 2);
    return lhs;
}

size_t mxm(const void* src)
{
    /* mix the two 64-bit halves of the 16-byte GUID, then combine them */
    const size_t* start = (const size_t*)src;
    return combine(mix(start[0]), mix(start[1]));
}

size_t mxm_wrapper(const void* src, size_t srcSize, void* dst, size_t dstCapacity, void* customPayload)
{
    (void)srcSize; (void)dst; (void)dstCapacity; (void)customPayload;
    return mxm(src);
}

And modify this code block:

#ifndef HARDWARE_SUPPORT
#  define NB_HASHES 5
#else
#  define NB_HASHES 5
#endif

Bench_Entry const hashCandidates[NB_HASHES] = {
    { "mxm"   , mxm_wrapper },
    { "xxh3"  , xxh3_wrapper },
    { "XXH32" , XXH32_wrapper },
    { "XXH64" , XXH64_wrapper },
    { "XXH128", XXH128_wrapper },
#ifdef HARDWARE_SUPPORT
    /* list here codecs which require specific hardware support, such SSE4.1, PCLMUL, AVX2, etc. */
#endif
};

https://github.com/Cyan4973/xxHash/blob/release/tests/bench/main.c
Here, comment out the calls to bench_largeInput, bench_throughput_randomInputLength, and bench_latency_randomInputLength, since we test only on a fixed-size 16-byte input (sizeof(bmqt::MessageGUID)).

The results with -O3 are:

./benchHash --mins=16 --maxs=16
 ===  benchmarking 5 hash functions  === 
Throughput small inputs of fixed size (from 16 to 16 bytes): 
mxm    , 602849812
xxh3   , 561013703
XXH32  , 330887477
XXH64  , 336326761
XXH128 , 414669854
Latency for small inputs of fixed size : 
mxm    , 125838350
xxh3   , 133699963
XXH32  ,  63934864
XXH64  ,  85749590
XXH128 , 108056586

alexander-e1off pushed a commit to alexander-e1off/blazingmq that referenced this pull request Oct 24, 2024
Signed-off-by: Evgeny Malygin <emalygin@bloomberg.net>