Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add improved std.hash.int - deprecate std.hash.uint32 #21858

Merged
merged 5 commits into from
Nov 25, 2024

Conversation

francescoalemanno
Copy link
Contributor

@francescoalemanno francescoalemanno commented Oct 30, 2024

It turns out that Zig's default bit mixer is very biased, and after a lot of googling it turns out that selecting a better solution is hard...
I wrote a custom statistical analysis taylored for bit mixers in order to select the best one at each size (u64/u32/u16), compared a lot of mixers, and packaged the best ones in this PR.

TESTING CODE here

@RetroDev256
Copy link
Contributor

RetroDev256 commented Oct 30, 2024

Thank you for the research! Running both the hash functions in a "counter mode" construction, I tested them using PractRand:

As you can see, the current uint32 fails at 256 MiB:

RNG_test using PractRand version 0.95
RNG = RNG_stdin32, seed = unknown
test set = core, folding = standard (32 bit)

rng=RNG_stdin32, seed=unknown
length= 256 megabytes (2^28 bytes), time= 2.7 seconds
  Test Name                         Raw       Processed     Evaluation
  Gap-16:A                          R=+995.1  p =  3.4e-792   FAIL !!!!!!!   
  Gap-16:B                          R= +1139  p =  5.9e-924   FAIL !!!!!!!   
  FPF-14+6/16:(0,14-0)              R=  -7.6  p =1-1.1e-6   mildly suspicious
  [Low8/32]Gap-16:A                 R=  +4.9  p =  1.2e-3   unusual          
  [Low8/32]Gap-16:B                 R=  +5.3  p =  1.6e-4   unusual          
  [Low8/32]FPF-14+6/16:(0,14-0)     R= +21.1  p =  4.0e-19    FAIL !         
  [Low8/32]FPF-14+6/16:(1,14-0)     R= +21.4  p =  1.9e-19    FAIL !         
  [Low8/32]FPF-14+6/16:(2,14-1)     R= +15.4  p =  2.0e-13    FAIL           
  [Low8/32]FPF-14+6/16:(3,14-2)     R= +12.7  p =  6.3e-11   VERY SUSPICIOUS 
  [Low8/32]FPF-14+6/16:all          R= +34.9  p =  3.1e-32    FAIL !!!       
  [Low1/32]BCFN(2+2,13-6,T)         R= +10.6  p =  4.7e-4   unusual          
  ...and 157 test result(s) without anomalies

The 32 bit hash function in this PR fails at 1 GiB: (1/4 the state space!)

RNG_test using PractRand version 0.95
RNG = RNG_stdin32, seed = unknown
test set = core, folding = standard (32 bit)

rng=RNG_stdin32, seed=unknown
length= 256 megabytes (2^28 bytes), time= 3.1 seconds
  no anomalies in 168 test result(s)

rng=RNG_stdin32, seed=unknown
length= 512 megabytes (2^29 bytes), time= 7.1 seconds
  Test Name                         Raw       Processed     Evaluation
  FPF-14+6/16:all                   R=  -7.1  p =1-1.5e-6   suspicious       
  ...and 179 test result(s) without anomalies

rng=RNG_stdin32, seed=unknown
length= 1 gigabyte (2^30 bytes), time= 13.9 seconds
  Test Name                         Raw       Processed     Evaluation
  FPF-14+6/16:(0,14-0)              R=  -7.5  p =1-1.3e-6   mildly suspicious
  FPF-14+6/16:(1,14-0)              R=  -7.5  p =1-1.2e-6   mildly suspicious
  FPF-14+6/16:(2,14-0)              R=  -6.9  p =1-4.3e-6   unusual          
  FPF-14+6/16:all                   R= -14.7  p =1-5.5e-14    FAIL           
  ...and 190 test result(s) without anomalies

The statistical tests in the PractRand suite are very advanced (even compared to TestU01's BigCrush, which is limited in duration) - I feel this is good evidence we should switch to the new one in this PR.

While less common, I feel that the 64 and 16 bit hash functions would also be nice. One thought though for maintainers, would it be worth it to include a hashUsize function, that switches over @bitSizeOf(usize), selecting one of these functions?

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Can you please make the function accept the integer type as a parameter, and soft-deprecate the old function (make it keep working but documented as deprecated)?

Just switch on the integer type and put @compileError("unimplemented") for the unhandled cases. Or, feel free use the other std.hash abstractions to support those kind of integers.

pub fn int(input: anytype) @TypeOf(input) {
    switch (@TypeOf(input)) {
        u32 => { ... },
        u64 => { ... },

@francescoalemanno francescoalemanno changed the title add std.hash.uint16, std.hash.uint64, improve std.hash.uint32 add improved std.hash.int - deprecate std.hash.uint32 Oct 31, 2024
@francescoalemanno
Copy link
Contributor Author

I believe it is now ready to be merged :)

@andrewrk andrewrk enabled auto-merge November 1, 2024 19:05
@andrewrk andrewrk added standard library This issue involves writing Zig code for the standard library. release notes This PR should be mentioned in the release notes. labels Nov 1, 2024
@andrewrk
Copy link
Member

andrewrk commented Nov 1, 2024

I pushed 2 commits after yours and force pushed to the branch, and set to auto-merge.

Would you please look at the last commit that I added there, and consider that approach? Is there any concern about bias when using the upcasting and truncation strategy?

auto-merge was automatically disabled November 1, 2024 23:11

Head branch was pushed to by a user without write access

@andrewrk
Copy link
Member

andrewrk commented Nov 1, 2024

Alright this is getting a bit chaotic for me. I'd like to see performance benchmarks on this before it lands. It looks like significantly more instructions now for u32.

@francescoalemanno
Copy link
Contributor Author

Sorry Andrew, I meant well with my sudden modifications :)
I just found out about the ARX (add-rotate-xor) construction, and it seems to be the best tool to devise a general width bit mixer.
I tested it all day using my method revealing no particular flaws, It also performs much better under PractRand (as suggested by @RetroDev256) .

the excellent news is that it reaches 4 Gigs without failing:

➜  bit_mixers_analysis git:(main) ✗ zig run hash_tester.zig -OReleaseFast | RNG_test stdin32    
RNG_test using PractRand version 0.95
RNG = RNG_stdin32, seed = unknown
test set = core, folding = standard (32 bit)

rng=RNG_stdin32, seed=unknown
length= 16 megabytes (2^24 bytes), time= 2.1 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low8/32]BCFN(2+6,13-8,T)         R=  -4.8  p =1-7.8e-5   unusual          
  ...and 119 test result(s) without anomalies

rng=RNG_stdin32, seed=unknown
length= 32 megabytes (2^25 bytes), time= 4.2 seconds
  no anomalies in 131 test result(s)

rng=RNG_stdin32, seed=unknown
length= 64 megabytes (2^26 bytes), time= 8.5 seconds
  no anomalies in 142 test result(s)

rng=RNG_stdin32, seed=unknown
length= 128 megabytes (2^27 bytes), time= 16.7 seconds
  no anomalies in 156 test result(s)

rng=RNG_stdin32, seed=unknown
length= 256 megabytes (2^28 bytes), time= 33.2 seconds
  no anomalies in 168 test result(s)

rng=RNG_stdin32, seed=unknown
length= 512 megabytes (2^29 bytes), time= 65.9 seconds
  no anomalies in 180 test result(s)

rng=RNG_stdin32, seed=unknown
length= 1 gigabyte (2^30 bytes), time= 131 seconds
  no anomalies in 194 test result(s)

rng=RNG_stdin32, seed=unknown
length= 2 gigabytes (2^31 bytes), time= 257 seconds
  no anomalies in 205 test result(s)

rng=RNG_stdin32, seed=unknown
length= 4 gigabytes (2^32 bytes), time= 508 seconds
  no anomalies in 217 test result(s)

this in turn implies that the properties of this hash are much better, perhaps to an excessive amount...
If performance is to be preferred over quality, we can revert back to your commit.

@andrewrk
Copy link
Member

andrewrk commented Nov 1, 2024

Both are important factors

@francescoalemanno
Copy link
Contributor Author

francescoalemanno commented Nov 2, 2024

I do agree with that, the new construction is ~ 12 times slower, but it does not depends on a lot of magic and tuned constants which helps in writing generic code.
Now I am trying to minimise the ARX construction to get a good hash at reasonable speeds.

edit: brought it down to ~4 times slower without sacrificing quality. it passed 8gb practrand, so 4 times slower but 8 times more reliable w.r.t. to the previous version. I will look around for some more tricks to trim it down further.

edit v2: I realized that optimizing a non-crypto hash to pass practrand is unnecessary (sorry for the wasted time Andrew...), I found a good middle ground that fixes the statistical artifacts while keeping the hasher performant. Now it looks good to me @andrewrk.

@francescoalemanno
Copy link
Contributor Author

francescoalemanno commented Nov 3, 2024

finally all requirements for a good bit-mixer are met:

  • zero sensitivity: 0 is not a fixed point
  • diffusion: bias is below 1% at all width except the very
  • it is a bijection at all widths, meaning that every int is mapped to a distinct int, it can be used in hash-requiring datastructures without introducing extra collisions.
  • performance: it is ~1.5 times slower than the original hash in stdlib, but it works at any width and with ~60 times less bias.
- timing for hashing 2^32 integers.
time :      std.hash.uint32                 +386ms    (PRE-PR)
time :         std.math.int                 +599ms       (NEW)

@andrewrk, if there are other requirements before merging let me know, and thank you.

@andrewrk
Copy link
Member

andrewrk commented Nov 4, 2024

It looks like you've managed to reveal a bug in the C backend. The next step here will be extracting a reduced behavior test case from this code that triggers the bug and getting that fixed in master branch. Then we can come back and rebase this branch.

By the way, how are you measuring performance? Can you write that down somewhere in the PR description?

@francescoalemanno
Copy link
Contributor Author

francescoalemanno commented Nov 4, 2024

Oh okay! I do not have a Linux machine with me currently, is there any way I can help?

Benchmarking strategy
https://github.com/francescoalemanno/bit_mixers_analysis/blob/1cf848fc849489c0055976f436805aaf645d8d30/hash_tester.zig#L81

This benchmarking function times a given hashing function by running it on a range of values (up to 2^32−1), recording the execution time for each run, and keeping track of the best (fastest) observed time. It uses an accumulator, x, to avoid compiler optimizations. After five iterations, it returns the fastest recorded time in milliseconds.

@andrewrk
Copy link
Member

andrewrk commented Nov 4, 2024

Thanks!

If you want to get involved and learn about the C backend you can help out there, otherwise, sit tight and we'll get it fixed and then ping you when it's fixed.

@ianprime0509
Copy link
Contributor

I've reduced the C backend bug and opened it as #21914.

@francescoalemanno
Copy link
Contributor Author

francescoalemanno commented Nov 6, 2024

Since the C-backend bug blocked this PR, I figured there was some extra time to squeeze some more performance & quality. I devised some better metrics to evaluate the quality, and now I can see it is pareto optimal.

time :      std.hash.uint32               0.1997ns per hash      777566858
time :         std.math.int               0.2800ns per hash      49333260
new hash is 1.4022 times slower.
it has 329 times less bias (evaluated as sqrt(mse_1/mse_2)).

The enhanced quality of the hasher is also noticed in practice:
immagine
The new procedure performs like a random oracle, i.e. in this example it reaches theoretically optimal performance

@RetroDev256
Copy link
Contributor

RetroDev256 commented Nov 6, 2024

I have some bad news - Not completely sure the metrics that are being used, but I feel that the implementation is getting a bit "too smart" for it's original purpose. I won't say that I fully understand the testing, but I feel that we could still do better in terms of quality & speed.

I do think that there may have been some testing accuracy drift over the past few commits, because the current code for 32 bits fails at 1 GiB for me (and 512 MiB is "suspicious" according to PractRand).

I would like to propose a less "autogenerated" hash function. I do understand that it is really nice to be able to automagically hash any integer bit width, but it seems that the current code may be performing sub-par to what I originally reviewed.

I would also like to make a case for lower quality, higher throughput hash functions - If people want a high quality hash function, they can always reach into std.hash for cryptographic hash functions, or a specific construction with high quality output. I feel that the number one use-case for this easy-to-reach hash function will be game & architecture developers who "just want a hash" without all the hastle - which means that speed of the hash may be more important than "perfect hash quality".

This isn't to say that the current ideas behind the implementation are bad or inferior, I just think that the original code I tested may have been better (in terms of use-cases) than the current code.

To satisfy the "make an int function that hashes any bit width", I propose that something along the lines of the following code is introduced:

pub fn int(x: anytype) @TypeOf(input) {
    switch (@bitSizeOf(input)) {
        0 => return 0,
        1 => return ~x,
        2...8 => return x *% @as(@TypeOf(x), @truncate(0x77)),
        16 => return uint16(x), // non-pub decl in current file
        32 => return uint32(x), // eventually non-pub decl in current file
        64 => return uint64(x), // non-pub decl in current file
        else => @compileError("int() function not implemented for current type"),
    };
}
Current PR Hash Function Test Results
ramen:[retrodev]:~/repos/Zig/test$ zig build run -Doptimize=ReleaseFast | RNG_test stdin32
RNG_test using PractRand version 0.95
RNG = RNG_stdin32, seed = unknown
test set = core, folding = standard (32 bit)

rng=RNG_stdin32, seed=unknown
length= 256 megabytes (2^28 bytes), time= 3.3 seconds
  no anomalies in 168 test result(s)

rng=RNG_stdin32, seed=unknown
length= 512 megabytes (2^29 bytes), time= 8.3 seconds
  Test Name                         Raw       Processed     Evaluation
  FPF-14+6/16:all                   R=  -5.8  p =1-3.2e-5   mildly suspicious
  ...and 179 test result(s) without anomalies

rng=RNG_stdin32, seed=unknown
length= 1 gigabyte (2^30 bytes), time= 15.2 seconds
  Test Name                         Raw       Processed     Evaluation
  FPF-14+6/16:(0,14-0)              R=  -9.1  p =1-4.1e-8   suspicious       
  FPF-14+6/16:(1,14-0)              R=  -7.3  p =1-2.2e-6   unusual          
  FPF-14+6/16:all                   R= -13.8  p =1-3.8e-13    FAIL           
  ...and 191 test result(s) without anomalies
Current PR Hash Function Testing Code
const std = @import("std");

pub fn main() !void {
    const endian = @import("builtin").cpu.arch.endian();

    const stdout = std.io.getStdOut().writer();
    var bw = std.io.bufferedWriter(stdout);
    const writer = bw.writer();

    var input: u32 = 0;
    while (true) : (input +%= 1) {
        const hash = int(input);
        writer.writeInt(u32, hash, endian) catch unreachable;
    }

    try bw.flush();
}

/// Applies a bit-mangling transformation to an integer type `T`.
pub fn int(input: anytype) @TypeOf(input) {
    const int_type = @TypeOf(input);
    const bits = @typeInfo(int_type).int.bits;
    const Unsigned = @Type(.{ .int = .{ .signedness = .unsigned, .bits = bits } });
    var x: Unsigned = @bitCast(input);
    if (bits == 0) {
        x = 0;
    } else if (bits == 1) {
        x = ~x;
    } else if (bits == 2) {
        const d = x +% 1;
        x = ~((d >> 1) | (d << 1));
    } else {
        const constants = comptime blk: {
            const shift_1 = (bits >> 1);
            const shift_2 = shift_1 + 1;
            var rng = @import("std").Random.Pcg.init(0xc8dea3c3eab7eaa1);
            const sampler = rng.random();
            const v = .{
                .add = sampler.int(Unsigned),
                .mulshifts = .{
                    .{ sampler.int(Unsigned) | 1, shift_2 },
                    .{ sampler.int(Unsigned) | 1, shift_1 },
                    .{ sampler.int(Unsigned) | 1, shift_2 },
                    .{ sampler.int(Unsigned) | 1, shift_1 },
                },
            };
            break :blk v;
        };
        x +%= constants.add;
        inline for (constants.mulshifts) |ms| {
            x *%= ms[0];
            x ^= x >> ms[1];
        }
    }
    return @bitCast(x);
}

EDIT: Here is one path you could try - I find this code better aligns with what I am thinking:

Example code for PR, feel free to steal
/// Easy & fast hash function for integer types
pub fn int(input: anytype) @TypeOf(input) {
    // This function is only intended for integer types
    comptime assert(@typeInfo(@TypeOf(input)) == .int);
    const bits = @typeInfo(@TypeOf(input)).int.bits;
    // Convert input to unsigned integer (easier to deal with)
    const Uint = @Type(.{ .int = .{ .bits = bits, .signedness = .unsigned } });
    const u_input: Uint = @bitCast(input);
    // For bit widths that don't have a dedicated function, use a heuristic
    // construction with a multiplier suited to diffusion -
    // a mod 2^bits where a^2 - 46 * a + 1 = 0 mod 2^(bits + 4)
    const mult: Uint = @truncate(0x4ff55ba64bb740e1_35db2be3690a61d3);
    // The bit width of the input integer determines how to hash it
    const output = switch (bits) {
        0...15 => u_input *% mult,
        16 => uint16(u_input),
        32 => uint32(u_input),
        64 => uint64(u_input),
        else => blk: {
            // Might be better to just have a @compileError("unsupported");
            var x: Uint = u_input;
            inline for (0..4) |_| {
                x ^= x >> (bits / 2);
                x *%= mult;
            }
            break :blk x;
        },
    };
    return @bitCast(output);
}

// Source: https://github.com/skeeto/hash-prospector
fn uint16(input: u16) u16 {
    var x: u16 = input;
    x = (x ^ (x >> 7)) *% 0x2993;
    x = (x ^ (x >> 5)) *% 0xe877;
    x = (x ^ (x >> 9)) *% 0x0235;
    x = x ^ (x >> 10);
    return x;
}

// DEPRECATED: use std.hash.int()
// Source: https://github.com/skeeto/hash-prospector
pub fn uint32(input: u32) u32 {
    var x: u32 = input;
    x = (x ^ (x >> 17)) *% 0xed5ad4bb;
    x = (x ^ (x >> 11)) *% 0xac4c1b51;
    x = (x ^ (x >> 15)) *% 0x31848bab;
    x = x ^ (x >> 14);
    return x;
}

// Source: https://github.com/jonmaiga/mx3
fn uint64(input: u64) u64 {
    var x: u64 = input;
    const c = 0xbea225f9eb34556d;
    x = (x ^ (x >> 32)) *% c;
    x = (x ^ (x >> 29)) *% c;
    x = (x ^ (x >> 32)) *% c;
    x = x ^ (x >> 29);
    return x;
}

@andrewrk
Copy link
Member

andrewrk commented Nov 6, 2024

Reminder that I marked an earlier state of this branch as auto-merge. I think it was a mistake to force-push again rather than wait until that landed and propose a new change.

@francescoalemanno
Copy link
Contributor Author

francescoalemanno commented Nov 6, 2024

@RetroDev256 : past PR and current PR failt practrand at 1GiB, still 4 times bigger than stdlib current default. the metric used is to minimise 1 point correlation function and 2 points correlation functions over the variable "hash(r)^hash(r xor (1<<b))", r,b random in their respective ranges. these two metrics suffice to ensure uniformity of diffusion and bit independence (the qualities needed for a non-crypto hash).

@andrewrk : Sorry, I was under the impression that after merging it would become unchangeable for a very long time. And I felt that it could be improved.

In any case, I am in favor of reverting to a more controllable implementation, can I just put a few tweaks on top of RetroDev proposal (which is the natural evolution of the branch that was to be automerged... sorry guys) and move forward?

Modifications:

  • I like the idea of using the non rational solution of a quadratic, I made it work up to 256 bits, added Mathematica code in case anyone wants to verify the magic constant.
  • integers between sizes 3...15 were affected by fatal bias, it is best to make them pass through the generic solution.
Proposed version

/// Easy & fast hash function for integer types
pub fn int(input: anytype) @TypeOf(input) {
    // This function is only intended for integer types
    const input_type = @typeInfo(@TypeOf(input));
    if (input_type != .int) @compileError("std.hash.int only works on integer types.");
    const bits = input_type.int.bits;
    // Convert input to unsigned integer (easier to deal with)
    const Uint = @Type(.{ .int = .{ .bits = bits, .signedness = .unsigned } });
    const u_input: Uint = @bitCast(input);
    if (bits > 256) @compileError("bit widths > 256 are unsupported, use std.hash.autoHash functionality.");
    // For bit widths that don't have a dedicated function, use a heuristic
    // construction with a multiplier suited to diffusion -
    // a mod 2^bits where a^2 - 46 * a + 1 = 0 mod 2^(bits + 4),
    // on Mathematica: bits = 256; BaseForm[Solve[1 - 46 a + a^2 == 0, a, Modulus -> 2^(bits + 4)][[-1]][[1]][[2]], 16]
    const mult: Uint = @truncate(0xfac2e27ed2036860a062b5f264d80a512b00aa459b448bf1eca24d41c96f59e5b);
    // The bit width of the input integer determines how to hash it
    const output = switch (bits) {
        0...2 => u_input *% mult,
        16 => uint16(u_input),
        32 => uint32(u_input),
        64 => uint64(u_input),
        else => blk: {
            var x: Uint = u_input;
            inline for (0..4) |_| {
                x ^= x >> (bits / 2);
                x *%= mult;
            }
            break :blk x;
        },
    };
    return @bitCast(output);
}

// Source: https://github.com/skeeto/hash-prospector
fn uint16(input: u16) u16 {
    var x: u16 = input;
    x = (x ^ (x >> 7)) *% 0x2993;
    x = (x ^ (x >> 5)) *% 0xe877;
    x = (x ^ (x >> 9)) *% 0x0235;
    x = x ^ (x >> 10);
    return x;
}

// DEPRECATED: use std.hash.int()
// Source: https://github.com/skeeto/hash-prospector
pub fn uint32(input: u32) u32 {
    var x: u32 = input;
    x = (x ^ (x >> 17)) *% 0xed5ad4bb;
    x = (x ^ (x >> 11)) *% 0xac4c1b51;
    x = (x ^ (x >> 15)) *% 0x31848bab;
    x = x ^ (x >> 14);
    return x;
}

// Source: https://github.com/jonmaiga/mx3
fn uint64(input: u64) u64 {
    var x: u64 = input;
    const c = 0xbea225f9eb34556d;
    x = (x ^ (x >> 32)) *% c;
    x = (x ^ (x >> 29)) *% c;
    x = (x ^ (x >> 32)) *% c;
    x = x ^ (x >> 29);
    return x;
}

test int {
    const expectEqual = @import("std").testing.expectEqual;
    try expectEqual(0x1, int(@as(u1, 1)));
    try expectEqual(0x3, int(@as(u2, 1)));
    try expectEqual(0x4, int(@as(u3, 1)));
    try expectEqual(0xD6, int(@as(u8, 1)));
    try expectEqual(0x2880, int(@as(u16, 1)));
    try expectEqual(0x2880, int(@as(i16, 1)));
    try expectEqual(0x42741D6, int(@as(u32, 1)));
    try expectEqual(0x42741D6, int(@as(i32, 1)));
    try expectEqual(0x71894DE00D9981F, int(@as(u64, 1)));
    try expectEqual(0x71894DE00D9981F, int(@as(i64, 1)));
    try expectEqual(0x91206B847D4F47139E1F2030092AF020, int(@as(u128, 1)));
    try expectEqual(0x4755A1E0654CF625CED5ECA95965DAD55E332680070C074EA7659F74AA4243EE, int(@as(u256, 1)));
    try expectEqual(0x4755A1E0654CF625CED5ECA95965DAD55E332680070C074EA7659F74AA4243EE, int(@as(i256, 1)));
}

@RetroDev256
Copy link
Contributor

Thank you for this PR @francescoalemanno, I really like the effort you put into this. LGTM.

lib/std/hash.zig Outdated Show resolved Hide resolved
lib/std/hash.zig Outdated Show resolved Hide resolved
@francescoalemanno
Copy link
Contributor Author

Guys can this PR be merged?
@andrewrk @mlugg

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the follow-ups. I've made one requested change that I will commit locally and force-push to your branch. Please leave it to me and I will get this landed.

lib/std/hash.zig Outdated
/// detour through many layers of abstraction elsewhere in the std.hash
/// namespace.
/// Copied from https://nullprogram.com/blog/2018/07/31/
/// Easy & fast hash function for integer types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid words like "easy" and "fast" in doc comments. These are subjective feelings that do not belong in technical writing.

Suggested change
/// Easy & fast hash function for integer types
/// Integer to integer hashing for bit widths <= 256.

francescoalemanno and others added 5 commits November 24, 2024 15:27
Before, the default bit mixer was very biased, and after a
lot of searching it turns out that selecting a better solution is hard.

I wrote a custom statistical analysis taylored for bit mixers in order
to select the best one at each size (u64/u32/u16), compared a lot of
mixers, and packaged the best ones in this commit.
also
* allow signed ints, simply bitcast them to unsigned
* handle odd bit sizes by upcasting and then truncating
* naming conventions
* remove redundant code
* better use of testing API
In the parent commit, I handled odd bit sizes by upcasting and
truncating. However it seems the else branch is intended to handle
those cases instead, so this commit reverts that behavior.
Uses the non rational solution of a quadratic, I made it work up to 256
bits, added Mathematica code in case anyone wants to verify the magic
constant.

integers between sizes 3...15 were affected by fatal bias, it is best to
make them pass through the generic solution.

Thanks to RetroDev256 & Andrew feedback.
@andrewrk andrewrk enabled auto-merge November 24, 2024 23:33
@andrewrk andrewrk merged commit f4e042a into ziglang:master Nov 25, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes This PR should be mentioned in the release notes. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants