Comparison to twox-hash? #10
Comments
I didn't run proper benchmarks, only a few for my own reference. In addition, it doesn't provide a one-shot version of the hashing algorithm, which tends to be multiple times faster than the streaming one. In general So, to answer the concrete questions:
Generally speaking, yes (although in the streaming variants of xxh64 you only gain a marginal improvement).
Hashing functions are pretty trivial to use in general.
It is just an implementation of the xxhash algorithm. |
Looks like it does: |
Ah my bad, for some reason I thought it doesn't. |
I'm interested in comparison benchmarks between the two crates. |
@firasuke The only algorithm that is worth benchmarking is xxh3, which is not available in its latest version for P.S. Just for reference, I strive to achieve as much parity with the C implementation as possible (which is currently very close to perfect) |
@DoumanAsh Thanks for the reply, and no worries! I already switched to |
I'm not super familiar with xxh3 (or xxhash in general). I want a good hashing function for data integrity - potentially to replace CRC32. Why xxh3? |
In the first place, CRC32 is not intended as a general-purpose hashing function, so I can only suggest using xxhash if you need a non-cryptographic hashing function. xxh3 should be your choice because of its improved performance compared to the older hashing functions in the xxh family. |
Yeah; I'm just throwing a checksum at the end of a new binary file format (for CRDTs) for the sake of detecting silent data corruption. CRC32c is probably fine, but if xxhash would be a better choice, then now's the time to make that decision. xxh3 looks good on paper, but I'm also targeting wasm, where code size matters. And xxh32 adds much less code size compared to xxh3. A 32-bit checksum is plenty for what I'm using it for. But if xxh3 is "the future" then I'm worried that xxh32 might stay niche and not be widely implemented going forward? Stability & portability are very important. O_o I have no idea what the best call is here. |
xxh32/xxh64 are good for code size, I suppose, but if you want something akin to a checksum you need to reduce the potential for collisions. Considering your purpose is to use the hash as a checksum, I believe the 128-bit variant of xxh3 is the best option. P.S. A 32-bit hash is not enough for your purpose; do not look at code size, WASM is not optimized for that anyway. P.P.S. If you worry about code size, always enable optimizations for code size:
|
Thanks! I've spent a lot of time tweaking compilation flags. My CRDT library is ~250kb, and I'm keeping a careful eye on it because I want it to get smaller, not bigger. In theory TCP / filesystems should keep data consistent anyway. The checksum is really mostly there out of paranoia. Years ago I read that checksums fail in practice (even in situations they shouldn't) much more often than you expect. So I'm a bit curious to see if / when my checksums start failing. CRDT patches get tiny sometimes since we send one patch per keystroke. A 128 bit hash would make sense for a 1mb file or something, but when I'm sending 20 bytes of patch content, adding 16 bytes of 128-bit xxh3 hash seems excessive. I'll stick with crc32c for now. Thanks for humoring me! I really appreciate it! |
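To put rough numbers on the trade-off described above, here is a back-of-the-envelope estimate. It assumes the hash behaves like a uniform random function of its input, so a random corruption goes undetected with probability about 2^-n for an n-bit checksum. That is an idealization, not a measured property of xxh32, xxh3, or CRC32c. The 20-byte patch size is taken from the comment above.

```latex
% Probability that a random corruption goes undetected (idealized n-bit hash)
P_{\text{miss}} \approx 2^{-n}
\qquad
n = 32:\; 2^{-32} \approx 2.3 \times 10^{-10}
\qquad
n = 128:\; 2^{-128} \approx 2.9 \times 10^{-39}

% Relative size overhead of the checksum on a 20-byte patch
\text{overhead} = \frac{n/8}{\text{payload bytes}}
\qquad
\frac{4}{20} = 20\%
\qquad
\frac{16}{20} = 80\%
```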
Ah, I understand. In this case, yeah, the best is to choose between xxh32 and crc32. |
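For readers following the checksum discussion above, here is a minimal, dependency-free sketch of the idea: a one-shot XXH32 written from the published algorithm description, plus two helpers that append and verify a 4-byte trailer. The helper names `append_checksum` and `verify_checksum`, the seed of 0, and the little-endian trailer layout are all assumptions made for illustration; they are not taken from this crate, from twox-hash, or from the CRDT format mentioned in the thread. Real code would normally call a maintained crate instead of hand-rolling the hash.

```rust
// Sketch only: one-shot XXH32 plus a 4-byte checksum trailer.
// `append_checksum` and `verify_checksum` are hypothetical helpers.

const PRIME32_1: u32 = 0x9E37_79B1;
const PRIME32_2: u32 = 0x85EB_CA77;
const PRIME32_3: u32 = 0xC2B2_AE3D;
const PRIME32_4: u32 = 0x27D4_EB2F;
const PRIME32_5: u32 = 0x1656_67B1;

/// Reads the first four bytes of `bytes` as a little-endian u32.
fn read_u32_le(bytes: &[u8]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

/// One accumulator step of the 16-byte stripe loop.
fn round(acc: u32, input: u32) -> u32 {
    acc.wrapping_add(input.wrapping_mul(PRIME32_2))
        .rotate_left(13)
        .wrapping_mul(PRIME32_1)
}

/// One-shot XXH32 over a complete in-memory buffer.
pub fn xxh32(input: &[u8], seed: u32) -> u32 {
    let mut rest = input;
    let mut hash = if input.len() >= 16 {
        // Four parallel accumulators consume 16 bytes per iteration.
        let mut v1 = seed.wrapping_add(PRIME32_1).wrapping_add(PRIME32_2);
        let mut v2 = seed.wrapping_add(PRIME32_2);
        let mut v3 = seed;
        let mut v4 = seed.wrapping_sub(PRIME32_1);
        while rest.len() >= 16 {
            v1 = round(v1, read_u32_le(&rest[0..4]));
            v2 = round(v2, read_u32_le(&rest[4..8]));
            v3 = round(v3, read_u32_le(&rest[8..12]));
            v4 = round(v4, read_u32_le(&rest[12..16]));
            rest = &rest[16..];
        }
        v1.rotate_left(1)
            .wrapping_add(v2.rotate_left(7))
            .wrapping_add(v3.rotate_left(12))
            .wrapping_add(v4.rotate_left(18))
    } else {
        seed.wrapping_add(PRIME32_5)
    };

    hash = hash.wrapping_add(input.len() as u32);

    // Remaining 4-byte words, then remaining single bytes.
    while rest.len() >= 4 {
        hash = hash
            .wrapping_add(read_u32_le(rest).wrapping_mul(PRIME32_3))
            .rotate_left(17)
            .wrapping_mul(PRIME32_4);
        rest = &rest[4..];
    }
    for &byte in rest {
        hash = hash
            .wrapping_add(u32::from(byte).wrapping_mul(PRIME32_5))
            .rotate_left(11)
            .wrapping_mul(PRIME32_1);
    }

    // Final avalanche mixes the bits of the result.
    hash ^= hash >> 15;
    hash = hash.wrapping_mul(PRIME32_2);
    hash ^= hash >> 13;
    hash = hash.wrapping_mul(PRIME32_3);
    hash ^= hash >> 16;
    hash
}

/// Returns `payload` followed by its XXH32 (seed 0, assumed) in little-endian order.
pub fn append_checksum(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 4);
    out.extend_from_slice(payload);
    out.extend_from_slice(&xxh32(payload, 0).to_le_bytes());
    out
}

/// Returns the payload if the trailing checksum matches, otherwise `None`.
pub fn verify_checksum(framed: &[u8]) -> Option<&[u8]> {
    if framed.len() < 4 {
        return None;
    }
    let (payload, trailer) = framed.split_at(framed.len() - 4);
    if xxh32(payload, 0) == read_u32_le(trailer) {
        Some(payload)
    } else {
        None
    }
}
```

A sender would call `append_checksum(&patch)` before writing or transmitting, and a receiver would call `verify_checksum(&bytes)` and treat `None` as corruption. The trailer costs 4 bytes per patch regardless of payload size, which is the overhead being weighed in the thread.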
Would be helpful to mention in readme how this is different from twox-hash. Is it faster? Is it easier to use? Or just an alternative implementation?