Add xxHash as a checksum option for checksum and dedup algorithms #16503

Open
iBug opened this issue Sep 4, 2024 · 6 comments
Labels
Type: Feature Feature request or new feature

iBug commented Sep 4, 2024

Describe the feature you would like to see added to OpenZFS

Add xxHash as an option for checksum and both xxhash and xxhash,verify for dedup.

How will this feature improve OpenZFS?

xxHash is sufficiently fast but much less collision-prone than fletcher4. As a competitive alternative to fletcher4, it would improve ZFS resilience against silent data corruption.

Additional context

Performance as advertised by xxHash on its wiki: https://github.com/Cyan4973/xxHash/wiki/Performance-comparison (Note: fletcher4 not included in this page)

Collision ratio on xxHash wiki: https://github.com/Cyan4973/xxHash/wiki/Collision-ratio-comparison

  • xxHash produces only as many collisions as "mathematically expected", while fletcher4 produces ~40% on just 1 Gi (= 2**30) inputs.
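
To make "mathematically expected" concrete, here is a minimal sketch of the birthday-style estimate for an ideal hash. It assumes uniformly random outputs. The function name `expected_colliding_pairs` is made up for illustration, and the hash widths used are assumptions, since the thread does not say which widths the wiki tested. Note that it counts colliding pairs, which may differ from how the wiki counts collisions.

```python
def expected_colliding_pairs(n_inputs: int, hash_bits: int) -> float:
    """Expected number of colliding pairs for an ideal (uniform) hash.

    Each of the n*(n-1)/2 input pairs collides with probability 1 / 2**hash_bits.
    """
    pairs = n_inputs * (n_inputs - 1) // 2
    return pairs / 2 ** hash_bits


n = 2 ** 30  # 1 Gi inputs, as in the bullet above
for bits in (32, 64):  # assumed widths, for illustration only
    print(f"{bits}-bit ideal hash: {expected_colliding_pairs(n, bits):.6g} expected colliding pairs")
```

A hash whose measured collision count sits far above this estimate for its output width is losing information somewhere, which is the point the bullet makes about fletcher4.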
iBug added the "Type: Feature" label Sep 4, 2024
Member

amotin commented Sep 4, 2024

Before discussing a default change, the first step would be to make it optional, so its characteristics can be measured against the others. And any algorithm added to the tree would have to stay there forever, so it must really be as good as advertised.

Author

iBug commented Sep 4, 2024

Sorry I misread the man page. I thought xxhash was already an option. Let me change this FR to adding it in the first place.

iBug changed the title from "Change checksum=on default algorithm to xxHash (currently fletcher4)" to "Add xxHash as a checksum option for checksum and dedup algorithms" Sep 4, 2024
Contributor

mcmilk commented Sep 16, 2024

I think fletcher4 is a bit faster than the xxhash variants currently in OpenZFS, so adding it as a new hash doesn't make sense.
Which version of xxHash do you have in mind?

Here is a nice table of hashes and their speeds: https://rurban.github.io/smhasher/doc/table.html

Hash         Speed in MiB/s
Fletcher 4   15556.93
xxHash64     12108.87 (included in OpenZFS - zstd)
xxHash32     5865.17 (included in OpenZFS - zstd)

Author

iBug commented Sep 17, 2024

@mcmilk Your table indicates xxh64 would be a good option. I'd like to reiterate that:

xxHash is sufficiently fast but much less collision-prone than fletcher4

With modern CPUs being so powerful, it makes sense to me to trade a bit of performance for much better sanity by replacing fletcher4 with xxh64.
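
For readers unfamiliar with why fletcher4 is considered weak, here is a rough sketch of the algorithm as I understand it: four running 64-bit sums over 32-bit words. This is a plain Python reading for illustration, not the OpenZFS implementation. Little-endian word order is an assumption, and `fletcher4` here is just a local helper name.

```python
import struct

MASK64 = (1 << 64) - 1  # accumulators wrap at 64 bits


def fletcher4(data: bytes) -> tuple:
    """Return the four 64-bit running sums (a, b, c, d) over 32-bit words."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):  # assumed little-endian
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


# Only additions are involved, so structured inputs can collide trivially:
# all-zero buffers of different lengths produce the same checksum.
print(fletcher4(bytes(4096)) == fletcher4(bytes(8192)))
```

Since there is no mixing step beyond addition, the output distribution depends heavily on the input structure, which is consistent with the collision numbers quoted earlier.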

Contributor

mcmilk commented Sep 17, 2024

Why not something like rapidhash, which has double the speed (23789 MiB/s) in that table¹ and no common problems?

Also, with SSE and AVX, fletcher-4 is a lot faster on my local notebook:

$ cat /proc/spl/kstat/zfs/fletcher_4_bench
implementation   native         byteswap
scalar           9112861804     8831049465
superscalar      11681942207    11744320536
superscalar4     13586418453    11444139964
sse2             21310896019    10706136906
ssse3            21171146266    19126012775
avx2             38987296119    35445754442

¹https://rurban.github.io/smhasher/doc/table.html

Edit: I find xxh3 a nice fit:
[image]

@gmelikov
Member

JFYI, fletcher4 on an AMD Ryzen 7840U with AVX-512:

0 0 0x01 -1 0 6423074939 87451661526534
implementation   native         byteswap
scalar           10659121732    8706704548
superscalar      14087467630    11536293324
superscalar4     15814463114    12305581118
sse2             22675386320    10542805362
ssse3            22375429000    20235123389
avx2             39958006169    37408283214
avx512f          42448290424    17524854325
avx512bw         42461612087    37391201332
fastest          avx512bw       avx2
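
To compare these kstat figures with the MiB/s table earlier in the thread, the numbers need converting. A minimal sketch follows, assuming the `fletcher_4_bench` values are bytes per second (that is my understanding of the benchmark, but treat it as an assumption). `parse_fletcher4_bench` is a hypothetical helper name, and the sample rows are copied from the notebook output above.

```python
MIB = 1024 * 1024


def parse_fletcher4_bench(text: str) -> dict:
    """Map implementation name -> (native MiB/s, byteswap MiB/s).

    Assumes the kstat values are bytes per second. Rows that are not
    "name number number" (kstat header, column titles, the "fastest" row)
    are skipped.
    """
    results = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        name, native, byteswap = fields
        if not (native.isdigit() and byteswap.isdigit()):
            continue
        results[name] = (int(native) / MIB, int(byteswap) / MIB)
    return results


sample = """\
implementation   native         byteswap
scalar           9112861804     8831049465
avx2             38987296119    35445754442
"""

for name, (native, byteswap) in parse_fletcher4_bench(sample).items():
    print(f"{name:<12} {native:10.0f} {byteswap:10.0f}  MiB/s")
```

On a live system the same function could be fed the contents of /proc/spl/kstat/zfs/fletcher_4_bench instead of the sample string.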
