-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Heisenbug in crc code when 512-bit vectors are used with Intel C++ compiler #403
Comments
Here's minimal repro test: https://github.com/pps83/libdeflate/tree/crctest (you need to build with clang-cl for Debug x64).
I also added crctest.bat that runs crctest.exe in a loop until it fails (you need to copy crctest.exe manually to the same dir as crctest.bat). Here's sample output, as you can see, sometimes it takes3-4 runs before it fails, sometimes fails right away: |
The specifications for _mm256_castsi128_si256() and _mm512_castsi128_si512() are bugged, as they leave the high bits undefined instead of zeroed as would be expected. Separate intrinsics _mm256_zextsi128_si256() and _mm512_zextsi128_si512() were later added to allow working around this defect. Use them. This fixes incorrect CRC checksums produced by crc32_x86_vpclmulqdq_avx512_vl512() when built with 'clang -O0'. Other cases are not known to have been affected. Resolves #403 Fixes: 5f2a0b4 ("lib/x86/crc32: add VPCLMULQDQ implementations of CRC-32")
Thanks. #404 should fix this. |
that was quick. I'll test and report. |
fix works. FYI, added a comment here: #341 (comment) |
Well, it is quite difficult for compilers to generate code that does not zeroize the upper bits when extending 128 bits to 512 bits, considering how the underlying instruction set works. clang somehow managed to do so, but only at |
I installed recently Intel's c++ compiler (I have 2024, and 2025 version). I didn't check my unit tests with intel compiler, but recently I noticed that one test fails sometimes. This my unit test simply calculates crc32 using zlib-ng and using libdeflate and reports timings. The crazy part is that the bug sometimes disappears: the same binary executed multiple times and sometimes libdeflate fails and sometimes returns correct crc32 value. When it fails, it always returns the same value (eg it's not always different when it fails).
Curious observation is that the build sometimes fails only when running non-optimized version. Just while writing this message I double checked different combinations and it appears that clang-cl also has the same issue!
These are the only differences that I have from public libdeflate: master...pps83:libdeflate:master
The text was updated successfully, but these errors were encountered: