Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heisenbug in crc code when 512-bit vectors are used with Intel C++ compiler #403

Closed
pps83 opened this issue Nov 26, 2024 · 5 comments · Fixed by #404
Closed

Heisenbug in crc code when 512-bit vectors are used with Intel C++ compiler #403

pps83 opened this issue Nov 26, 2024 · 5 comments · Fixed by #404
Labels

Comments

@pps83
Copy link
Contributor

pps83 commented Nov 26, 2024

I installed recently Intel's c++ compiler (I have 2024, and 2025 version). I didn't check my unit tests with intel compiler, but recently I noticed that one test fails sometimes. This my unit test simply calculates crc32 using zlib-ng and using libdeflate and reports timings. The crazy part is that the bug sometimes disappears: the same binary executed multiple times and sometimes libdeflate fails and sometimes returns correct crc32 value. When it fails, it always returns the same value (eg it's not always different when it fails).

Curious observation is that the build sometimes fails only when running non-optimized version. Just while writing this message I double checked different combinations and it appears that clang-cl also has the same issue!
These are the only differences that I have from public libdeflate: master...pps83:libdeflate:master

@pps83
Copy link
Contributor Author

pps83 commented Nov 27, 2024

Here's minimal repro test: https://github.com/pps83/libdeflate/tree/crctest (you need to build with clang-cl for Debug x64).
That's the output I get sometimes:

crc32:3403043317
ERROR: libdeflate crc32:2268885453

I also added crctest.bat that runs crctest.exe in a loop until it fails (you need to copy crctest.exe manually to the same dir as crctest.bat). Here's sample output, as you can see, sometimes it takes3-4 runs before it fails, sometimes fails right away:

image

ebiggers added a commit that referenced this issue Nov 27, 2024
The specifications for _mm256_castsi128_si256() and
_mm512_castsi128_si512() are bugged, as they leave the high bits
undefined instead of zeroed as would be expected.  Separate intrinsics
_mm256_zextsi128_si256() and _mm512_zextsi128_si512() were later added
to allow working around this defect.  Use them.

This fixes incorrect CRC checksums produced by
crc32_x86_vpclmulqdq_avx512_vl512() when built with 'clang -O0'.
Other cases are not known to have been affected.

Resolves #403
Fixes: 5f2a0b4 ("lib/x86/crc32: add VPCLMULQDQ implementations of CRC-32")
@ebiggers
Copy link
Owner

Thanks. #404 should fix this.

@ebiggers ebiggers added the bug label Nov 27, 2024
@pps83
Copy link
Contributor Author

pps83 commented Nov 27, 2024

Thanks. #404 should fix this.

that was quick. I'll test and report.
I'm wondering why is that msvc build doesn't have that issue

@pps83
Copy link
Contributor Author

pps83 commented Nov 27, 2024

fix works. FYI, added a comment here: #341 (comment)

@ebiggers
Copy link
Owner

I'm wondering why is that msvc build doesn't have that issue

Well, it is quite difficult for compilers to generate code that does not zeroize the upper bits when extending 128 bits to 512 bits, considering how the underlying instruction set works. clang somehow managed to do so, but only at -O0 where the generated code is very unoptimized and messy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants