incorporate upstream fixes to crc32c.c #52326
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
20938f6 to 709ec3a
@vtjnash, I updated the other constraints following your example. Looks good?
This causes a build failure on 32-bit debug.
To reproduce the error:
Are the constraints actually impossible or is this a compiler bug?
Maybe we should just change

    __asm__(CRC32_PTR "\t" "(%3), %0\n\t"
            CRC32_PTR "\t" LONGx1 "(%3), %1\n\t"
            CRC32_PTR "\t" LONGx2 "(%3), %2"
            : "+r"(crc0), "+r"(crc1), "+r"(crc2)
            : "r"(buf),
              "m"(* (const char (*)[sizeof(void*)]) &buf[0]),
              "m"(* (const char (*)[sizeof(void*)]) &buf[LONG]),
              "m"(* (const char (*)[sizeof(void*)]) &buf[LONG*2]));

to

    __asm__(CRC32_PTR "\t" "(%3), %0\n\t"
            CRC32_PTR "\t" LONGx1 "(%3), %1\n\t"
            CRC32_PTR "\t" LONGx2 "(%3), %2"
            : "+r"(crc0), "+r"(crc1), "+r"(crc2)
            : "r"(buf),
              "m"(* (const char (*)[sizeof(void*)]) buf));

(and similarly for the
Both gcc and clang do complain, but only for
Freeing up the frame pointer helps, e.g.
How about this:

    __asm__(CRC32_PTR "\t" "(%1), %0\n"
            : "+r"(crc0)
            : "r"(buf),
              "m"(* (const char (*)[sizeof(void*)]) &buf[0]));
    __asm__(CRC32_PTR "\t" LONGx1 "(%1), %0\n"
            : "+r"(crc1)
            : "r"(buf),
              "m"(* (const char (*)[sizeof(void*)]) &buf[LONG]));
    __asm__(CRC32_PTR "\t" LONGx2 "(%1), %0\n"
            : "+r"(crc2)
            : "r"(buf),
              "m"(* (const char (*)[sizeof(void*)]) &buf[LONG*2]));

We can also simplify them like this:
See #52437 for updated constraints to fix the 32-bit debug build.
You don't need the If you have different input constraints for the three instructions in one If you feel the need to tell the compiler the extent of the operand, then you should use So this works:

    __asm__("crc32q\t%3, %0\n\t"
            "crc32q\t%4, %1\n\t"
            "crc32q\t%5, %2"
            : "+r"(crc0), "+r"(crc1), "+r"(crc2)
            : "m"(*(uint64_t const *)next),
              "m"(*(uint64_t const *)(next + LONG)),
              "m"(*(uint64_t const *)(next + 2*LONG)));
Ah, ok.
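For readers following the constraint discussion, here is a minimal, self-contained sketch of the form described above: one `__asm__` statement with three read-write register operands and three typed "m" operands, run in a loop over three interleaved streams. Several things here are assumptions for illustration and are not taken from crc32c.c: the `LONG` value of 32, the helper names `have_crc32q`, `crc32c_word_sw` and `crc32c_three_streams`, the runtime check through `__builtin_cpu_supports`, and the bitwise software fallback used when the build is not x86-64 or the CPU lacks SSE4.2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: bytes per stream (a multiple of 8).
   This is not the value used in crc32c.c. */
#define LONG 32

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define HAVE_CRC32Q 1
#else
#define HAVE_CRC32Q 0
#endif

/* Hypothetical helper: nonzero when the crc32q path may be used. */
static int have_crc32q(void)
{
#if HAVE_CRC32Q
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2") != 0;
#else
    return 0;
#endif
}

/* Portable stand-in for one crc32q step: fold 8 bytes, in memory order,
   into a CRC-32C state (reflected polynomial 0x82F63B78, no inversion). */
static uint64_t crc32c_word_sw(uint64_t crc, const unsigned char *p)
{
    uint32_t c = (uint32_t)crc;
    for (int i = 0; i < 8; i++) {
        c ^= p[i];
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ (0x82F63B78u & (0u - (c & 1u)));
    }
    return c;
}

/* Advance three independent CRC states over the streams that start at
   next, next + LONG and next + 2*LONG; each stream is LONG bytes long. */
static void crc32c_three_streams(const unsigned char *next,
                                 uint64_t *crc0, uint64_t *crc1, uint64_t *crc2)
{
    assert(next && crc0 && crc1 && crc2);
    const unsigned char *end = next + LONG;
    uint64_t c0 = *crc0, c1 = *crc1, c2 = *crc2;
    const int hw = have_crc32q();
    (void)hw; /* unused when the asm path is compiled out */

    do {
#if HAVE_CRC32Q
        if (hw) {
            /* Each "m" operand names the exact 8 bytes its instruction reads. */
            __asm__("crc32q\t%3, %0\n\t"
                    "crc32q\t%4, %1\n\t"
                    "crc32q\t%5, %2"
                    : "+r"(c0), "+r"(c1), "+r"(c2)
                    : "m"(*(uint64_t const *)next),
                      "m"(*(uint64_t const *)(next + LONG)),
                      "m"(*(uint64_t const *)(next + 2*LONG)));
        } else
#endif
        {
            c0 = crc32c_word_sw(c0, next);
            c1 = crc32c_word_sw(c1, next + LONG);
            c2 = crc32c_word_sw(c2, next + 2*LONG);
        }
        next += 8;
    } while (next < end);

    *crc0 = c0;
    *crc1 = c1;
    *crc2 = c2;
}
```

Starting each state at 0xFFFFFFFF and inverting the low 32 bits afterwards gives the ordinary CRC-32C of each 32-byte stream, so the RFC 3720 test patterns (all zeros, all 0xFF, incrementing bytes) make a convenient check. Combining the three states into a single CRC needs the extra shift step from crc32c.c, which this sketch leaves out.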
As suggested in #52326#issuecomment-1840999660 For JuliaPackaging/Yggdrasil#7757 (comment)
What gives that more options, when "rm" seems to mean "select either a register or a memory expression", which is strictly more optimization permission, and the crc32 instruction permits either form? They seem to generate identical code for me (at optimization levels at or above -Og anyways)
These examples are with clang 15, using The
Whereas without the
The subsequent shorter loop with three
...
The net effect is about a 30% speedup by removing the
Closes #52325.
I have only a vague idea what these changes to the __asm__ register arguments actually do; I just copied them from @madler upstream. It would be good if @yuyichao or @vtjnash or someone else who knows __asm__ syntax would take a look. It looks like they are a response to this comment on StackOverflow:

There should be test coverage of this PR via https://github.com/JuliaLang/julia/blob/master/stdlib/CRC32c/test/runtests.jl

The existing test coverage wasn't enough to fully exercise Adler's code, so I added another test.
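To illustrate why short inputs alone may not exercise every code path, here is a hedged C sketch of a differential check: an implementation under test is compared with a slow bit-at-a-time reference over a range of lengths and starting offsets. This is not the Julia test added in this PR. The names `crc32c_ref`, `crc32c_fn`, `crc32c_count_mismatches` and `crc32c_broken_long`, the 32-byte threshold, and the function signature are all hypothetical and chosen only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-32C (Castagnoli), used only as a slow reference. */
static uint32_t crc32c_ref(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *)buf;
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical signature for an implementation under test. */
typedef uint32_t (*crc32c_fn)(uint32_t crc, const void *buf, size_t len);

/* Deliberately wrong stand-in: correct for short inputs, wrong once the
   input reaches 32 bytes, mimicking a bug that lives only in a "long" path. */
static uint32_t crc32c_broken_long(uint32_t crc, const void *buf, size_t len)
{
    return crc32c_ref(crc, buf, len) ^ (len >= 32 ? 1u : 0u);
}

/* Compare impl with the reference for every length in [0, max_len] and
   every starting offset in [0, 8). data must hold max_len + 7 bytes.
   Returns the number of (offset, length) pairs that disagree. */
static size_t crc32c_count_mismatches(crc32c_fn impl,
                                      const unsigned char *data,
                                      size_t max_len)
{
    assert(impl && data);
    size_t bad = 0;
    for (size_t off = 0; off < 8; off++)
        for (size_t len = 0; len <= max_len; len++)
            if (impl(0, data + off, len) != crc32c_ref(0, data + off, len))
                bad++;
    return bad;
}
```

With this stand-in, a sweep up to 16 bytes reports no mismatches while a sweep up to 64 bytes does, which is the kind of gap a test with longer inputs is meant to close. A real sweep would pick `max_len` well above the largest block size the implementation handles specially.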