carryless multiplication builtin #9631
The good news and bad news is that you are the pioneer of the first such builtin. One trick you could try would be using clang to emit LLVM IR, using the pclmulqdq intrinsic, but specifying an x86 CPU that does not have the instruction. In this case it may emit a call to a compiler-rt function, which we can make sure is implemented for other architectures in addition to x86. Regardless, I do think that if you start on this feature, it can be tackled one bit at a time and I'd be happy to help at any point along the way. |
Interesting. Not sure if this is what you meant, but I tried the following. I guess my clang-fu is failing. Any suggestions? I'm not sure 'i386' is a correct option for -mcpu. I tried several others like 'westmere' and 'haswell' with the same results.
$ clang-12 -c builtin-things.c -o foo.bc -emit-llvm --target=x86_64-linux -mcpu=i386 -mpclmul && llvm-dis-12 foo.bc -o foo.ll
clang: warning: argument unused during compilation: '-mcpu=i386' [-Wunused-command-line-argument]
$ cat builtin-things.c
#include <stdint.h>
#include <emmintrin.h>
#include <immintrin.h>
uint64_t prefix_xor(const uint64_t bitmask) {
    __m128i all_ones = _mm_set1_epi8('\xFF');
    __m128i result = _mm_clmulepi64_si128(_mm_set_epi64x(0ULL, bitmask), all_ones, 0);
    return _mm_cvtsi128_si64(result);
}
|
So now we know how clang handles this situation: a compile error. For Zig, we should make the builtin always work, by providing an implementation if necessary. |
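To make "providing an implementation if necessary" concrete, here is a minimal sketch of a portable fallback in C. It is not taken from Zig, compiler-rt, or any existing library; the names clmul128_t and clmul64_soft are hypothetical. It computes the full 128-bit carryless product of two 64-bit operands by XORing a shifted copy of one operand for every set bit of the other.

```c
#include <stdint.h>

/* Hypothetical result type: the 128-bit product split into two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} clmul128_t;

/* Hypothetical portable fallback: for every set bit i of b, XOR (a << i)
   into the 128-bit accumulator. XOR replaces addition, so no carries occur. */
clmul128_t clmul64_soft(uint64_t a, uint64_t b) {
    clmul128_t r = {0, 0};
    for (unsigned i = 0; i < 64; i++) {
        if ((b >> i) & 1u) {
            r.lo ^= a << i;
            if (i != 0) {
                /* Bits of a that were shifted out of the low half. */
                r.hi ^= a >> (64 - i);
            }
        }
    }
    return r;
}
```

A bit-by-bit loop like this is slow compared to the hardware instruction, so it would probably only make sense as the last-resort path on targets without a native carryless multiply.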
You can reuse parts of my open PR #9578 to select the correct function at comptime and expose the respective symbol. Fortunately Rust has implemented usage of this very intrinsic resolving how this gets lowered: rust-lang/stdarch#318. Feature detection is in LLVM in Probably it would also be good to have a central place for intrinsics. |
I started working on implementing this. I'm currently able to generate the LLVM intrinsic, but I'm hitting a name mangling issue that I'm not sure how to fix. I've posted the issue to the LLVM IRC / Discord. The error is:
ld.lld: error: undefined symbol: llvm.x86.pclmulqdq.v2i64
The correct name is just
Of course, the code is very hacky and messy so far. I just thought I would share my progress in case anyone has thoughts on this mangling issue or on how to proceed. |
looks like the error from my previous comment was solved in llvm-13. |
The RISC-V B extension has clmul too, and SIMDe has a cross-platform C implementation we can learn from. One use case for clmul is a blazing fast clhash for the Zig cache. |
addresses ziglang#9631. only works with llvm backend/x86 so far. allows new test/behavior/mul_carryless.zig to pass with -Denable-llvm. doesn't do any backend/arch/cpu-feature testing.
Some other important bit manipulation instructions are PDEP and PEXT (and CLMUL, which can also be used to build bit-manipulation steps outside of crypto algorithms). They could be used in a wide variety of data compression/decompression and encoding/decoding algorithms. Polyfill: https://github.com/zwegner/zp7 (also see how AMD fails). There are many use cases mentioned on the web:
"elegance": https://news.ycombinator.com/item?id=20205743 Though elegance can't be quantified, in my humble opinion they're like the new "CLZ CTZ POPCNT triad" of standard bit manipulation units: useful, non-trivial to polyfill, and painful to do without. Please consider adding them to the builtins as well. |
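For readers unfamiliar with PEXT and PDEP, here is a minimal sketch of what the two operations compute, written as a naive bit loop in C. It is only an illustration of the semantics, not the technique used by the linked polyfill, and the names pext64_soft and pdep64_soft are hypothetical.

```c
#include <stdint.h>

/* Software PEXT sketch: gather the bits of src selected by mask
   and pack them into the low bits of the result. */
uint64_t pext64_soft(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set bit of mask */
        if (src & lowest) {
            result |= out_bit;
        }
        mask &= mask - 1; /* clear lowest set bit of mask */
    }
    return result;
}

/* Software PDEP sketch: scatter the low bits of src to the
   positions of the set bits in mask. */
uint64_t pdep64_soft(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t in_bit = 1; mask != 0; in_bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set bit of mask */
        if (src & in_bit) {
            result |= lowest;
        }
        mask &= mask - 1; /* clear lowest set bit of mask */
    }
    return result;
}
```

Each loop runs once per set bit in the mask, which is why a branch-free polyfill or a hardware instruction is attractive for hot paths.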
I would like to work toward creating a carryless multiplication builtin in Zig. This is a fast instruction used in simdjson, for example, to convert binary quote boundaries from JSON strings into masks. In the following example, Q is a 64-bit quote boundary marker. The last line is the result of a carryless multiplication between Q and 0xfffffffffffffff
In simdjson this is known as prefix_xor and is implemented here:
Here are some references to this instruction in the zig repo:
zig/lib/std/crypto/ghash.zig
Line 93 in d298719
The llvm x86 intrinsic is
llvm.x86.pclmulqdq
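As a point of comparison, the prefix_xor operation mentioned above can also be computed without the instruction. The sketch below is a hedged illustration in C, not simdjson's or Zig's code, and the name prefix_xor_portable is hypothetical. Bit i of the result is the XOR of input bits 0 through i, which matches the low 64 bits of a carryless multiplication by an all-ones operand.

```c
#include <stdint.h>

/* Portable prefix XOR sketch: each step doubles the window of input bits
   XORed into every output bit (2, 4, 8, ... 64), so after six steps
   bit i holds the XOR of input bits 0..i. */
uint64_t prefix_xor_portable(uint64_t bitmask) {
    bitmask ^= bitmask << 1;
    bitmask ^= bitmask << 2;
    bitmask ^= bitmask << 4;
    bitmask ^= bitmask << 8;
    bitmask ^= bitmask << 16;
    bitmask ^= bitmask << 32;
    return bitmask;
}
```

For example, an input with bits 1 and 4 set (an opening and a closing quote) yields a mask with bits 1 through 3 set, covering the opening quote and the characters before the closing quote.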
Name ideas:
@mulCarryless()
@mulWithoutCarry()
I hope to use this in my simdjson port to get rid of hacky LLVM intrinsic calls such as the following, which may not be possible in stage 2:
Related to #903
If accepted, I'm not sure where I would begin. If anyone can suggest a similar builtin which uses different intrinsics per platform (and a custom implementation on ARM), perhaps I can follow its implementation.