Allow ccall of LLVM intrinsics #3969
I rather like this, but this is definitely @JeffBezanson's call. It's unfortunately not a net deletion of code, but it seems like a change that could lead to major code deletion in the future, which is always good.
It does seem that there are cases where this has some benefit. We probably do not want intrinsics that call libm, because then it is quite likely that LLVM will end up dispatching to the system libm and not openlibm. Perhaps there is a way to build LLVM with openlibm, but I am quite sure that they have not envisioned that possibility. For sqrt, openlibm does have an assembly implementation. For memset/memmove/memcpy, it may be worthwhile to use the intrinsics so that we benefit in the few cases where the compiler is able to optimize something. Would love to hear what @JeffBezanson has to say.
If you set the makefile variable LLVM will generally try to optimize any call to a function named "memset", etc. by automatically converting it into an intrinsic. Similarly, any call to a libm function is already automatically replaced by the LLVM intrinsic (but still calls the openlibm function, unless it can constant fold the expression).
If the module already contains a function with the name LLVM is looking for, the intrinsic will use it instead of loading libm, or at least that is what It seems like LLVM should be capable of optimizing memset etc. into intrinsics, but I can't actually get Julia to do it.
Julia isn't optimizing the
I'm somewhat torn over this one. On the one hand, it's nice to do everything uniformly; on the other hand, I don't really see the need for this and would much prefer a way to call generic LLVM IR. I do, however, also defer the decision to @JeffBezanson.
I feel the same way as @loladiro. It's also not really a C call and thus a bit of an abuse of notation.
How about a similar mechanism for calling LLVM intrinsics that doesn't use the ccall name? Like llvmcall?
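For readers who want to see what such a mechanism could look like: the sketch below assumes a later Julia release in which `ccall` accepts `llvmcall` as a calling convention. That is not the syntax proposed in this PR, and the wrapper name `ctpop_llvm` is hypothetical.

```julia
# Assumption: a Julia version where `llvmcall` is accepted as a ccall calling convention.
# `ctpop_llvm` is a hypothetical name used only for this illustration.
# The intrinsic name must be a string literal; the ".i64" suffix selects the 64-bit overload.
ctpop_llvm(x::Int64) = ccall("llvm.ctpop.i64", llvmcall, Int64, (Int64,), x)

# Count the set bits of a value and compare with Base's count_ones.
println(ctpop_llvm(255), " ", count_ones(255))
```

The call reads like an ordinary `ccall`, but nothing is looked up in a shared library; the name refers directly to an LLVM intrinsic.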
Another issue is that calling LLVM intrinsics just isn't that useful. There aren't many of them that we'd be able to use, and most just correspond to C library functions.
There are a few useful x86-specific intrinsics corresponding to individual instructions, but they're undocumented and I'm not sure they're meant to be used outside of LLVM. In any case, the ability to inline generic LLVM IR would be strictly more powerful than this PR.
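As an illustration of what inlining generic LLVM IR can look like, here is a minimal sketch that assumes a later Julia release providing `Base.llvmcall`. It is not part of this PR, and `add_ir` is a hypothetical name.

```julia
# Assumption: a Julia version that provides Base.llvmcall for inline LLVM IR.
# `add_ir` is a hypothetical name used only for this illustration.
# Arguments are available as the unnamed values %0 and %1; %2 is the implicit
# entry-block label, so the first value we define ourselves is %3.
add_ir(x::Int32, y::Int32) = Base.llvmcall("""
    %3 = add i32 %1, %0
    ret i32 %3""", Int32, Tuple{Int32,Int32}, x, y)

println(add_ir(Int32(2), Int32(3)))
```

Any instruction sequence can go in the IR string, which is why this approach covers the intrinsic case and more.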
This simplifies things while adding flexibility, and as far as I can tell it generates exactly the same code.
For the moment, I've only converted the `bswap_int` and `ctpop_int` intrinsics, but more could potentially be done with this:
- `ctlz_int` (`leading_zeros`) and `cttz_int` (`trailing_zeros`) can be converted from Julia intrinsics to ccalls to LLVM intrinsics
- `abs_float` can be converted from a Julia intrinsic to a ccall to `llvm.fabs`
- `floor`, `ceil`, and `trunc` calls to libm could be replaced with `llvm.floor`, `llvm.ceil`, and `llvm.trunc`, which compile to a single instruction and are considerably faster
- `llvm.sqrt` compiles down to a single instruction, but the LLVM intrinsic has undefined behavior on negative values, so at the moment I think this would still need a Julia intrinsic to be inlineable. In a naïve benchmark it also doesn't seem any faster than calling into libm, although it might be faster in cases where the next operation is not a sqrt
- `llvm.powi` was discussed in "^ is slow" (#2741). It looks like LLVM just lowers this into a bunch of multiply instructions if the exponent is a constant. It doesn't sound like this will always give the same result as `pow`
- `memcpy`/`memmove`/`memset` also have LLVM intrinsics, but AFAICT they also just call the library functions unless they can be optimized out, and in the few cases where they're used I doubt they can be
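A few of the conversions listed above can be sketched as follows. This assumes a later Julia release in which `ccall` accepts the `llvmcall` calling convention, which is not necessarily the syntax this PR implements, and the `*_llvm` wrapper names are hypothetical.

```julia
# Assumption: a Julia version where `llvmcall` is accepted as a ccall calling convention.
# The *_llvm names are hypothetical wrappers chosen for this illustration.
bswap_llvm(x::UInt32)  = ccall("llvm.bswap.i32", llvmcall, UInt32, (UInt32,), x)
fabs_llvm(x::Float64)  = ccall("llvm.fabs.f64", llvmcall, Float64, (Float64,), x)
floor_llvm(x::Float64) = ccall("llvm.floor.f64", llvmcall, Float64, (Float64,), x)

# Print each wrapper's result next to the Base function it would stand in for.
println(bswap_llvm(0x11223344) == bswap(0x11223344))
println(fabs_llvm(-1.5) == abs(-1.5))
println(floor_llvm(-2.1) == floor(-2.1))
```

Inspecting the wrappers with `@code_llvm` or `@code_native` is one way to check whether each call really becomes a single instruction on a given target.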