Allow ccall of LLVM intrinsics #3969

Closed
wants to merge 2 commits into from


simonster
Member

This simplifies things while adding flexibility, and as far as I can tell it generates exactly the same code.

For the moment, I've only converted the bswap_int and ctpop_int intrinsics, but more could potentially be done with this:

  • If we can drop support for LLVM <3.1, ctlz_int (leading_zeros) and cttz_int (trailing_zeros) can be converted from Julia intrinsics to ccalls to LLVM intrinsics
  • If we can drop support for LLVM <3.2, abs_float can be converted from a Julia intrinsic to a ccall to llvm.fabs
  • If we can drop support for LLVM <3.3, we can replace floor, ceil, and trunc calls to libm with llvm.floor, llvm.ceil, and llvm.trunc, which compile to a single instruction and are considerably faster
  • llvm.sqrt compiles down to a single instruction, but the LLVM intrinsic has undefined behavior on negative values, so at the moment I think this would still need a Julia intrinsic to be inlineable. In a naïve benchmark it also doesn't seem any faster than calling into libm, although it might be faster in cases where the next operation is not a sqrt
  • llvm.powi was discussed in "^ is slow" (#2741). It looks like LLVM just lowers this into a bunch of multiply instructions if the exponent is a constant. It doesn't sound like this will always give the same result as pow
  • LLVM has intrinsics for several other mathematical functions, but AFAICT they just call libm and the only advantage is that constant expressions can be evaluated at compile time
  • LLVM has intrinsics for memcpy/memmove/memset, but AFAICT they also just call the library functions unless they can be optimized out, and in the few cases where they're used I doubt they can be
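
For reference, calling an LLVM intrinsic by name can be sketched as follows. This is not the syntax proposed in this PR: it assumes a recent Julia 1.x release, where `ccall` accepts `llvmcall` as a calling convention, and the wrapper names `popcount64`, `byteswap32`, and `floor_f64` are hypothetical.

```julia
# Sketch only: assumes a recent Julia 1.x, where `llvmcall` is accepted as a
# ccall calling convention. The wrapper names are hypothetical.

# llvm.ctpop.i64: count the set bits of a 64-bit integer.
popcount64(x::UInt64) = ccall("llvm.ctpop.i64", llvmcall, UInt64, (UInt64,), x)

# llvm.bswap.i32: reverse the byte order of a 32-bit integer.
byteswap32(x::UInt32) = ccall("llvm.bswap.i32", llvmcall, UInt32, (UInt32,), x)

# llvm.floor.f64: round toward negative infinity.
floor_f64(x::Float64) = ccall("llvm.floor.f64", llvmcall, Float64, (Float64,), x)

# Compare against the Base functions that provide the same operations.
@assert popcount64(0x00000000000000ff) == count_ones(0x00000000000000ff)
@assert byteswap32(0x11223344) == bswap(0x11223344)
@assert floor_f64(-1.5) == floor(-1.5)
```

The argument and return types have to match the width encoded in the intrinsic's name (`i64`, `i32`, `f64`), which is why each wrapper is restricted to a single concrete type.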

@StefanKarpinski
Member

I rather like this, but this is definitely @JeffBezanson's call. It's unfortunately not a net deletion of code – but it seems like a change that could lead to major code deletion in the future, which is always good.

@ViralBShah
Member

It does seem that there are cases where there are some benefits. We probably do not want intrinsics that call libm, because then it is quite likely that LLVM will end up dispatching to the system libm and not openlibm. Perhaps there might be a way to build LLVM with openlibm, but I am quite sure that they have not envisioned that possibility.

For sqrt, openlibm does have an assembly implementation.

On memset/memmove/memcpy, it may be worthwhile to use the intrinsics so that we benefit in the few cases when the compiler is able to optimize something.

Would love to hear what @JeffBezanson has to say.

@vtjnash
Member

vtjnash commented Aug 7, 2013

If you set the makefile variable UNTRUSTED_SYSTEM_LIBM=1, the Julia build does a pretty good job at preventing LLVM from using the system libm, although this has only been tested on Windows.

LLVM will generally try to optimize any call to a function named "memset", etc. by automatically converting it into an intrinsic. Similarly, any call to a libm function is already automatically replaced by the LLVM intrinsic (which still calls the openlibm function unless LLVM can constant-fold the expression).

@simonster
Member Author

If the module already contains a function with the name LLVM is looking for, the intrinsic will use it instead of loading libm; at least, that is what CodeGen/IntrinsicLowering.cpp and my testing suggest. However, loading the appropriate functions into the module when the function is compiled would still require something beyond just a ccall to the LLVM intrinsic.

It seems like LLVM should be capable of optimizing memset etc. into intrinsics, but I can't actually get Julia to do it. f(x) = ccall(:memset, Ptr{Void}, (Ptr{Void}, Int32, Int), x, 1, 1) is still a function call for me, even if I enable the simplify-libcalls and memcpyopt passes.
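
A runnable version of that call in present-day syntax might look like the sketch below. It assumes a recent Julia 1.x (where `Ptr{Cvoid}` replaces `Ptr{Void}`), swaps in `Cint` and `Csize_t` for the C argument types, and uses a hypothetical function name, `set_first_byte!`.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Hypothetical helper: write the byte 0x01 into the first element of `buf`.
set_first_byte!(buf::Vector{UInt8}) =
    ccall(:memset, Ptr{Cvoid}, (Ptr{Cvoid}, Cint, Csize_t), buf, 1, 1)

buf = zeros(UInt8, 4)
set_first_byte!(buf)
@assert buf == UInt8[0x01, 0x00, 0x00, 0x00]

# Print the generated IR to check whether the memset call was lowered.
code_llvm(stdout, set_first_byte!, (Vector{UInt8},))
```

Whether the printed IR shows a plain call or an intrinsic depends on the Julia and LLVM versions in use, so it is worth inspecting rather than assuming.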

@simonster
Member Author

Julia isn't optimizing the memset because the data layout isn't set. I'll open a separate PR for this.

@Keno
Member

Keno commented Aug 7, 2013

I'm somewhat torn over this one. On the one hand, it's nice to do everything uniformly. On the other hand, I don't really see the need for this and would much prefer a way to call generic LLVM IR. I do, however, defer the decision to @JeffBezanson.

@JeffBezanson
Member

I feel the same way as @loladiro. It's also not really a C call, and thus a bit of an abuse of notation.

@StefanKarpinski
Member

How about a similar mechanism for calling LLVM intrinsics that doesn't use the ccall name? Like llvmcall?

@JeffBezanson
Member

Another issue is that calling LLVM intrinsics just isn't that useful. There aren't many of them that we'd be able to use, and most just correspond to C library functions.

@simonster
Member Author

There are a few useful x86-specific intrinsics corresponding to individual instructions, but they're undocumented and I'm not sure they're meant to be used outside of LLVM. In any case, the ability to inline generic LLVM IR would be strictly more powerful than this PR.
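
As an illustration of what inlining generic LLVM IR can look like, here is a minimal sketch using `Base.llvmcall`, which is available in recent Julia 1.x releases. It is not part of this PR, and the function name `add_i32` is hypothetical.

```julia
# Sketch only: assumes a recent Julia 1.x. `add_i32` is a hypothetical name.
# The arguments are %0 and %1, the entry block takes %2, so the first
# unnamed result is %3.
add_i32(x::Int32, y::Int32) = Base.llvmcall("""
    %3 = add i32 %1, %0
    ret i32 %3""", Int32, Tuple{Int32, Int32}, x, y)

@assert add_i32(Int32(2), Int32(3)) == 5
```

Because the IR string is spliced directly into the generated function, anything expressible in LLVM IR can be written this way, which is the sense in which it subsumes calling individual intrinsics.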
