-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exp, @fastmath, SVML vectorization. #21454
Comments
Have you actually seen this happen? I don't think we have ever lowered it in any way that llvm can recognize. |
Yes, we don't lower exp in a way that LLVM can recognize at the moment. However, it should be fairly simple to add a generic hook to fix that. As you noted, manually calling exp or the llvm intrinsic will work for testing purposes. The julia native implementation is generally faster than libm. In any case, there's no rush on this since we can't drop in SVML at the moment anyway. |
It seems intuitive to me that fastmath would allow calling a lower-accuracy version but not require it. In this case I believe the intent of the fastmath version was to skip error checks.
This doesn't really matter --- we could implement |
This is most likely an oversight that should be fixed. In fact, the fast math version seems slower.
My point being there should be no regression caused by this change in 0.6. Being able to vectorize it would obviously be even better.
Hopefully no since that means a failure to vectorize will create slower code.... |
There's no problem with just telling LLVM that our exp function is the same as what it considers |
That'll be cool. How hard would it be to tell LLVM that a julia function can be vectorized (either because there's no complex control flow in it or because we defined a version that operate on |
The hooks are already there in TargetLibraryInfo as I said, but may require some hacking to have it do anything other than what is hardcoded right now. For functions without complex control flow, LLVM should be able to figure out by itself that the function can be vectorized, so we should just fix that in LLVM. |
We are working on experimental patch to LLVM 4.0 which enables vectorization for all the SVML functions. Here is the list of enabled functions: |
We can't actually legally distribute Julia with MKL unless Julia is built without any GPL libraries, which is not a standard build setup, so we won't be able to ship with SVML either. If we get rid of all the GPL libraries from Base Julia (which is a long term goal) then we'll be able to ship with MKL and SVML. |
Of course if Intel wanted to open source SVML under a GPL-compatible license that'd be great (and we could start using it immediately). |
Ditto with MKL 😀 |
While we are considering open-sourceing SVML (though it might be still limitted and takes time to release).. it's quite unlikely to happen for MKL |
I think this is a duplicate of #15265. |
@StefanKarpinski @Keno, Viral assured us that "We expect that JuliaPro will start shipping with mkl by juliacon." |
As I said SVML is not currently integrable into Julia for technical reasons. Intel NDA prevents me from giving details in this forum. Feel free to email me. |
Is there an update to having SVML under Julia? It seems to be holding back Julia (At part of the reason) in the following test: https://www.modelsandrisk.org/appendix/speed/ Though Python + Numba is still faster when Julia is using |
No. |
In the long run, it would be neat to have something like ISPC (https://ispc.github.io/) in Julia itself. |
@simonbyrne , Using SVML and |
They are complimentary, but properly integrating SVML requires julia at the frontend level to be aware of vector lanes, which we currently don't have, but would be a prerequisite for exposing a general spmd programming model. |
@Keno, numba is not aware of vector lanes, still thanks to ability of LLVM to recognize libm calls and transform them into svml calls along allows it to enjoy nice speedups on transcendental functions. I know that Julia goes away from libm calls.. but can you consider having a mixed or opt-in approach to enable them back? E.g. we can start with some` @fastmath(SVML)-like macro which will enable good old libm functions emitting and switch LLVM into SVML mode? |
Sure, that's why I said "properly integrating". A plethora of other hacks are and have always been possible. E.g. we used SVML for Celeste just fine. |
In Julia 0.6, I noticed that exp is no longer a call to libm but has been implemented in Julia itself. I wonder if this decision has potential performance implications not far down the road. Through SVML, LLVM is able to provide vectorization for exp, if exp is invoked through a SVML intrinsic or a call into libm. It won't vectorize if the expanded LLVM from Julia's exp implementation is included. We can use fastmath to revert to a call to libm but this raises the question of the semantics of fastmath. It seems like the semantics of fastmath should be a loss of accuracy in exchange for performance. My understanding is that this is indeed the behavior for Julia fastmath adds in that Julia will use the LLVM fastmath flag. I also believe that currently Julia fastmath exp is not consistent in that it does not signal a lower accuracy version and so we would expect Julia's exp to have the same accuracy/performance as fastmath/libm exp?
I have been told that SVML provides three points within the accuracy/performance tradeoff space. We can debate but it seems like fastmath should map to one of the lower two accuracy (higher performance) levels. The question then becomes, how do you get vectorization at the highest accuracy level with SVML? It seems that implementing exp in Julia precludes this possibility unless more code is added to detect potential vectorization with SVML and in that case to revert back to a libm call. Why not just always libm then? In what circumstance is Julia exp superior?
The text was updated successfully, but these errors were encountered: