exp, @fastmath, SVML vectorization. #21454

DrTodd13 · 2017-04-20T17:04:28Z

In Julia 0.6, I noticed that exp is no longer a call to libm but has been implemented in Julia itself. I wonder whether this decision has performance implications not far down the road. Through SVML, LLVM is able to provide vectorization for exp, if exp is invoked through a SVML intrinsic or a call into libm. It won't vectorize if the expanded LLVM IR from Julia's own exp implementation is emitted instead. We can use fastmath to revert to a call to libm, but this raises the question of the semantics of fastmath. It seems like the semantics of fastmath should be a loss of accuracy in exchange for performance. My understanding is that this is indeed the behavior for additions under Julia's fastmath, in that Julia emits them with the LLVM fast-math flag. I also believe that currently Julia fastmath exp is not consistent in that it does not signal a lower accuracy version, and so we would expect Julia's exp to have the same accuracy/performance as fastmath/libm exp?

I have been told that SVML provides three points in the accuracy/performance tradeoff space. We can debate this, but it seems like fastmath should map to one of the two lower-accuracy (higher-performance) levels. The question then becomes: how do you get vectorization at the highest accuracy level with SVML? It seems that implementing exp in Julia precludes this possibility unless more code is added to detect potential vectorization with SVML and, in that case, to revert to a libm call. Why not just always call libm, then? In what circumstance is Julia's exp superior?
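For anyone who wants to check the vectorization behavior described above on their own build, here is a minimal sketch. The kernel names `sum_exp` and `sum_exp_fast` are hypothetical and exist only for illustration; what the IR looks like depends on the Julia and LLVM versions, so no particular output is claimed here.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Hypothetical kernel using the regular exp.
function sum_exp(xs::Vector{Float64})
    acc = 0.0
    @inbounds @simd for i in eachindex(xs)
        acc += exp(xs[i])
    end
    return acc
end

# Same kernel, but exp is rewritten by @fastmath to its fast-math variant.
function sum_exp_fast(xs::Vector{Float64})
    acc = 0.0
    @inbounds @simd for i in eachindex(xs)
        acc += @fastmath exp(xs[i])
    end
    return acc
end

# Capture the optimized LLVM IR of each kernel as a string.
ir_plain = sprint(io -> code_llvm(io, sum_exp, Tuple{Vector{Float64}}))
ir_fast  = sprint(io -> code_llvm(io, sum_exp_fast, Tuple{Vector{Float64}}))

# Vector types such as `<4 x double>` in the loop body suggest the loop was
# vectorized; a scalar call to an exp routine inside the loop suggests it was not.
println(ir_plain)
println(ir_fast)
```

Comparing the two dumps side by side is usually enough to see whether either variant ends up as a recognizable library call or as inlined scalar code.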

yuyichao · 2017-04-20T17:08:34Z

> Through SVML, LLVM is able to provide vectorization for exp, if exp is invoked through a SVML intrinsic or a call into libm.

Have you actually seen this happen? I don't think we have ever lowered it in any way that llvm can recognize.

Keno · 2017-04-20T17:14:26Z

Yes, we don't lower exp in a way that LLVM can recognize at the moment. However, it should be fairly simple to add a generic hook to fix that. As you noted, manually calling exp or the llvm intrinsic will work for testing purposes. The julia native implementation is generally faster than libm. In any case, there's no rush on this since we can't drop in SVML at the moment anyway.
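As a rough illustration of "manually calling the llvm intrinsic" for testing, the sketch below wraps the `llvm.exp.f64` intrinsic with `ccall(..., llvmcall, ...)`. The wrapper name `llvm_exp` and the helper `exp_all!` are hypothetical, and this assumes the platform can resolve the libm `exp` symbol that LLVM lowers the intrinsic to, which may not hold everywhere.

```julia
# Hypothetical wrapper: emits the LLVM intrinsic instead of Julia's native exp.
llvm_exp(x::Float64) = ccall("llvm.exp.f64", llvmcall, Float64, (Float64,), x)

# Hypothetical test kernel: a simple loop that LLVM's vectorizer could, in
# principle, map to a vector math library if one is configured.
function exp_all!(out::Vector{Float64}, xs::Vector{Float64})
    @inbounds @simd for i in eachindex(xs, out)
        out[i] = llvm_exp(xs[i])
    end
    return out
end

xs = collect(range(-1.0, 1.0; length = 8))
out = exp_all!(similar(xs), xs)
```

Whether the loop is actually turned into vector library calls depends on how LLVM is built and configured, so this is only a starting point for experiments.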

JeffBezanson · 2017-04-20T17:14:31Z

> I also believe that currently Julia fastmath exp is not consistent in that it does not signal a lower accuracy version

It seems intuitive to me that fastmath would allow calling a lower-accuracy version but not require it. In this case I believe the intent of the fastmath version was to skip error checks.

> I don't think we have ever lowered it in any way that llvm can recognize.

This doesn't really matter --- we could implement exp in such a way that llvm could recognize it, but that would mean skipping the julia implementation, so the point still stands.

yuyichao · 2017-04-20T17:17:42Z

> We can use fastmath to revert to a call to libm

This is most likely an oversight that should be fixed. In fact, the fast math version seems slower.

> This doesn't really matter

My point being there should be no regression caused by this change in 0.6. Being able to vectorize it would obviously be even better.

> that would mean skipping the julia implementation

Hopefully not, since that would mean a failure to vectorize creates slower code.

Keno · 2017-04-20T17:19:00Z

There's no problem with just telling LLVM that our exp function is the same as what it considers exp to be. That's just one extra hook in TargetLibraryInfo. Even better, we could come up with a generic way of annotating "this function is a vectorized version of this other function."

yuyichao · 2017-04-20T17:23:59Z

That'll be cool. How hard would it be to tell LLVM that a julia function can be vectorized (either because there's no complex control flow in it or because we defined a version that operates on NTuple{...,VecElement{...}} directly)?
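To make the second option concrete, here is a small sketch of what a version operating on `NTuple{N,VecElement{T}}` looks like. The alias `Vec4` and the function `vexp` are hypothetical names; this only shows the data layout and a lane-wise definition, and does not by itself tell LLVM anything about how to vectorize scalar calls.

```julia
# A tuple of VecElement values is how Julia represents an LLVM vector type.
const Vec4 = NTuple{4, VecElement{Float64}}

# Hypothetical lane-wise exp: applies the scalar exp to each lane in turn.
vexp(v::Vec4) = map(x -> VecElement(exp(x.value)), v)

v = (VecElement(0.0), VecElement(1.0), VecElement(2.0), VecElement(3.0))
r = vexp(v)

# Each lane holds the scalar result for the matching input lane.
println(map(x -> x.value, r))
```

A real vector implementation would compute all lanes together rather than looping over them; the missing piece discussed in this thread is the annotation linking such a function to its scalar counterpart.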

Keno · 2017-04-20T17:28:34Z

The hooks are already there in TargetLibraryInfo as I said, but may require some hacking to have it do anything other than what is hardcoded right now. For functions without complex control flow, LLVM should be able to figure out by itself that the function can be vectorized, so we should just fix that in LLVM.

anton-malakhov · 2017-04-20T19:00:51Z

We are working on an experimental patch to LLVM 4.0 which enables vectorization for all the SVML functions. Here is the list of enabled functions: sin cos pow exp log acos acosh asin asinh atan2 atan atanh cbrt cdfnorm cdfnorminv ceil cosd cosh erf erfc erfcinv erfinv exp10 exp2 expm1 floor fmod hypot invsqrt log10 log1p log2 logb nearbyint rint round sind sinh sqrt tan tanh trunc
Moreover, Intel will soon provide you a license to redistribute SVML the same way as you redistribute MKL in your binary Julia distribution. It would be very cool if we could enable vectorization of these functions not only in the fastmath mode; SVML provides HA (high-accuracy) functions as well.

StefanKarpinski · 2017-04-20T19:27:19Z

We can't actually legally distribute Julia with MKL unless Julia is built without any GPL libraries, which is not a standard build setup, so we won't be able to ship with SVML either. If we get rid of all the GPL libraries from Base Julia (which is a long term goal) then we'll be able to ship with MKL and SVML.

Keno · 2017-04-20T19:34:49Z

Of course if Intel wanted to open source SVML under a GPL-compatible license that'd be great (and we could start using it immediately).

StefanKarpinski · 2017-04-20T19:52:07Z

Ditto with MKL 😀

anton-malakhov · 2017-04-20T19:55:04Z

We are considering open-sourcing SVML (though it might still be limited, and it will take time to release), but that is quite unlikely to happen for MKL.

simonbyrne · 2017-04-21T07:57:35Z

I think this is a duplicate of #15265.

anton-malakhov · 2017-04-21T14:46:11Z

@StefanKarpinski @Keno, Viral assured us that "We expect that JuliaPro will start shipping with mkl by juliacon."
Thus our question about integrating SVML into the MKL build of the JuliaPro distribution is still valid and fairly urgent.

Keno · 2017-04-21T14:49:02Z

As I said, SVML is not currently integrable into Julia for technical reasons. An Intel NDA prevents me from giving details in this forum. Feel free to email me.

RoyiAvital · 2018-07-13T12:26:57Z

Is there an update on having SVML in Julia?

Its absence seems to be holding Julia back (at least in part) in the following test:

https://www.modelsandrisk.org/appendix/speed/

Though Python + Numba is still faster when Julia is using @inbounds and Apple libm (See https://julialang.slack.com/archives/C67910KEH/p1531490464000597?thread_ts=1531475750.000264&cid=C67910KEH).

StefanKarpinski · 2018-07-13T12:30:02Z

No.

simonbyrne · 2018-07-19T16:36:50Z

In the long run, it would be neat to have something like ISPC (https://ispc.github.io/) in Julia itself.

RoyiAvital · 2018-07-19T18:51:49Z

@simonbyrne, using SVML and an ISPC-like approach are complementary to each other, aren't they?
Not that I'm an expert on this, but I would assume integrating SVML is easier, especially when Intel offers assistance.

Keno · 2018-07-19T18:53:30Z

They are complementary, but properly integrating SVML requires julia at the frontend level to be aware of vector lanes, which it currently isn't; that awareness would also be a prerequisite for exposing a general SPMD programming model.

anton-malakhov · 2018-07-20T04:16:03Z

@Keno, Numba is not aware of vector lanes either; still, LLVM's ability to recognize libm calls and transform them into SVML calls is alone enough to give it nice speedups on transcendental functions. I know that Julia is moving away from libm calls, but could you consider a mixed or opt-in approach to enable them again? E.g., we could start with some @fastmath(SVML)-like macro which would re-enable emitting the good old libm function calls and switch LLVM into SVML mode.

Keno · 2018-07-20T04:48:22Z

Sure, that's why I said "properly integrating". A plethora of other hacks are and have always been possible. E.g. we used SVML for Celeste just fine.

JeffBezanson added the compiler:codegen, maths, and performance labels Apr 20, 2017

Keno closed this as completed Jul 19, 2018

Keno reopened this Jul 19, 2018

simonbyrne mentioned this issue Apr 11, 2019

A faster log function #8869

Closed

aminya mentioned this issue Nov 27, 2019

Incorporating SVML JuliaMath/IntelVectorMath.jl#22

Open

Keno mentioned this issue Nov 7, 2024

Proposed semantics for implicit vectorization of primitives #56481

Open
