
Allow @simd to use vectorised math operations when available #15265

Open
simonbyrne opened this issue Feb 27, 2016 · 17 comments
Labels
compiler:codegen (Generation of LLVM IR and native code), compiler:simd (instruction-level vectorization), performance (Must go faster)

Comments

@simonbyrne (Contributor)

It would be nice if we could write something like

function foo(X)
    u = zero(eltype(X))
    @simd for x in X
        u += log(x)
    end
    u
end

and have it automatically use a vector math function such as _mm_log_ps from MKL VML, or its Accelerate or Yeppp equivalents (we do have packages for those libraries, but they only expose the direct array operations, which require memory allocation).

It seems that LLVM has some optimisations for Accelerate, such that the following works on master on a Mac:

# Wrap the llvm.log.f32 intrinsic directly, so LLVM sees a vectorizable call:
function llog(x::Float32)
    Base.llvmcall(("declare float @llvm.log.f32(float  %Val)",
                   """%2 = call float @llvm.log.f32(float %0)
                   ret float %2"""),
                  Float32,Tuple{Float32},x)
end

function foo(X)
    u = zero(eltype(X))
    @simd for x in X
        u += llog(x)
    end
    u
end

using Random   # randexp lives in Random on Julia ≥ 0.7 (it was in Base in 2016)
X = map(Float32, randexp(100))
@code_llvm foo(X)

but ideally this shouldn't require hand-written LLVM IR, and it should work across multiple libraries. Perhaps we can take advantage of the developments in #15244 to get this to work more generally.

cc: @SimonDanisch, @eschnett, @ArchRobison

@nalimilan (Member)

No, what would really be cool is getting the same feature with this. :-)

foo(X) = @simd sum(log(x) for x in X)

(Anyway I guess having the longer form work is a step towards this.)

@eschnett (Contributor)

I have a C++ library, Vecmathlib (https://bitbucket.org/eschnett/vecmathlib/wiki/Home), that goes very much in this direction. It is implemented using various vector intrinsics; a long-standing plan of mine is to rewrite it in Julia, targeting LLVM directly.

This would essentially provide a new implementation of log et al. that (a) can be tuned (so that it's faster when @fastmath is in effect), and that (b) can be efficiently vectorized. That may mean it's a bit slower in the scalar case, but the 8x speedup due to vectorization should still be a clear advantage.

The SIMD package https://github.com/eschnett/SIMD.jl is a first step in this direction.

@vtjnash (Member) commented Mar 8, 2016

I'm going to close this as a non-actionable feature wish. With #15244, I think the content here should be implementable in a package.

@vtjnash closed this as completed Mar 8, 2016
@ArchRobison (Contributor)

I think #15244 is unrelated, since it is addressing tuple vectorization. What we want for this request is for Julia transcendentals to be lowered to LLVM intrinsics, and vectorization of those to target whatever vector library is available. That seems like pure LLVM work. The Apple Accelerate patch seems like a good reference.

@eschnett (Contributor) commented Mar 8, 2016

This feature request is actionable. The transformations required are equivalent to those performed by @fastmath: we'd need to introduce additional functions vlog etc., and have @simd replace log with vlog.
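
A minimal sketch of that substitution idea, with a hypothetical vlog and a rewrite pass analogous to the table-driven renaming @fastmath performs (nothing below exists in Base):

# Hypothetical vlog: same signature as log; a real version would use a
# tunable, vectorizable kernel instead of falling back to log itself.
vlog(x::Float64) = log(x)

# Walk an expression tree and rename calls to log into calls to vlog,
# the way @simd would have to rewrite its loop body.
function substitute_log!(ex)
    if ex isa Expr
        if ex.head == :call && ex.args[1] == :log
            ex.args[1] = :vlog
        end
        foreach(substitute_log!, ex.args)
    end
    ex
end

substitute_log!(:(u += log(x)))   # returns :(u += vlog(x))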

@StefanKarpinski (Member)

It must be spring time – the weather is nice and Jameson is on his annual issue closing spree!

@ArchRobison (Contributor)

@simd does not do vectorization. It just marks loops as not having cross-iteration dependences, and leaves the vectorization to LLVM. It's the LLVM vectorizer that has to replace scalar transcendentals with vector transcendentals, and hence has to be told how the mapping is done (like the Apple Accelerate patch does).

@ArchRobison (Contributor)

The closing worked to make me pay attention :-)

@eschnett (Contributor) commented Mar 8, 2016

I was imagining a vlog function that still takes a scalar input, but which contains no opaque code (e.g. no calls to libm), so that LLVM can inline and vectorize it.
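
A toy of that shape (assumed names, not a production log): the body is straight-line arithmetic with no libm call, so LLVM is free to inline and vectorize it.

# Toy inlinable log for Float32: no opaque calls, just frexp and a short
# polynomial. Accuracy is poor (errors up to ~1e-4); a real vlog needs a
# better kernel.
@inline function vlog_toy(x::Float32)
    m2, k = frexp(x)                 # x = m2 * 2^k with m2 in [0.5, 1)
    m = 2f0 * m2                     # m in [1, 2); log(x) = log(m) + (k - 1)*log(2)
    t = (m - 1f0) / (m + 1f0)        # log(m) = 2*atanh(t), t in [0, 1/3)
    t2 = t * t
    p = 2f0 * t * (1f0 + t2 * (1f0/3f0 + t2 * (1f0/5f0)))  # truncated atanh series
    muladd(Float32(k) - 1f0, 0.6931472f0, p)
end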

@ArchRobison (Contributor)

I'm worried about code bloat (and compilation time) from inlining something as complicated as log. The LLVM vectorizer evidently has support for vectorizing transcendentals; we just have to figure out how to cater to it. (I have horror stories from a previous life about putting transforms in the wrong place for non-technical reasons.)

@vtjnash added the performance (Must go faster) and compiler:codegen (Generation of LLVM IR and native code) labels Mar 8, 2016
@simonbyrne (Contributor, Author)

As a rough idea, I was thinking that when the vectorizer sees a call to foo(x::T), it would look for a method foo(x::SIMD{T}), which would call the vectorized library function.
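
In SIMD.jl terms the pattern might look like the following; the hook that makes the vectorizer perform this dispatch is hypothetical, only the two methods are real code:

using SIMD   # provides Vec{N,T}

# Scalar method, as today:
vexp(x::Float64) = exp(x)

# Vector method the hypothetical vectorizer hook would dispatch to; here it
# just applies exp lanewise, but it could call an MKL/Accelerate batch routine.
vexp(v::Vec{4,Float64}) = Vec{4,Float64}(ntuple(i -> exp(v[i]), 4))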

@ArchRobison (Contributor)

That's what the table VecFuncs in https://github.com/llvm-mirror/llvm/blob/246537be50cedeb0eb52c26439d201faf9be933a/lib/Analysis/TargetLibraryInfo.cpp#L511-L551 is essentially specifying. I suppose that if we wanted to call our own versions, maybe we could add a hook that lets Julia specify the table.
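
For reference, each row of that table pairs a scalar libm name with a vector routine and a width; the Accelerate rows look roughly like this, expressed as Julia data (entries paraphrased from the linked source):

# (scalar name, vector name, vector width)
const ACCELERATE_VECFUNCS = [
    ("logf", "vlogf", 4),
    ("expf", "vexpf", 4),
    ("sinf", "vsinf", 4),
]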

I think a similar discussion came up years ago, and there was some stumbling block (on Windows) where if we mapped transcendentals to LLVM intrinsics, we were stuck with them being mapped to the system libm in the backend, but I don't remember the details.

@simonbyrne (Contributor, Author)

> I suppose that if we wanted to call our own versions, maybe we could add a hook that lets Julia specify the table.

That would be awesome, though I don't know how easy that would be.

> where if we mapped transcendentals to LLVM intrinsics, we were stuck with them being mapped to the system libm in the backend

I think that this is still a problem.

@ViralBShah (Member)

Is it reasonable to close this issue given all the progress in the SIMD ecosystem?

cc @chriselrod

@vchuravy (Member)

IIUC this is still an open issue that is only solved in special cases.

@oscardssmith (Member)

LoopVectorization does a pretty good job of solving this, but it would be nice if the compiler got good enough to make this happen automatically more often.
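
For reference, the LoopVectorization form of the original example (as of this comment the macro is @avx, later renamed @turbo; the loop must use explicit indexing):

using LoopVectorization

function foo_lv(X)
    u = zero(eltype(X))
    @avx for i in eachindex(X)
        u += log(X[i])   # @avx swaps in its own vectorized log implementation
    end
    u
end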

@chriselrod (Contributor)

I don't think it's solved.
A couple of possibilities that are within reach:

  1. @simd won't vectorize sqrt at the moment, while @fastmath will (as does using Base.FastMath.sqrt_fast or Base.sqrt_llvm directly); see the sketch after this list.
  2. @simd ivdep would vectorize the current exp(::Float64) implementation if it were inlined, but it is not.
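
A sketch demonstrating point 1: comparing the two @code_llvm outputs should show scalar sqrt calls in the first function and a <N x double> llvm.sqrt in the second (assuming Float64 input on a typical x86-64 machine).

function sum_sqrt_simd(x)
    s = zero(eltype(x))
    @inbounds @simd for i in eachindex(x)
        s += sqrt(x[i])   # Base.sqrt carries a domain-error branch, blocking vectorization
    end
    s
end

function sum_sqrt_fast(x)
    s = zero(eltype(x))
    @inbounds @fastmath for i in eachindex(x)
        s += sqrt(x[i])   # rewritten to sqrt_fast, which lowers to llvm.sqrt
    end
    s
end

# @code_llvm sum_sqrt_simd(rand(100))
# @code_llvm sum_sqrt_fast(rand(100))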

I think applying the vector function ABI to call sites when using @simd, plus shipping a SIMD library supporting that ABI, would probably be the best way to solve this issue.
But ideally we could implement the vector variants in pure Julia, just like our scalar variants (especially when, as in @oscardssmith's exp, the implementation is good for both vector and scalar, but the lack of inlining prevents the autovectorizer from handling it).
I don't know enough about the LLVM side of things to speculate on what that would or should look like, but it seems like a possibility.

@avx of course works through multiple dispatch to call different scalar vs. vector implementations, which is the Julian approach.
