Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorization depends on silly details #16894

Closed
timholy opened this issue Jun 12, 2016 · 5 comments · Fixed by #16897
Closed

Vectorization depends on silly details #16894

timholy opened this issue Jun 12, 2016 · 5 comments · Fixed by #16897
Labels
upstream The issue is with an upstream dependency, e.g. LLVM

Comments

@timholy
Copy link
Member

timholy commented Jun 12, 2016

Consider the following functions:

function mysum_simd1(v)
    s = zero(eltype(v))
    @simd for i in eachindex(v)
        @inbounds s += v[i]
    end
    s
end
function mysum_simd2(v)
    s = zero(eltype(v))
    @simd for i in eachindex(v)
        @inbounds val = v[i]
        s += val
    end
    s
end

By any reasonable standard, these should be indistinguishable: all I did was split one line into two, storing the result in a temporary variable. However,

v = rand(1000)
s = slice(v, 1:999)
using BenchmarkTools
println("simd1, Vector: ", time(@benchmark mysum_simd1(v)))
println("simd2, Vector: ", time(@benchmark mysum_simd2(v)))
println("simd1, SubArray: ", time(@benchmark mysum_simd1(s)))
println("simd2, SubArray: ", time(@benchmark mysum_simd2(s)))

yields the following times:

simd1, Vector: 144.0
simd2, Vector: 144.0
simd1, SubArray: 155.0
simd2, SubArray: 1020.0

(measured in ns). Naturally, @code_llvm reveals that mysum_simd1(s) is vectorized, and mysum_simd2(s) is not.

@yuyichao
Copy link
Contributor

This is #15402 AFACT

@yuyichao
Copy link
Contributor

There seems to be multiple issues here.

  1. Due to Unnecessary GC root for getfield of SSA immutable object #15402, we create a GC frame for the load of the actual array from the subarray.
  2. Somehow, one of the llvm passes lost our jltbaa_arraybuf annotation on the array load.
  3. Somehow llvm doesn't want to hoist the GC frame store out of the loop without the tbaa on the array load even though these are the only two memory ops in the loop and the store has a constant address and value and is always run before the load.

@timholy
Copy link
Member Author

timholy commented Jun 12, 2016

Thanks a bunch for digging into this, @yuyichao.

@yuyichao yuyichao added the upstream The issue is with an upstream dependency, e.g. LLVM label Jun 12, 2016
@yuyichao
Copy link
Contributor

  1. Unnecessary GC root for getfield of SSA immutable object #15402 is ....., well, ......., Unnecessary GC root for getfield of SSA immutable object #15402 and is now tracked at Very inefficient GC frame generation #15369 (comment)
  2. @carnaval and I traced down the lost of tbaa node and I've submitted a patch at http://reviews.llvm.org/D21271
  3. LLVM unable to hoist store seems to be a general limitation and is said to be not a common code pattern according to the reply at my llvm bug report https://llvm.org/bugs/show_bug.cgi?id=28097 . With the patch above this shouldn't be too much an issue.

I'll check if applying this patch on llvm3.8/3.7 fixes the issue and will submit a pr here if it works....

@yuyichao
Copy link
Contributor

The "silly detail" is basically the order llvm does the optimization or sth similar. The test cast in the LLVM review actually doesn't trigger the issue if running with opt -O2, it only triggers when this pattern is seen by InstCombine.....

earl pushed a commit to earl/llvm-mirror that referenced this issue Jun 16, 2016
The original check for load CSE or store-to-load forwarding is wrong
when the forwarded stored value happened to be a load.

Ref JuliaLang/julia#16894

Differential Revision: http://reviews.llvm.org/D21271

Patch by Yichao Yu!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272868 91177308-0d34-0410-b5e6-96231b3b80d8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
upstream The issue is with an upstream dependency, e.g. LLVM
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants