Indexing performance #11595
Anyone know how to do an automated git bisect of a rebase? Perhaps using git rerere to automate the conflict resolution? |
Create a test branch with your commit reordered early into the history, back when you originally proposed it? Effectively this would rebase everything since on top of #10525, then you can bisect that branch. |
unless i made a math error, that commit is strictly switching from floating point math to scaled integer math (with a scale factor of 1000), and should not have significantly changed the thresholds. |
Ahh. GitHub doesn't automatically expand the diff... Let's try again: line 2573 in 6b94780
|
@yuyichao i'm not sure what you are pointing at on that line, since it doesn't deal with inlining costs nor did it change in the bisect commit mentioned above |
That is the only call to

if ((occ==0 && is(aeitype,Bottom)) || islocal || (occ > 1 && !inline_worthy(aei, occ*2)) ||

Edit: In other words,

diff --git a/base/inference.jl b/base/inference.jl
index f03d3aa..d6f3a81 100644
--- a/base/inference.jl
+++ b/base/inference.jl
@@ -2570,7 +2570,7 @@ function inlineable(f::ANY, e::Expr, atype::ANY, sv::StaticVarInfo, enclosing_as
end
end
free = effect_free(aei,sv,true)
- if ((occ==0 && is(aeitype,Bottom)) || islocal || (occ > 1 && !inline_worthy(aei, occ*2)) ||
+ if ((occ==0 && is(aeitype,Bottom)) || islocal || (occ > 1 && !inline_worthy(aei, occ*2000)) ||
(affect_free && !free) || (!affect_free && !effect_free(aei,sv,false)))
if occ != 0 # islocal=true is implied by occ!=0
if !islocal |
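(For illustration only: a toy cost model with made-up names and numbers, not the actual inference.jl code, showing why the unscaled occ*2 threshold bites once costs are expressed as 1000x-scaled integers.)

# Toy cost model: costs move from "units" (Float64) to "milli-units"
# (Int, scale factor 1000), so a threshold left as occ*2 becomes
# effectively 1000x stricter.
const SCALE = 1000

cost_float(sz) = 1.5 * sz                       # old model: cost in units
cost_int(sz)   = round(Int, 1.5 * sz * SCALE)   # new model: cost in milli-units

worthy_float(sz, thresh) = cost_float(sz) <= thresh
worthy_int(sz, thresh)   = cost_int(sz) <= thresh

occ = 2
worthy_float(2, occ * 2)     # true:  3.0  <= 4
worthy_int(2, occ * 2)       # false: 3000 <= 4     (threshold not rescaled)
worthy_int(2, occ * 2000)    # true:  3000 <= 4000  (threshold rescaled)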
oops, good catch -- yes, i didn't even see that one |
Ah, wonderful. Thanks guys. That helps a lot. It's still not 100%, but definitely getting closer. On abstractsmarts10-updated + the above patch:

julia,sumeltIs ArrayLS,12.441170,18.845323,14.019387,1.576552
julia,sumeachIs ArrayLS,9.144515,15.143982,10.485381,1.488682
julia,sumlinearIs ArrayLS,28.358809,37.302391,33.664462,2.070900

Inlining is crazy difficult to get right. |
The memory consumption reported in #5117 (comment) is also consistent with a splatting penalty. |
One way to avoid the splatting penalty after #10380 is to pass the tuple as an argument directly instead of splatting it into a vararg argument (see script and output below). I'm not sure I would recommend this as a long-term solution, though, because IMHO the vararg case will be as fast (#11248) or even better (#5402) than the tuple case in the future.

#!/usr/bin/julia -f
using Base: unsafe_getindex, checkbounds
macro time_func(f, args...)
args = eval(current_module(), Expr(:tuple, args...))::Tuple
argnames = Symbol[gensym() for i in 1:length(args)]
types = map(typeof, args)
quote
function wrapper($(argnames...))
$(Expr(:meta, :noinline))
$f($(argnames...))
end
function timing_wrapper()
println($f, $types)
wrapper($(args...))
gc()
@time for i in 1:10000000
wrapper($(args...))
end
# Profile.print()
gc()
end
timing_wrapper()
end
end
function checkbounds2{T<:Union(Real,AbstractArray,Colon)}(A::AbstractArray,
I::Tuple{Vararg{T}})
Base.@_inline_meta
n = length(I)
if n > 0
for dim = 1:(n-1)
(Base._checkbounds(size(A,dim), I[dim]) ||
Base.throw_boundserror(A, I))
end
(Base._checkbounds(Base.trailingsize(A,n), I[n]) ||
Base.throw_boundserror(A, I))
end
end
@inline function getindex2(V::SubArray, I::Int...)
checkbounds2(V, I)
unsafe_getindex(V, I...)
end
function checkbounds3{T<:Union(Real,AbstractArray,Colon)}(A::AbstractArray,
I::T...)
Base.@_inline_meta
n = length(I)
if n > 0
for dim = 1:(n-1)
(Base._checkbounds(size(A,dim), I[dim]) ||
Base.throw_boundserror(A, I))
end
(Base._checkbounds(Base.trailingsize(A,n), I[n]) ||
Base.throw_boundserror(A, I))
end
end
@inline function getindex3(V::SubArray, I::Int...)
checkbounds3(V, I...)
unsafe_getindex(V, I...)
end
s1 = sub(Float64[1, 2], 1:2)
# @code_llvm getindex2(s1, 1)
# @code_llvm getindex(s1, 1)
@time_func(getindex3, s1, 1)
@time_func(getindex2, s1, 1)
@time_func(getindex, s1, 1)

getindex3(SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1},Int64)
1.560 seconds (10000 k allocations: 153 MB, 1.13% gc time)
getindex2(SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1},Int64)
88.030 milliseconds
getindex(SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1},Int64)
1.473 seconds (10000 k allocations: 153 MB, 0.22% gc time) |
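(A distilled sketch of the pattern above, with hypothetical helper names; the only difference between the two forms is where the splatting happens.)

# The checking helper either takes the index tuple as a single argument,
# or takes varargs that the caller has to splat back into it.
check_tuple(A, I::Tuple) = nothing   # tuple passed as one argument
check_splat(A, I...)     = nothing   # vararg; the caller splats into it

# getindex2 above corresponds to the first form, getindex3/getindex to the second.
f_tuple(A, I::Int...) = check_tuple(A, I)      # I is already a tuple: no splat
f_splat(A, I::Int...) = check_splat(A, I...)   # re-splats I into a vararg call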
I've dug into this more, applying @yuyichao's patch and making some further edits:

julia> s1 = sub(Float64[1, 2], 1:2)
2-element SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1}:
1.0
2.0
julia> f(s) = s[1]
f (generic function with 1 method)
julia> @code_warntype f(s1)
Variables:
s::SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1}
Body:
begin # none, line 1:
checkbounds(s::SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1},1)::Void
return (top(arrayref))((top(getfield))(s::SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1},:parent)::Array{Float64,1},(top(box))(Int64,(top(sub_int))((top(box))(Int64,(top(add_int))((top(getfield))(s::SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1},:first_index)::Int64,1)),1)))::Float64
end::Float64

No more type-instability, of course. But notice the call to checkbounds is still there (it has not been inlined). Using HackThatBase, I discovered the reason: something, somewhere, is inserting a NewvarNode ahead of the :meta expression:

linfo = AST(:($(Expr(:lambda, Any[:A,:(I::Union(Real,AbstractArray,Colon)...)], Any[Any[:n,symbol("#s17"),symbol("#s19"),:dim,symbol("#s18")],Any[Any[:A,:Any,0],Any[:I,:Any,0],Any[:n,:Any,18],Any[symbol("#s17"),:Any,18],Any[symbol("#s19"),:Any,2],Any[:dim,:Any,18],Any[symbol("#s18"),:Any,18]],Any[],2], :(begin # abstractarray.jl, line 162:
NewvarNode(symbol("#s17"))
$(Expr(:meta, :inline)) # line 163:
n = length(I) # line 164:
unless n > 0 goto 7 # line 165:
GenSym(0) = colon(1,n - 1)
#s19 = (top(start))(GenSym(0))
unless (top(!))((top(done))(GenSym(0),#s19)) goto 1
2:
GenSym(1) = (top(next))(GenSym(0),#s19)
dim = (top(getfield))(GenSym(1),1)
#s19 = (top(getfield))(GenSym(1),2) # line 166:
#s18 = _checkbounds(size(A,dim),getindex(I,dim))
unless #s18 goto 4
goto 5
4:
throw_boundserror(A,I)
5:
3:
unless (top(!))((top(!))((top(done))(GenSym(0),#s19))) goto 2
1:
0: # line 168:
#s17 = _checkbounds(trailingsize(A,n),getindex(I,n))
unless #s17 goto 6
goto 7
6:
throw_boundserror(A,I)
7: # line 170:
return nothing
end)))))
atypes = Tuple{MyArray{Float64,1},Int64}
(in typeinf at none, line 1294)

Any ideas where this is happening, @vtjnash? |
Sounds like the same issue as here: #10980 (comment)
I've also tried to manually inline the call and it appears to be slower than the tuple argument version here. |
With the following method added (full script):

@inline function getindex4(V::SubArray, I::Int...)
n = length(I)
if n > 0
for dim = 1:(n - 1)
(Base._checkbounds(size(V, dim), I[dim]) ||
Base.throw_boundserror(V, I))
end
(Base._checkbounds(Base.trailingsize(V, n), I[n]) ||
Base.throw_boundserror(V, I))
end
unsafe_getindex(V, I...)
end

Output:
|
Once we solve the inlining problem, all will be golden. It worked fine just a short while ago. |
Ah, very nice. I'll run another bisect for that. |
Suspected ebfebfd, CC @JeffBezanson |
Yup, confirmed. (Bisects are so fast when there are only two commits to test!) |
Well, I tried playing around in the frontend to ensure meta always stays on top, but I am not a lisper. It'll have to wait for someone with better lisp chops than me. |
I think we'll have to be more robust about looking for metadata. Otherwise multiple features will compete for that top spot in the function body. |
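(A simplified sketch, not the actual inference.jl code: accepting metadata only in the literal first slot of the body breaks as soon as something like a NewvarNode is inserted above it, whereas skipping non-code nodes before giving up keeps working.)

# Fragile strategy: the :meta expression must be the very first element of the body.
function inline_meta_at_top(body::Vector{Any})
    !isempty(body) && isa(body[1], Expr) &&
        body[1].head === :meta && :inline in body[1].args
end

# More robust strategy: skip over line info and non-Expr nodes
# (LineNumberNode, NewvarNode, ...) and only give up at real code.
function inline_meta_near_top(body::Vector{Any})
    for ex in body
        if isa(ex, Expr)
            ex.head === :meta && :inline in ex.args && return true
            ex.head === :line && continue
            return false     # reached real code without finding :meta
        end
    end
    return false
end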
This fixes the primary problem in #11595, but is not enough on its own to recapture full performance. Interestingly, this does not appear to cause a measurable slowing of the compilation of base.
This addresses most of the remaining problem in #11595, but it's still about three-fold short of the performance we had before getindex(::SubArray, ...) checked bounds.
Thank you everyone for helping find the issues here. While we still need to find better performance for SubArrays, I think this issue has served its purpose. We're now back to the timings I was expecting to see for #10525. 👍 |
I have to say, it's still a little unsatisfying that a week ago I had [...] (using the test in #5117 (comment)) and now I have [...]
A 6-fold gap is big enough that I'd be reluctant to use SubArrays in real applications (and I have a lot of code that uses them now)---I'd prefer to just copy the data. I think we should turn bounds-checking off on SubArrays until we can recover something close to the original performance. |
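(As a stop-gap in user code, the usual local opt-out is @inbounds; a minimal sketch, assuming the indices are known to be valid and that the SubArray checks are written so @inbounds can elide them.)

# Skip bounds checks inside a hot loop the caller knows is safe.
function sumview(v::AbstractVector)
    s = 0.0
    @inbounds for i in 1:length(v)
        s += v[i]
    end
    return s
end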
(I think I've lost track of some of the discussion halfway through, so I'm sorry if this has been answered before.) What has changed between the good (fast) commit and the bad (slow) commit in your (@timholy) comment above? Do #11617 and 56342be not fix the problems introduced by ebfebfd and 677f537, respectively? Or is there another bad commit to be bisected, or does the performance just gradually decrease over the past week(s)? |
It's the fact that #10525 turned on bounds-checking for SubArrays, which we've never had before. Bounds-checking is a Good Thing, but the performance hit is not. For some reason our bounds-checking on Array is not causing anything like the same performance hit. |
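(A rough micro-benchmark sketch for seeing the gap directly, using the 0.4-era sub syntax from the scripts above; not a rigorous measurement.)

# Sum through an Array and through a SubArray view of the same data;
# the difference is dominated by the cost of indexing itself.
a = rand(10^6)
s = sub(a, 1:length(a))      # 0.4-era syntax for a view

function sumall(x)
    t = 0.0
    for i in 1:length(x)
        t += x[i]
    end
    return t
end

sumall(a); sumall(s)         # warm up / compile first
@time sumall(a)
@time sumall(s)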
Ahh, I see. Makes sense. Thanks. |
I definitely agree. I'll work on improving the situation this week, but my time is somewhat limited. |
No worries, it could be a hard problem, I frankly expect we'll have to leave it off for 0.4. |
On Julia just-before-10525, I'm seeing ~15ms for the last three tests. If I remove the [...]. On master, I get ~55ms, and [...]. |
Hm, I don't understand why [...]. Well, at a minimum this is a very easy change to restore the old performance. |
It's pretty ugly to rewrite [...] |
Somewhere within a rebase during the month of May, I lost some of the performance I had been expecting to see in #10525. I had been tracking the output of
test/perf/array/perf.jl
over the life of the issue, and periodically creating new branches locally as I rebased to keep up with master. So I was able to go back in time to a few snapshots and look at some key metrics from the perf test. Here's a quick dump of what I found. I won't be able to spend more time on this today, but I will try to resolve this over the weekend.

Sorry for the false alarm @timholy, the commit I flagged in #5117 was totally unrelated and simply due to some dirty files I had. It's something else.