performance regressions since 0.2 #6112
Some findings: …

I also see slightly slower performance when starting with …
Another finding: in 0.2, LLVM was able to optimize out …
… restores the performance of stockcorr.
Most of these look pretty good now, comparing 0.2.1 to master (plus the #6599 patch, tested at many inlining thresholds):

- these two look much worse (100% slower): …
- these two look slightly worse (10% slower): …
- the rest seem about the same (within experimental tolerance) or faster.

quicksort seems to have taken a 50% hit to speed recently. It looks like the only difference in code_typed is the order of lowering for the while-loop conditions, which appears to be confusing LLVM into emitting unnecessary copies of the conditional. @JeffBezanson? Also, strangely, the performance of small and large on the following tests swapped: …
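For reference, a minimal sketch of how that kind of lowering difference can be inspected; `countdown` here is a hypothetical stand-in for the benchmark's inner loop, not the actual quicksort code:

```julia
# A trivial while loop; the interesting part is how its condition is lowered.
function countdown(n)
    i = n
    while i > 0
        i -= 1
    end
    return i
end

code_typed(countdown, (Int,))   # compare the lowered loop condition across versions
code_llvm(countdown, (Int,))    # look for duplicated conditional branches in the IR
```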
It still looks like we have significant performance regressions almost across the board in the test suite. For example, add1 is about 30% slower.
Are you using a 0.3 binary (core2) on a newer processor? I did not see this in my testing last week.
Using a core2 binary would be a good explanation of this. I only see the slowdown on codespeed and not in manual testing anywhere else.
I was just referring to codespeed results.
In that case, I suspect it is using a core2 binary distribution build.
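One quick way to check what the running build was compiled for (a sketch; the exact fields printed vary across Julia versions):

```julia
# versioninfo() prints the Julia commit, LLVM version, and host CPU;
# a generic binary (e.g. built for core2) will not match newer hardware.
versioninfo()
```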
The SimplexBenchmarks also show some performance degradation between 0.2 and 0.3, at least on my machine. Both are built from source, not using a binary distribution:

…
Note: to get that output, I modified …
In PR #7177 I added performance tests for sparse getindex. I ran them for 0.2.1, for 0.3 in March (when getindex was virtually the same as in 0.2.1), and for a current Julia build. Performance decreased in many tests from 0.2.1 to 0.3, even though the getindex methods did not change. For instance, the performance of this function, essentially a binary search over the stored entries of one column, decreased by 100% (see the function below):

```julia
function getindex{T}(A::SparseMatrixCSC{T}, i0::Integer, i1::Integer)
    if !(1 <= i0 <= A.m && 1 <= i1 <= A.n); error(BoundsError); end
    # binary search for row i0 among the stored row indices of column i1
    first = A.colptr[i1]
    last = A.colptr[i1+1]-1
    while first <= last
        mid = (first + last) >> 1
        t = A.rowval[mid]
        if t == i0
            return A.nzval[mid]
        elseif t > i0
            last = mid - 1
        else
            first = mid + 1
        end
    end
    return zero(T)   # structural zero: i0 is not stored in column i1
end
```
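For scale, a sketch of the kind of micro-benchmark that exposes this; the matrix size, density, and index are illustrative, not the ones used in #7177:

```julia
using SparseArrays   # needed on Julia 1.x; sprand was in Base in 0.2/0.3

A = sprand(10000, 10000, 0.001)   # random sparse matrix, ~0.1% nonzeros
@time for k in 1:10^6
    A[5000, 5000]                 # scalar indexing hits the binary search above
end
```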
Ouch.
I did some fuzzy binary search over the passes, and was able to get significant improvements in many benchmarks by removing all of the …
I'm tempted to open an issue about using genetic programming to optimize optimization passes. I'm not generally a fan of GAs, but this does seem like a particularly well-suited problem.
What about simulated annealing instead?
That would quite possibly also be good for this.
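For what it's worth, a toy sketch of that idea: simulated annealing over a pass ordering, where `run_benchmark` is a hypothetical callback that would rebuild the pass pipeline and time the perf suite:

```julia
# Toy simulated annealing over an ordering of optimization passes.
# `run_benchmark(passes)` is hypothetical: it would configure the JIT with
# the given pass list and return total benchmark time (lower is better).
function anneal(passes::Vector, run_benchmark; iters=1000, T0=1.0)
    current = copy(passes)
    current_cost = run_benchmark(current)
    best, best_cost = copy(current), current_cost
    for k in 1:iters
        T = T0 * (1 - k/iters)               # linear cooling schedule
        cand = copy(current)
        i, j = rand(1:length(cand)), rand(1:length(cand))
        cand[i], cand[j] = cand[j], cand[i]  # mutate: swap two passes
        cost = run_benchmark(cand)
        # always accept improvements; accept regressions with prob e^(-Δ/T)
        if cost < current_cost || rand() < exp(-(cost - current_cost) / max(T, 1e-9))
            current, current_cost = cand, cost
        end
        if current_cost < best_cost
            best, best_cost = copy(current), current_cost
        end
    end
    return best, best_cost
end

# Call shape, with a dummy cost function standing in for real timings:
# anneal(["mem2reg", "gvn", "licm", "simplifycfg"], ps -> rand())
```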
Seems to be a quasi-bug in LLVM, perhaps a quirk of the legacy JIT's native code generator. Doing a CFG simplification pass at the end seriously screws up codegen, which doesn't seem like it should happen. @Keno, it would be interesting to check MCJIT.
Not sure what exactly you're looking at, but you can easily try it yourself with a second copy of julia. Just set LLVM_VER to …
I re-ran the performance tests with MCJIT for the 0.3-March version. No improvement; in fact, no difference at all. File v0.3-cc307ea-March-MCJIT added to: …
Thanks. Could you try with my latest change?
I added the run with the latest source to: … The binary search is now slightly faster than ever before. However, there are still quite a few performance regressions in those tests, of up to ~25%. @JeffBezanson: if you are interested, I can look into it more closely tomorrow to narrow it down to some specific functions.
We might have to live with a few 25% regressions instead of the 100% regression.
We should keep track of all regressions once 0.3 is released, so that they can be improvement targets in 0.4, where we will probably move to MCJIT.
@mauro3 It would be great to narrow down the cause of the performance loss and have a short test here, just for the record.
Julia sure is a moving target: I isolated one of the offenders, only to find out that those performance regressions have been fixed over the last 24h. (Here is the test, if anyone is interested: https://gist.github.com/mauro3/4274870c64c38aeeb722) The other one I found (still there) is due to …
That is a relief. One of the reasons I first wrote the sparse matrix support was to push the compiler, and it continues to do so.
Any comment on the performance degradation on the simplex benchmarks?
@JeffBezanson's comments suggest that some benchmarks have improved. Are the simplex benchmarks still slower?
I see a slight improvement, but definitely not back to 0.2 levels. Probably not worth holding up the release for this, though.
Is there anything specific keeping this open? Maybe we can open a specific simplex regression issue if that's something we want to take another stab at in the future.
There is still a small regression in …
In the test I posted above (https://gist.github.com/mauro3/8745144b120763fbf225), which uses …
Looks like we allocate much less memory though, so ... win? :)
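(For context, `@time` reports bytes allocated alongside the elapsed time, which is where that allocation comparison comes from; the workload below is a made-up example.)

```julia
# @time prints elapsed time and bytes allocated, so allocation changes
# show up directly when re-running a benchmark across Julia versions.
f(x) = sum(x .+ 1.0)      # hypothetical allocating workload
@time f(rand(10^6))
```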
On my system, it is now even faster than before. Fixed!
Some of these are quite bad.