Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For Loop Slowdown #11787

Closed
JaredCrean2 opened this issue Jun 20, 2015 · 14 comments
Closed

For Loop Slowdown #11787

JaredCrean2 opened this issue Jun 20, 2015 · 14 comments
Labels
performance Must go faster regression Regression in behavior compared to a previous version

Comments

@JaredCrean2
Copy link
Contributor

I was testing a few different ways to calculate a quantity over an array, and noticed for loops slowed down significantly from commit 317a4d1* (16 days old master). @timholy suggested I file an issue here.
The test code is here: https://github.com/JaredCrean2/julia_tests/tree/master/array_speed2. It does the calculating using a manually inlined double for loop, ArrayViews, and SubArrays.

Using Julia Version 0.4.0-dev+5149 (2015-06-01 18:58 UTC) Commit 317a4d1* (16 days old master)

 592.426 milliseconds (14232 allocations: 586 KB)
double loop @time printed above
 495.138 milliseconds (14422 allocations: 599 KB)
ArrayView @time printed above
  15.753 seconds      (300 M allocations: 8240 MB, 3.39% gc time)
slice @time printed above

Using Julia Version 0.4.0-dev+5439 (2015-06-18 13:48 UTC) Commit a23caa0* (0 days old master):

   1.720 seconds      (102 k allocations: 3600 KB)
double loop @time printed above
 566.122 milliseconds (16237 allocations: 692 KB)
ArrayView @time printed above
   4.137 seconds      (240 M allocations: 8240 MB, 12.96% gc time)
slice @time printed above

I looked at @code_warntye getEulerFlux(q,F), and the a23caa0 produced a huge amount of code (>1100 lines) compared to the 317a4d1 (~70 lines), and included many lines that looked like

(top(throw_boundserror))(q::Array{Float64,3},GenSym(45))::Union{}

that were not present in 317a4d1.

@yuyichao
Copy link
Contributor

@JaredCrean2 needs to be followed by a space or newline. Otherwise the text following the openning will be interpreted as language name.

@JaredCrean2
Copy link
Contributor Author

Right, thanks for the correction.

@mbauman
Copy link
Member

mbauman commented Jun 20, 2015

Very interesting. 16 days ago puts it after #10525, but before we fixed the inlining issues that caused lots of allocations Edit: it was before #10525. I can reproduce and I'll try to see if I can find the root issue. Thanks!

@JaredCrean2
Copy link
Contributor Author

Also note, that data was collected on 6/18 (I didn't get around to creating this issue until today, 6/20), so those times should be 18 and 2 days old, respectively.

@mbauman
Copy link
Member

mbauman commented Jun 20, 2015

That's not a problem at all. Git makes it easy to "go back in time" and see how performance has changed. I have a bisect running right now to narrow down what might have caused this — the information you gave was very valuable!

@JaredCrean2
Copy link
Contributor Author

Thanks, I'm glad I could help.

@JeffBezanson
Copy link
Member

Thanks for reporting this.

Do I read correctly that the third test case got faster?

@JaredCrean2
Copy link
Contributor Author

Yes indeed, a factor of 4x. @timholy indicated in ArrayViews Issue #29 he expected slices to be on par with ArrayViews performance. It is certainly better, but not as good at ArrayViews yet.

@mbauman
Copy link
Member

mbauman commented Jun 20, 2015

Fortunately the performance gain and the regression aren't coupled together. I imagine that the speed up for slice was a result of the work in #11595.

Bisect is identifying dadf243 from #11644 as the first bad commit (cc @vtjnash).

@timholy
Copy link
Member

timholy commented Jun 20, 2015

AFAICT, it's more that construction of ArrayViews got better with recent julia---at least in my tests, there is no allocation whatsoever with ArrayViews. At one point the two were comparable: #8501 (comment).

However, I'm not sure this is the only thing going on: since those benchmarks there have been several changes to SubArrays (#9329, #10296, #10505, #11296, #10525, and #11595), and I just haven't been able to find the time (and probably won't soon) to dig into this more than superficially.

@timholy
Copy link
Member

timholy commented Jun 20, 2015

Nice detective work, @mbauman.

@JeffBezanson JeffBezanson added performance Must go faster regression Regression in behavior compared to a previous version labels Jun 21, 2015
@mbauman
Copy link
Member

mbauman commented Jun 24, 2015

I'm trying to dig into this one a bit more. With my latest performance fixes (#11833 + #11843), we're much improved (I've duplicated the @time lines to separate out compilation time):

$ ./julia tmp/arrayspeed/main.jl
   1.138 seconds      (22783 allocations: 948 KB)
 527.508 milliseconds (4 allocations: 144 bytes)
double loop @time printed above
 600.666 milliseconds (38441 allocations: 1635 KB)
 560.828 milliseconds (4 allocations: 144 bytes)
ArrayView @time printed above
   3.274 seconds      (120 M allocations: 4581 MB, 8.19% gc time)
   3.116 seconds      (120 M allocations: 4578 MB, 8.66% gc time)
slice @time printed above

Now the question is if the regression that was introduced in dadf243 was coupled to this or not…

@mbauman
Copy link
Member

mbauman commented Jun 28, 2015

Hurrah, this seems to be solved.

Current master:

$ ./julia tmp/arrayspeed/main.jl
   1.342 seconds      (21263 allocations: 876 KB)
 592.298 milliseconds (4 allocations: 144 bytes)
double loop @time printed above
 600.586 milliseconds (38426 allocations: 1635 KB)
 562.081 milliseconds (4 allocations: 144 bytes)
ArrayView @time printed above
   3.253 seconds      (120 M allocations: 4581 MB, 7.99% gc time)
   3.103 seconds      (120 M allocations: 4578 MB, 8.53% gc time)
slice @time printed above

Julia @ dadf243

$ ~/Applications/julia-0.4-pre-alloca/bin/julia tmp/arrayspeed/main.jl
   1.294 seconds      (122 k allocations: 4513 KB)
 626.594 milliseconds (4 allocations: 144 bytes)
double loop @time printed above
 755.303 milliseconds (38880 allocations: 1659 KB)
 639.865 milliseconds (4 allocations: 144 bytes)
ArrayView @time printed above
   4.769 seconds      (240 M allocations: 8243 MB, 9.22% gc time)
   4.206 seconds      (240 M allocations: 8240 MB, 9.57% gc time)
slice @time printed above

I even went back and added my performance fixes to dadf243 just to make sure that I wasn't independently "covering up" the perf regression with independent fixes. But that had little-to-no effect on the total performance. It may be the case that there are still some non-optimal corner-cases in user (or other base) code that are pessimized here, but without a solid example I think this can be closed.

@mbauman mbauman closed this as completed Jun 28, 2015
@StefanKarpinski
Copy link
Member

Excellent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster regression Regression in behavior compared to a previous version
Projects
None yet
Development

No branches or pull requests

6 participants