
huge performance regression in vector math #17794

Closed · stevengj opened this issue Aug 3, 2016 · 20 comments

Labels: performance (Must go faster), regression (Regression in behavior compared to a previous version)

@stevengj
Member

stevengj commented Aug 3, 2016

There has been a huge performance regression in simple vector-math operations like +, e.g.

x = rand(10^7); y = rand(10^7);
@time x + y;
@time x + y;

gives 0.495843 seconds (20.00 M allocations: 381.463 MB, 20.92% gc time) ... notice the 20M allocations, indicative of a type instability in an inner loop.

The x + y call devolves into a call to Base._elementwise(+, Float64, x, y) in arraymath.jl, which was most recently touched by #17389 (@pabloferz) and #17313 (@martinholters).

Since @nanosoldier didn't detect any performance regressions in #17313, I'm guessing #17389 is the problem here?
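
As a sanity check (a sketch, output not reproduced here), the typed code for the call above can be inspected directly:

x = rand(10^7); y = rand(10^7);
# Abstract (non-concrete) types in the loop body of this call would be
# consistent with the per-element allocations reported above.
@code_warntype Base._elementwise(+, Float64, x, y)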

@stevengj added the performance (Must go faster) and regression (Regression in behavior compared to a previous version) labels on Aug 3, 2016
@stevengj added this to the 0.5.0 milestone on Aug 3, 2016
@ufechner7

I cannot reproduce this problem when comparing Version 0.4.5 (2016-03-18 00:58 UTC)
and Version 0.5.0-rc0+0 (2016-07-26 20:22 UTC) on Linux (Ubuntu 16.04).
Which versions did you compare?

@yuyichao
Contributor

yuyichao commented Aug 3, 2016

0.5.0-rc0+174

@pabloferz
Contributor

Adding the type parameter back seems to fix it. I was trying to take some type parameters out, but I took out too many.

function Base._elementwise{T}(op, ::Type{T}, A::AbstractArray, B::AbstractArray)
    F = similar(A, T, promote_shape(A, B))
    for (iF, iA, iB) in zip(eachindex(F), eachindex(A), eachindex(B))
        @inbounds F[iF] = op(A[iA], B[iB])
    end
    return F
end
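
A quick check (a sketch, not part of the eventual PR): with the method redefined to carry the {T} parameter, re-running the timing from the original report should no longer show per-element allocations.

x = rand(10^7); y = rand(10^7);
@time x + y;  # expect a handful of allocations (the result array), not ~20M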

@JeffBezanson
Member

Yes, the Type{T} ones tend to be important, since we try to avoid specializing on every type argument.

@tkelman
Contributor

tkelman commented Aug 3, 2016

yikes, we really should have run nanosoldier there before merging. whoops.

@pabloferz
Contributor

#17798

@eschnett
Contributor

eschnett commented Aug 4, 2016

@JeffBezanson Is there documentation / a rough design document / a blog entry / a certain set of functions that one could look at to get a feeling for the rules that govern specialization?

@tkelman closed this as completed in e68bc89 on Aug 4, 2016
@tkelman
Contributor

tkelman commented Aug 4, 2016

While #17798 helped, it didn't fix all of it. https://github.com/JuliaCI/BaseBenchmarkReports/blob/f42bed6fb5e9d16970da9b58cf24755de6dc7d0f/daily_2016_8_4/report.md I think what I'm going to do is revert #17389 on the release-0.5 branch for rc1.

@tkelman reopened this on Aug 4, 2016
@pabloferz
Contributor

pabloferz commented Aug 4, 2016

I think that, for the rest of the problems, another type parameter (again) does the trick:

function promote_op{S}(f, ::Type{S})
    T = _promote_op(f, _default_type(S))
    return isleaftype(S) ? T : typejoin(S, T)
end
function promote_op{R,S}(f, ::Type{R}, ::Type{S})
    T = _promote_op(f, _default_type(R), _default_type(S))
    isleaftype(R) && return isleaftype(S) ? T : typejoin(S, T)
    return isleaftype(S) ? typejoin(R, T) : typejoin(R, S, T)
end

(currently these two do not have type parameters).
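
For reference, a rough illustration of what these are asked to compute (expected results, not verified output; isleaftype checks for a concrete type, and typejoin falls back to the narrowest common supertype when an input is abstract):

Base.promote_op(+, Float64, Int)      # typically Float64, since both inputs are concrete
Base.promote_op(+, Float64, Integer)  # Integer is abstract, so the result is widened via typejoin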

So, echoing @eschnett, a guideline here would be helpful (maybe even worth a place in the performance tips).

@KristofferC
Member

My mental model has been that function arguments are only for dispatch and have nothing to do with performance (except for ANY). That is apparently wrong, so I would also be interested in that.

@pabloferz
Contributor

As far as I can see, they also seem to matter when dispatching on Type{T}s; for the rest, it seems that your mental model (which is the one endorsed) still works.

@KristofferC
Member

Alright, good to know. Thanks.

@stevengj
Member Author

stevengj commented Aug 4, 2016

@pabloferz, my understanding is that if you write a function f(T::Type), then T is just a value that is determined at runtime (the same version of f is compiled for all values of T), whereas if you write f{T}(::Type{T}) then T is part of the type signature of the function — hence it is known at compile time and a specialized version of f is compiled for every T.
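
A minimal sketch of the two forms (hypothetical function names, Julia 0.5 syntax):

# T is an ordinary runtime value here; the same compiled version of the
# method is shared across all T, as described above.
coerce_runtime(T::Type, x) = convert(T, x)

# T is a static parameter, i.e. part of the method's type signature, so it
# is known at compile time and a specialized version is compiled per T.
coerce_static{T}(::Type{T}, x) = convert(T, x)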

@pabloferz
Contributor

pabloferz commented Aug 4, 2016

@stevengj That was my understanding too, but I forgot it for a moment while writing the changes in #17389.

Now I believe that a bad interaction between the object_id hashing, the cached typed changes, and using inference to try to find the return type (plus some missing static type parameters) is causing most of these problems. But I'm not actually sure.

@JeffBezanson
Member

@pabloferz Could we have a PR with the more complete fix?

@pabloferz
Contributor

I can work on that, but I won't be able to get to it until Tuesday.

@tkelman
Contributor

tkelman commented Aug 5, 2016

in that case we'll probably have to put rc2 out without reinstating #17389

@JeffBezanson
Member

IIUC, this performance regression no longer exists on the release branch, so this is not blocking anymore.

@JeffBezanson modified the milestones: 0.5.x, 0.5.0 on Aug 5, 2016
@tkelman
Contributor

tkelman commented Aug 12, 2016

I think this is fixed by #17929. Reopen or leave a comment if you think otherwise.

@tkelman closed this as completed on Aug 12, 2016
@stevengj
Member Author

LGTM.
