
Precompilation in packages affects performance of unrelated base methods on 0.5 #20369

Closed
joshbode opened this issue Feb 1, 2017 · 5 comments
Labels: compiler:precompilation (Precompilation of modules), packages (Package management and loading)

Comments

@joshbode (Contributor) commented Feb 1, 2017

The performance and type stability of base array operations (such as .+=) in Julia 0.5 degrade consistently once the JuMP package has been imported.

This seems to be related to precompilation: the effect disappears when the apparently unrelated Base.precompile calls in JuMP are commented out (see precompile.jl).

Other linear algebra operations do not seem to be affected, though, interestingly, importing JuMP does slightly change the allocation counts in the first timing pass.
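
For context, these are ordinary precompile hints of the form precompile(f, (argument types...)), which ask Julia to compile specific method specializations during package precompilation. The call below is purely illustrative (it is not taken from JuMP's precompile.jl); it just shows the kind of statement that, when commented out, makes the slowdown disappear:

# illustrative only: a precompile hint that compiles one broadcast! specialization
# ahead of time during package precompilation
Base.precompile(broadcast!, (typeof(+), Vector{Float64}, Vector{Float64}, Float64))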

Timing Comparison

Without JuMP

Pass 1:
  .+=:         0.209429 seconds (101.78 k allocations: 4.458 MB)
  .+:          0.020492 seconds (10.06 k allocations: 483.457 KB)
  A_mul_B!:    0.292780 seconds (122.64 k allocations: 5.086 MB)
  A_mul_Bt!:   1.136774 seconds (309.81 k allocations: 9.930 MB, 7.85% gc time)
  At_mul_B!:   0.048853 seconds (257 allocations: 14.641 KB)
  axpy!:       0.015024 seconds (6.32 k allocations: 271.873 KB)

Pass 2:
  .+=:         0.000022 seconds (4 allocations: 39.188 KB)
  .+:          0.000014 seconds (2 allocations: 39.141 KB)
  A_mul_B!:    0.045789 seconds
  A_mul_Bt!:   0.266610 seconds (6 allocations: 336 bytes)
  At_mul_B!:   0.045902 seconds
  axpy!:       0.000028 seconds

With JuMP

Pass 1:
  .+=:         0.293841 seconds (117.45 k allocations: 5.235 MB)
  .+:          0.023269 seconds (10.12 k allocations: 485.957 KB)
  A_mul_B!:    0.406249 seconds (125.40 k allocations: 5.187 MB, 26.05% gc time)
  A_mul_Bt!:   1.049989 seconds (312.00 k allocations: 10.014 MB)
  At_mul_B!:   0.052542 seconds (257 allocations: 14.641 KB)
  axpy!:       0.014825 seconds (6.38 k allocations: 274.451 KB)

Pass 2:
  .+=:         0.000186 seconds (24 allocations: 39.516 KB)
  .+:          0.000016 seconds (2 allocations: 39.141 KB)
  A_mul_B!:    0.049337 seconds
  A_mul_Bt!:   0.266507 seconds (6 allocations: 336 bytes)
  At_mul_B!:   0.040117 seconds
  axpy!:       0.000023 seconds

Types

Without JuMP

julia> x = rand(10);
julia> @code_warntype x .+= 1.0
Variables:
  #self#::Base.Broadcast.#broadcast!
  #unused#::Base.#identity
  x::Array{Float64,1}
  y::Array{Float64,1}

Body:
  begin
      $(Expr(:invoke, LambdaInfo for check_broadcast_shape(::Tuple{Int64}, ::Tuple{Int64}), :(Base.Broadcast.check_broadcast_shape), :((Core.tuple)((Base.arraysize)(x,1)::Int64)::Tuple{Int64}), :((Core.tuple)((Base.arraysize)(y,1)::Int64)::Tuple{Int64}))) # line 24:
      return $(Expr(:invoke, LambdaInfo for copy!(::Array{Float64,1}, ::Int64, ::Array{Float64,1}, ::Int64, ::Int64), :(Base.copy!), :(x), 1, :(y), 1, :((Base.arraylen)(y)::Int64)))
  end::Array{Float64,1}

With JuMP

julia> import JuMP
julia> x = rand(10);
julia> @code_warntype x .+= 1.0
Variables:
  #self#::Base.Broadcast.#broadcast!
  #unused#::Base.#identity
  x::Array{Float64,1}
  y::Array{Float64,1}

Body:
  begin
      $(Expr(:invoke, LambdaInfo for check_broadcast_shape(::Tuple{Int64}, ::Tuple{Int64}), :(Base.Broadcast.check_broadcast_shape), :((Core.tuple)((Base.arraysize)(x,1)::Int64)::Tuple{Int64}), :((Core.tuple)((Base.arraysize)(y,1)::Int64)::Tuple{Int64}))) # line 24:
      return $(Expr(:invoke, LambdaInfo for copy!(::Array{Float64,1}, ::Int64, ::Array{Float64,1}, ::Int64, ::Int64), :(Base.copy!), :(x), 1, :(y), 1, :((Base.arraylen)(y)::Int64)))
  end::Any
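
The two outputs are identical except for the inferred type of the body: ::Array{Float64,1} without JuMP versus ::Any with it. One way to query the same inference result programmatically (a sketch using Base.return_types on the broadcast!(identity, x, y) call that x .+= 1.0 lowers to, as shown above):

# returns the return type(s) inferred for broadcast!(identity, x, y)
# with Vector{Float64} arguments; Array{Float64,1} indicates stable inference
Base.return_types(broadcast!, (typeof(identity), Vector{Float64}, Vector{Float64}))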

Test Script

#! /usr/bin/env julia
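# usage: `julia <script> 1` times the operations with JuMP loaded;
# any other first argument runs the baseline without JuMP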

if ARGS[1] == "1"
    println("With JuMP")
    import JuMP
else
    println("Without JuMP")
end

n = 5000

srand(1)
A = rand(n, n)
B = rand(n)
A_ = A'
B_ = B'

for i = 1:2
    Z = zeros(n)
    println()
    println("Pass $i:")
    print("  .+=:       "); @time Z .+= B;
    print("  .+:        "); @time Z .+ 1.0;
    print("  A_mul_B!:  "); @time LinAlg.A_mul_B!(Z, A, B);
    print("  A_mul_Bt!: "); @time LinAlg.A_mul_Bt!(Z, A, B_);
    print("  At_mul_B!: "); @time LinAlg.At_mul_B!(Z, A_, B);
    print("  axpy!:     "); @time LinAlg.axpy!(2.0, B, Z);
end

@yuyichao (Contributor) commented Feb 1, 2017

Likely dup of #18465

@yuyichao (Contributor) commented Feb 1, 2017

#18869 is on the release-0.5 branch, so you should try the current release-0.5 branch and see whether it fixes the problem.
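
(If it helps, a quick way to confirm which revision you are actually running after rebuilding, not specific to this fix:)

versioninfo()   # the "Commit" line shows the checked-out revision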

@yuyichao changed the title from "Precompilation in packages affects performance of unrelated base methods" to "Precompilation in packages affects performance of unrelated base methods on 0.5" on Feb 1, 2017

@joshbode (Contributor, Author) commented Feb 1, 2017

Will take a look - thank you!

@ararslan added labels packages (Package management and loading) and compiler:precompilation (Precompilation of modules) on Feb 1, 2017

@joshbode (Contributor, Author) commented Feb 1, 2017

With release-0.5 (6a1e339), whether JuMP is loaded or not:

  • @code_typed x .+= 1 is consistent (and correctly typed)
  • @time x .+= 1 has consistent timing and allocation profile.

I'll close this issue. Thanks again :)

cc: @JockLawrie

@joshbode closed this as completed on Feb 1, 2017

@joshbode (Contributor, Author) commented Feb 1, 2017

Simple workaround on 0.5: perform a vectorised assignment operation before loading JuMP, so the relevant code is compiled up front, e.g.

Float64[] .+= 1  # generate code upfront
using JuMP
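
To confirm the workaround took effect, the @code_warntype check from above can be rerun after loading JuMP; the body should still infer Array{Float64,1} rather than Any. A minimal sketch:

Float64[] .+= 1            # generate code upfront
using JuMP

x = rand(10)
@code_warntype x .+= 1.0   # body should end in ::Array{Float64,1}, not ::Any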
