package precompile is slow #15048

Closed
JeffBezanson opened this issue Feb 12, 2016 · 37 comments
Labels: compiler:latency (Compiler latency), compiler:precompilation (Precompilation of modules)

Comments

@JeffBezanson (Member)

Precompiling DataFrames takes ~47 seconds:

jeff@gurren:~/src/julia$ time ./julia -e 'using DataFrames'
INFO: Recompiling stale cache file /home/jeff/.julia/lib/v0.5/DataFrames.ji for module DataFrames.
INFO: Recompiling stale cache file /home/jeff/.julia/lib/v0.5/Reexport.ji for module Reexport.
INFO: Recompiling stale cache file /home/jeff/.julia/lib/v0.5/StatsBase.ji for module StatsBase.
INFO: Recompiling stale cache file /home/jeff/.julia/lib/v0.5/ArrayViews.ji for module ArrayViews.
WARNING: Method definition (::Type{ArrayViews.UnsafeContiguousView})(Ptr{#T<:Any}, Int64, NTuple{#N<:Any, Int64}) in module ArrayViews at /home/jeff/.julia/v0.5/ArrayViews/src/arrviews.jl:18 overwritten at /home/jeff/.julia/v0.5/ArrayViews/src/arrviews.jl:26.
INFO: Recompiling stale cache file /home/jeff/.julia/lib/v0.5/StatsFuns.ji for module StatsFuns.
INFO: Recompiling stale cache file /home/jeff/.julia/lib/v0.5/DataArrays.ji for module DataArrays.
WARNING: Method definition (::Type{DataArrays.FastPerm})(#O<:Base.Order.Ordering, #V<:AbstractArray{T<:Any, 1}) in module DataArrays at /home/jeff/.julia/v0.5/DataArrays/src/pooleddataarray.jl:598 overwritten at /home/jeff/.julia/v0.5/DataArrays/src/pooleddataarray.jl:601.
WARNING: Method definition PooledDataVecs(DataArrays.PooledDataArray{#S<:Any, #R<:Integer, #N<:Any}, AbstractArray{#S<:Any, #N<:Any}) in module DataArrays at /home/jeff/.julia/v0.5/DataArrays/src/pooleddataarray.jl:645 overwritten at /home/jeff/.julia/v0.5/DataArrays/src/pooleddataarray.jl:650.
WARNING: Method definition map!(Union{Function, DataType}, DataArrays.DataArray, Base.Range) in module DataArrays at /home/jeff/.julia/v0.5/DataArrays/src/broadcast.jl:211 overwritten at /home/jeff/.julia/v0.5/DataArrays/src/broadcast.jl:211.
WARNING: Method definition map!(Union{Function, DataType}, DataArrays.PooledDataArray, Base.Range) in module DataArrays at /home/jeff/.julia/v0.5/DataArrays/src/broadcast.jl:211 overwritten at /home/jeff/.julia/v0.5/DataArrays/src/broadcast.jl:211.
WARNING: Method definition rep(AbstractArray{T<:Any, 1}) in module DataArrays at /home/jeff/.julia/v0.5/DataArrays/src/extras.jl:83 overwritten at /home/jeff/.julia/v0.5/DataArrays/src/extras.jl:98.
INFO: Recompiling stale cache file /home/jeff/.julia/lib/v0.5/GZip.ji for module GZip.
WARNING: Base.FileOffset is deprecated, use Int64 instead.
  likely near /home/jeff/.julia/v0.5/GZip/src/zlib_h.jl:86
WARNING: Base.FileOffset is deprecated, use Int64 instead.
  likely near /home/jeff/.julia/v0.5/GZip/src/zlib_h.jl:88
WARNING: Base.FileOffset is deprecated, use Int64 instead.
  likely near /home/jeff/.julia/v0.5/GZip/src/zlib_h.jl:88
WARNING: New definition 
    write(GZip.GZipStream, Array{#T<:Any, N<:Any}) at /home/jeff/.julia/v0.5/GZip/src/GZip.jl:456
is ambiguous with: 
    write(Base.IO, Array{UInt8, N<:Any}) at io.jl:154.
To fix, define 
    write(GZip.GZipStream, Array{UInt8, N<:Any})
before the new definition.
WARNING: New definition 
    write(GZip.GZipStream, Array{#T<:Any, N<:Any}) at /home/jeff/.julia/v0.5/GZip/src/GZip.jl:456
is ambiguous with: 
    write(Base.IO, Array{UInt8, N<:Any}) at io.jl:154.
To fix, define 
    write(GZip.GZipStream, Array{UInt8, N<:Any})
before the new definition.
INFO: Recompiling stale cache file /home/jeff/.julia/lib/v0.5/SortingAlgorithms.ji for module SortingAlgorithms.
INFO: Recompiling stale cache file /home/jeff/.julia/lib/v0.5/Docile.ji for module Docile.
WARNING: Base.LambdaStaticData is deprecated, use LambdaInfo instead.
  likely near /home/jeff/.julia/v0.5/Docile/src/Utilities.jl:122
WARNING: Method definition unstack(DataFrames.AbstractDataFrame, Int64, Int64) in module DataFrames at /home/jeff/.julia/v0.5/DataFrames/src/abstractdataframe/reshape.jl:201 overwritten at /home/jeff/.julia/v0.5/DataFrames/src/abstractdataframe/reshape.jl:216.
WARNING: New definition 
    write(GZip.GZipStream, Array{#T<:Any, N<:Any}) at /home/jeff/.julia/v0.5/GZip/src/GZip.jl:456
is ambiguous with: 
    write(Base.IO, Array{UInt8, N<:Any}) at io.jl:154.
To fix, define 
    write(GZip.GZipStream, Array{UInt8, N<:Any})
before the new definition.

real    0m47.418s
user    0m47.248s
sys 0m6.021s

However, if I just load it without precompilation, it takes only ~12 seconds:

jeff@gurren:~/src/julia$ time ./julia --compilecache=no -e 'using DataFrames'
WARNING: Method definition get_splatinfo(Expr, Symbol) in module CompatCartesian at /home/jeff/.julia/v0.5/Compat/src/ngenerate.jl:66 overwritten at /home/jeff/.julia/v0.5/Compat/src/ngenerate.jl:123.
WARNING: Method definition isfuncexpr(Expr) in module CompatCartesian at /home/jeff/.julia/v0.5/Compat/src/ngenerate.jl:106 overwritten at /home/jeff/.julia/v0.5/Compat/src/ngenerate.jl:140.
WARNING: Method definition isfuncexpr(Any) in module CompatCartesian at /home/jeff/.julia/v0.5/Compat/src/ngenerate.jl:108 overwritten at /home/jeff/.julia/v0.5/Compat/src/ngenerate.jl:142.
WARNING: Method definition (::Type{ArrayViews.UnsafeContiguousView})(Ptr{#T<:Any}, Int64, NTuple{#N<:Any, Int64}) in module ArrayViews at /home/jeff/.julia/v0.5/ArrayViews/src/arrviews.jl:18 overwritten at /home/jeff/.julia/v0.5/ArrayViews/src/arrviews.jl:26.
WARNING: Method definition (::Type{DataArrays.FastPerm})(#O<:Base.Order.Ordering, #V<:AbstractArray{T<:Any, 1}) in module DataArrays at /home/jeff/.julia/v0.5/DataArrays/src/pooleddataarray.jl:598 overwritten at /home/jeff/.julia/v0.5/DataArrays/src/pooleddataarray.jl:601.
WARNING: Method definition PooledDataVecs(DataArrays.PooledDataArray{#S<:Any, #R<:Integer, #N<:Any}, AbstractArray{#S<:Any, #N<:Any}) in module DataArrays at /home/jeff/.julia/v0.5/DataArrays/src/pooleddataarray.jl:645 overwritten at /home/jeff/.julia/v0.5/DataArrays/src/pooleddataarray.jl:650.

WARNING: deprecated syntax "call(f::MapReduceDim2ArgHelperFun, ...)".
Use "(f::MapReduceDim2ArgHelperFun)(...)" instead.

WARNING: deprecated syntax "call(f::Abs2MinusFun, ...)".
Use "(f::Abs2MinusFun)(...)" instead.
WARNING: Method definition map!(Union{Function, DataType}, DataArrays.DataArray, Base.Range) in module DataArrays at /home/jeff/.julia/v0.5/DataArrays/src/broadcast.jl:211 overwritten at /home/jeff/.julia/v0.5/DataArrays/src/broadcast.jl:211.
WARNING: Method definition map!(Union{Function, DataType}, DataArrays.PooledDataArray, Base.Range) in module DataArrays at /home/jeff/.julia/v0.5/DataArrays/src/broadcast.jl:211 overwritten at /home/jeff/.julia/v0.5/DataArrays/src/broadcast.jl:211.
WARNING: Method definition rep(AbstractArray{T<:Any, 1}) in module DataArrays at /home/jeff/.julia/v0.5/DataArrays/src/extras.jl:83 overwritten at /home/jeff/.julia/v0.5/DataArrays/src/extras.jl:98.
WARNING: Base.FileOffset is deprecated, use Int64 instead.
  likely near /home/jeff/.julia/v0.5/GZip/src/zlib_h.jl:86
WARNING: Base.FileOffset is deprecated, use Int64 instead.
  likely near /home/jeff/.julia/v0.5/GZip/src/zlib_h.jl:88
WARNING: Base.FileOffset is deprecated, use Int64 instead.
  likely near /home/jeff/.julia/v0.5/GZip/src/zlib_h.jl:88
WARNING: New definition 
    write(GZip.GZipStream, Array{#T<:Any, N<:Any}) at /home/jeff/.julia/v0.5/GZip/src/GZip.jl:456
is ambiguous with: 
    write(Base.IO, Array{UInt8, N<:Any}) at io.jl:154.
To fix, define 
    write(GZip.GZipStream, Array{UInt8, N<:Any})
before the new definition.
WARNING: Base.LambdaStaticData is deprecated, use LambdaInfo instead.
  likely near /home/jeff/.julia/v0.5/Docile/src/Utilities.jl:122
WARNING: Method definition unstack(DataFrames.AbstractDataFrame, Int64, Int64) in module DataFrames at /home/jeff/.julia/v0.5/DataFrames/src/abstractdataframe/reshape.jl:201 overwritten at /home/jeff/.julia/v0.5/DataFrames/src/abstractdataframe/reshape.jl:216.

real    0m11.925s
user    0m11.466s
sys 0m0.792s
JeffBezanson added the performance (Must go faster) and compiler:precompilation (Precompilation of modules) labels on Feb 12, 2016
@vtjnash (Member) commented Feb 12, 2016

This is probably due to some code that DataFrames conditionally includes when it detects precompile mode.

vtjnash removed the compiler:precompilation (Precompilation of modules) label on Feb 12, 2016
@JeffBezanson (Member, Author)

Really?

@stevengj (Member)

If you have a dependency chain of N stale modules (module n depends on module n–1 for n=2…N), loading them without compilation is O(N), but precompiling them currently seems to be O(N^2): my understanding is that N precompilation processes are launched, and the n-th process needs to load (from cache) the previous n-1 modules, so that O(N^2) modules are reconstituted from the cache overall.

Maybe we can do something like what Pkg.build does: require could launch only a single process to precompile all the modules in a chain of stale dependencies. But this would involve some changes in how --output-ji works.
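
A back-of-the-envelope sketch of this cost model (illustrative only, not a measurement): if precompiling a chain of N stale packages launches N processes, and the n-th process must first reconstitute the n-1 caches below it, the total number of cache loads grows quadratically.

# Cost model for precompiling a chain of N stale packages:
# the n-th precompile process first loads the n-1 caches below it.
cache_loads(N) = sum(n - 1 for n in 1:N)   # == N*(N-1)/2, i.e. O(N^2)

cache_loads(5)    # 10 cache loads for a 5-package chain
cache_loads(20)   # 190 cache loads for a 20-package chain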

vtjnash added the regression (Regression in behavior compared to a previous version) label on Apr 4, 2016
@vtjnash (Member) commented Apr 4, 2016

I think part of the problem here is warmup cost. Precompilation assumes that it will be very low (because it shouldn't require any actual work), but master is much slower (3x) than v0.4 was. Since startup costs are incurred O(n^2) times, this can quickly become significant:

Version 0.4.3 (2016-01-12 21:37 UTC), official release, x86_64-apple-darwin13.4.0

julia> @time @eval using Compat
  0.346696 seconds (167.22 k allocations: 8.441 MB, 7.86% gc time)

julia> @time @eval using FixedPointNumbers
  0.031285 seconds (50.32 k allocations: 1.736 MB)
Version 0.5.0-dev+3366 (2016-04-01 18:54 UTC), master/eb05d6b* (fork: 1 commit, 3 days), x86_64-apple-darwin15.3.0

julia> @time @eval using Compat
  1.040900 seconds (229.41 k allocations: 10.556 MB)

julia> @time @eval using FixedPointNumbers
  0.098878 seconds (56.40 k allocations: 2.322 MB)

JeffBezanson added the compiler:precompilation (Precompilation of modules) label on Apr 14, 2016
@JeffBezanson (Member, Author)

I do see a bit of improvement; precompiling DataFrames now takes 38 sec instead of 47, but this is still far slower than the 11 seconds taken to load it without precompiling.

JeffBezanson reopened this on Apr 19, 2016
@vtjnash (Member) commented Apr 19, 2016

Hm, ok, no worries; I'm still looking at ways of driving down some of the remaining cost overhead. I had seen a bigger drop on my machine, but I think that might be because I'm using LLVM 3.3 and the compile-time penalty for the remaining items is still significant against LLVM 3.7. Can you test @time using Compat (after precompilation)? I think it should be ~0.2 sec now.

@JeffBezanson (Member, Author)

On my machine @time using Compat gives 0.54 sec.

@JeffBezanson (Member, Author)

After #15934, I get 35 seconds (precompiling DataFrames) and 0.25 seconds (@time using Compat).

@vtjnash (Member) commented Apr 19, 2016

This cuts off a couple more seconds:

diff --git a/src/gf.c b/src/gf.c
index 7437d66..ea8af0d 100644
--- a/src/gf.c
+++ b/src/gf.c
@@ -1782,7 +1782,7 @@ void jl_method_table_insert(jl_methtable_t *mt, jl_tupletype_t *type, jl_tuplety
         method_overwrite(newentry, (jl_method_t*)oldvalue);
     else
         check_ambiguous_matches(mt->defs, newentry);
-    invalidate_conflicting(&mt->cache, (jl_value_t*)type, (jl_value_t*)mt);
+    //invalidate_conflicting(&mt->cache, (jl_value_t*)type, (jl_value_t*)mt);
     update_max_args(mt, type);
     if (jl_newmeth_tracer)
         jl_newmeth_tracer(method);

@vtjnash (Member) commented Apr 29, 2016

#16125 gave me another idea for cutting down this time: we don't really need to issue multiple copies of ambiguous-match warnings, so we can simply skip that step when deserializing a method during incremental compilation:

diff --git a/src/gf.c b/src/gf.c
index ac6c7ea..866d065 100644
--- a/src/gf.c
+++ b/src/gf.c
@@ -868,7 +868,7 @@ void jl_method_table_insert(jl_methtable_t *mt, jl_tupletype_t *type, jl_tuplety
             type, tvars, simpletype, jl_emptysvec, (jl_value_t*)method, 0, &method_defs, &oldvalue);
     if (oldvalue)
         method_overwrite(newentry, (jl_method_t*)oldvalue);
-    else
+    else if (!jl_options.incremental || !jl_generating_output())
         check_ambiguous_matches(mt->defs, newentry);
     invalidate_conflicting(&mt->cache, (jl_value_t*)type, (jl_value_t*)mt);
     update_max_args(mt, type);

On my computer, this takes the time from 50s to 40s (vs. 18s without the cache).
However, this is a bit of an over-approximation, since we would still need to preserve the ambiguity testing for methods defined in the module itself.
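
For completeness, ambiguity coverage is also available at test time, independent of where the load-time check runs; a minimal sketch, assuming the detect_ambiguities helper from the Test standard library on a reasonably recent Julia, and using two of the packages from the log above:

using Test, DataArrays, GZip

# List method ambiguities introduced by these modules, without relying on
# the warnings printed while their caches are being built.
ambs = detect_ambiguities(DataArrays, GZip)
isempty(ambs) || foreach(println, ambs)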

@JeffBezanson (Member, Author)

Nice. I guess we could add a check_ambiguities argument to jl_method_table_insert? Or as part of #16125, we could have a flag that says whether a method has already been checked.

joehuchette added and then removed the performance (Must go faster) and compiler:precompilation (Precompilation of modules) labels on Apr 29, 2016
@vtjnash (Member) commented May 11, 2016

I think #16255 and #16320 should make a small dent in this (also #16081?)

@timholy (Member) commented Jun 27, 2016

I'm following up on an interrupted JuliaCon conversation in which some concern was expressed about whether precompilation is helping or hurting. My own perspective is that:

  • this "slow precompilation" issue is indeed important and would be lovely to fix. But it mostly affects people who rebuild Julia all the time; I don't think it says much about whether package precompilation is a good or bad feature for most users.
  • people who rebuild Julia all the time (or update packages multiple times a day) can use --compilecache=no, so there is already a workaround.
  • for people who don't rebuild constantly, I think the verdict is clear: precompilation saves considerable time and remains one of the best features added in julia-0.4.
  • there remain frequent complaints about slow package loading time; I'm guessing these are largely complaints about "delay to first execution," not technically the loading time. Gallium and Gadfly are great poster children for this: they are both reasonably quick to load (~1s); it's the first execution that is painful (>10s).
  • the ideal fix would of course be to dramatically improve the speed of compilation. In the meantime, I'd argue that serious thought should be given to becoming even more aggressive about caching code, even at the cost of making this particular issue considerably worse. Besides, it's not clear that additional caching will make this issue worse: IIUC, most of the overhead comes from recursively starting up independent Julia processes, and once those processes are started we might as well ask them to do more work for us. One caveat: if __init__ or module-level logic decides how much work to do based on ccall(:jl_generating_output, Cint, ()) (see the sketch after this list), then precompiling is going to be slower, potentially by a lot. That's the breaks, and I personally am prepared to live with it since (1) we can control how many precompile statements we issue, and (2) we have --compilecache=no.

If no serious experiments in this direction have been done, let me know how I can help. I recognize that now may not be the time to add this to julia-0.5, but it should be considered for later development.
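
A minimal sketch of the guarded-precompile pattern mentioned in the caveat above; the ccall is the one named there, while the module and function names are hypothetical:

module MyPkg

frobnicate(x::Vector{Float64}) = sum(abs2, x)

if ccall(:jl_generating_output, Cint, ()) == 1
    # Only pay this compilation cost while the cache file is being written,
    # not during a plain `using MyPkg` in an interactive session.
    precompile(frobnicate, (Vector{Float64},))
end

end # module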

@davidanthoff (Contributor)

For me the issue is the unpredictability of when precompilation will hit. It always hits me when I need a working Julia instance now, not in 5 minutes (e.g., a group of people is in my office and I want to show them something).

#16409 would eliminate this problem entirely for me.

@vtjnash (Member) commented Aug 24, 2016

What level of overhead would be acceptable for closing this?

@Keno (Member) commented Aug 24, 2016

0

@JeffBezanson (Member, Author)

I would say maybe 10%.

tkelman pushed a commit that referenced this issue Aug 29, 2016
mfasi pushed a commit to mfasi/julia that referenced this issue Sep 5, 2016
@vtjnash (Member) commented Sep 7, 2016

almost getting there :)

llvm 3.3
user    0m15.395s w/ precompile
user    0m10.536s w/o

llvm 3.7.1
user    0m18.280s w/ precompile
user    0m11.721s w/o

@yuyichao (Contributor) commented Sep 7, 2016

Is this the first time I've seen LLVM 3.7.1 being faster at compiling than LLVM 3.3?

@vtjnash (Member) commented Sep 7, 2016

Nope, I just pasted the numbers backwards (fixed now).

@kindlychung (Contributor)

Slow precompilation is a real killer for user experience. R and Python users are used to loading most packages instantly.

@xiuliren

It takes too long to try out new code and debug. I also need to restart the kernel and rerun all the scripts if I make a change in a package.

@KristofferC (Member)

No you don't, use Revise.jl.
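
For reference, the typical Revise-based workflow looks roughly like this (a sketch; MyPackage stands in for whatever package is being developed):

using Revise      # load Revise before the packages you plan to edit
using MyPackage   # placeholder for the package under development

# Edit MyPackage's source files on disk; Revise applies the changes to the
# running session, so no restart or full script rerun is needed.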

@KristofferC (Member) commented Aug 15, 2018

Updated datapoint:

# Precompiling
julia> @time using DataFrames
[ Info: Precompiling DataFrames [a93c6f00-e57d-5684-b7b6-d8193f3e46c0]
 28.836874 seconds (3.78 M allocations: 231.428 MiB, 0.45% gc time)

# Loading with precompile
julia> @time using DataFrames
  1.257824 seconds (2.62 M allocations: 174.873 MiB, 2.72% gc time)

# Loading with precompile disabled
julia> @time using DataFrames
 14.189208 seconds (20.71 M allocations: 1.032 GiB, 3.25% gc time)

KristofferC removed the performance (Must go faster) label on Oct 24, 2018
@epogrebnyak

But it mostly affects people who rebuild Julia all the time;

Actually, long precompile times also hit you when you run unit tests and have some dependencies.

@ViralBShah (Member)

The original issue seems adequately addressed and this is good to close imo.

ViralBShah reopened this on Dec 8, 2018
@ViralBShah (Member)

Leaving it open for someone else to decide whether to close this and open more specific issues now, or to keep it open.

@stevengj (Member)

+1 for closing this in favor of more specific issues.

@JeffBezanson (Member, Author)

I disagree that this is fixed. Trying it on master, precompiling DataFrames takes 77 seconds (worse than ever) and loading it with --compiled-modules=no takes 35 seconds.

@JeffBezanson (Member, Author)

I'm looking into the using DataFrames slowdown, and the cause seems to be similar to what happened in #27874. There are lots of method invalidations related to the error path in copyto!. I think there are some things we can do to improve our backedges. But in general, there is a problem with excessive recursion and method invalidation due to the path

low-level function -> build error string -> string formatting and io code -> low-level function

As one example, the following patch cuts the time to load DataFrames (--compiled-modules=no) from 35 to 21 seconds:

diff --git a/base/array.jl b/base/array.jl
index f13c6ee..24f8745 100644
--- a/base/array.jl
+++ b/base/array.jl
@@ -279,7 +279,7 @@ end
 # occurs, see discussion in #27874
 function _throw_argerror(n)
     @_noinline_meta
-    throw(ArgumentError(string("tried to copy n=", n, " elements, but n should be nonnegative")))
+    throw(ArgumentError("oops"))
 end
 
 copyto!(dest::Array{T}, src::Array{T}) where {T} = copyto!(dest, 1, src, 1, length(src))
diff --git a/base/strings/io.jl b/base/strings/io.jl
index 71767ce..198635c 100644
--- a/base/strings/io.jl
+++ b/base/strings/io.jl
@@ -104,7 +104,7 @@ function sprint(f::Function, args...; context=nothing, sizehint::Integer=0)
 end
 
 tostr_sizehint(x) = 8
-tostr_sizehint(x::AbstractString) = lastindex(x)
+tostr_sizehint(x::String) = sizeof(x)
 tostr_sizehint(x::Float64) = 20
 tostr_sizehint(x::Float32) = 12
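
For anyone retracing this analysis today, invalidations like these can be recorded with SnoopCompile, tooling that postdates this comment (a sketch, assuming SnoopCompileCore and SnoopCompile are installed):

using SnoopCompileCore
invalidations = @snoopr using DataFrames    # record invalidations triggered by loading
using SnoopCompile
trees = invalidation_trees(invalidations)   # group them by the method definition that caused them
show(trees[end])                            # the most-consequential trigger is listed last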

@JeffBezanson (Member, Author) commented Dec 19, 2018

JuliaDB is an interesting case:

version | load time (--compiled-modules=no) | precompile time
0.6     | 30 sec                             | 97 sec
1.2     | 37 sec                             | 130 sec

Another interesting data point: on v1.2, with --compile=min passed to the precompile processes instead of -O0, precompiling JuliaDB takes 90 sec.

@timholy (Member) commented Dec 19, 2018

Have you by any chance experimented with what happens if you turn @inline into a no-op?
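
One way to run that experiment, as a sketch (it requires editing Base, so this is illustrative only, and the file location is an assumption about the current source layout):

# In base/expr.jl, replace the @inline definition with a pass-through
# that drops the :inline annotation entirely:
macro inline(ex)
    esc(ex)
end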

@timholy (Member) commented Aug 22, 2021

I wonder if we can close this. First, dev DataFrames and comment out these lines since they obviously make it an apples-vs-oranges comparison. Because DataFrames' dependencies may also have precompilation, I don't think it's safe to compare the compile-the-whole-thing pipeline. So let's just focus on DataFrames itself:

tim@diva:~/src/julia-master/base$ JULIA_DEPOT_PATH=/tmp/pkg2 juliam -q
shell> rm -rf /tmp/pkg2/compiled/v1.8/DataFrames

julia> @time using DataFrames
[ Info: Precompiling DataFrames [a93c6f00-e57d-5684-b7b6-d8193f3e46c0]
  2.729798 seconds (1.41 M allocations: 85.187 MiB, 0.38% gc time, 1.37% compilation time)

julia> 
tim@diva:~/src/julia-master/base$ JULIA_DEPOT_PATH=/tmp/pkg2 juliam -q --compiled-modules=no
shell> rm -rf /tmp/pkg2/compiled/v1.8/DataFrames

julia> @time using DataFrames
  5.334171 seconds (3.79 M allocations: 213.627 MiB, 0.76% gc time, 5.01% compilation time)
