Parallelize Pkg.precompile #2018

Merged
merged 15 commits into from
Sep 15, 2020

Member

@IanButterworth IanButterworth commented Sep 13, 2020

Pkg on julia master:

(dev) pkg> st
Status `~/Documents/dev/Project.toml`
  [0c46a032] DifferentialEquations v6.15.0
  [91a5bcdd] Plots v1.6.3

julia> @time Pkg.precompile()
Precompiling project...
[ Info: Precompiling DifferentialEquations [0c46a032-eb83-5123-abaf-570d42b7fbaa]
[ Info: Precompiling Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80]
312.061599 seconds (133.69 k allocations: 9.246 MiB)

This PR (same project, with .julia/compiled emptied out):

julia> @time Pkg.precompile()
Precompiling project...
[ Info: Precompiling Reexport [189a3867-3050-52da-a836-e630ba90ab69]
[ Info: Precompiling x264_jll [1270edf5-f2f9-52d2-97e9-ab00b5d0237a]
...
[ Info: Precompiling DifferentialEquations [0c46a032-eb83-5123-abaf-570d42b7fbaa]
 87.851553 seconds (4.25 M allocations: 216.406 MiB, 0.13% gc time)

The approach taken here asynchronously queues a precompile job for every dep in the manifest; each job waits until all of its own deps have been precompiled before starting.
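
As a rough illustration of that scheme (a minimal sketch on a toy graph, not the code in this PR), each package gets a `Base.Event`, and each task waits on the events of its deps before doing its own work. The graph, the `order` log, and the stand-in for `Base.compilecache` are all assumptions made for the example.

```julia
# Toy dependency graph: package name => names of its direct deps (hypothetical).
deps = Dict(
    "A" => String[],
    "B" => ["A"],
    "C" => ["A"],
    "D" => ["B", "C"],
)

# One event per package, notified once that package has been processed.
done = Dict(name => Base.Event() for name in keys(deps))
order = String[]              # completion order, appended from the tasks
order_lock = ReentrantLock()

@sync for (name, its_deps) in deps
    @async begin
        # Block until every dependency has signalled completion.
        foreach(d -> wait(done[d]), its_deps)
        # Stand-in for the real work (e.g. a call to Base.compilecache).
        lock(order_lock) do
            push!(order, name)
        end
        # Wake up every task that is waiting on this package.
        notify(done[name])
    end
end
```

Whatever order the tasks are launched in, "A" always finishes first and "D" last, because the waits encode the dependency order.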

Thanks to @oxinabox for conceptualization of this approach.

Performance considerations

  • The process could launch an unlimited number of julia instances (via compilecache), so we might want to add a hard limit.
  • CPU load is initially high while many packages with few deps of their own are compiling, and falls off once the larger packages start. I don't see much room for further parallelization.

[Screenshot: CPU load, 2020-09-13 2:20:23 AM]

  • Memory pressure on my system seemed unaffected in this test case

[Screenshot: memory pressure, 2020-09-13 2:20:38 AM]

@IanButterworth
Member Author

One more datapoint. Adding Images nicely highlights the speedup when deps aren't closely interlinked: it's only a few seconds slower than above.

julia master

(dev) pkg> st
Status `~/Documents/dev/Project.toml`
  [0c46a032] DifferentialEquations v6.15.0
  [916415d5] Images v0.22.4
  [91a5bcdd] Plots v1.6.3

julia> @time Pkg.precompile()
Precompiling project...
[ Info: Precompiling DifferentialEquations [0c46a032-eb83-5123-abaf-570d42b7fbaa]
[ Info: Precompiling Images [916415d5-f1e6-5110-898d-aaa5f9f070e0]
[ Info: Precompiling Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80]
323.380337 seconds (145.01 k allocations: 10.265 MiB)

this PR

92.614622 seconds (4.43 M allocations: 229.642 MiB, 0.23% gc time)

Member

@staticfloat staticfloat left a comment

This is awesome!

src/API.jl Outdated
sleep(0.001)
end
end
Base.compilecache(pkg, sourcepath)
Member

To limit parallelization, I suggest launching a million tasks like this, but then creating a Channel(num_tasks), then having each task put!() something into it just before this call to compilecache, then when it's finished, you take!() something back out. This will create, essentially, an N-parallel critical section, and allow N tasks to be running that section at once, while all others are blocked, waiting for the channel to free up space.
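
A minimal sketch of that idea, assuming a hypothetical limit of 2 and using `sleep` as a stand-in for `Base.compilecache`: a buffered `Channel` acts as a counting semaphore, so `put!` blocks once the buffer is full and `take!` frees a slot. The `running` and `max_running` counters exist only to show the limit being respected.

```julia
# Hypothetical limit, chosen only for this example.
num_tasks = 2
slots = Channel{Nothing}(num_tasks)   # buffered: at most num_tasks items fit

running = Ref(0)       # tasks currently inside the critical section
max_running = Ref(0)   # high-water mark, for demonstration only

@sync for i in 1:10
    @async begin
        put!(slots, nothing)          # blocks while the buffer is full
        try
            running[] += 1
            max_running[] = max(max_running[], running[])
            sleep(0.01)               # stand-in for Base.compilecache
        finally
            running[] -= 1
            take!(slots)              # free the slot, even on error
        end
    end
end
```

All ten tasks are created up front, but no more than `num_tasks` of them are ever inside the guarded section at once.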

Member Author

Nice. Ok

Member

What happens if this throws?

@staticfloat aren't you describing a (counting) Semaphore?

Member

Yep, pretty much.

Member Author

This PR now has a Semaphore approach, and I tried out a channel-based approach here, which doesn't seem simpler: master...ianshmean:ib/parallel_precomp_chanelbased

What should we move forward with?

Member Author

cc @tkf just to bring the conversation to a single thread

Member

I'd actually implement precompile differently by using a recursive function that returns was_recompiled::Bool and wrapping the recursive calls with tasks. This way, we don't need to implement a future-like construct (i.e., was_processed + was_recompiled). The error handling would probably be more straightforward this way. Resource control is probably still easier with semaphore (unless we have an easy-to-use task pool and future in Base or stdlib) although I wish there were Base.acquire(f, semaphore).

But this is the kind of thing where the trade-off is not completely clear until you have a concrete implementation. So, I think it's reasonable to defer this to future refactoring.
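
To make the recursive variant concrete, here is a minimal sketch on a toy graph; it is one possible reading of the suggestion, not an implementation from this PR. The `deps` and `stale` tables, the `precompile_task` name, and the `compiled` log are hypothetical, and resource limiting is left out.

```julia
# Toy graph and staleness flags (hypothetical data).
deps  = Dict("A" => String[], "B" => ["A"], "C" => ["A"], "D" => ["B", "C"])
stale = Dict("A" => false, "B" => true, "C" => false, "D" => false)

tasks = Dict{String,Task}()   # memo: exactly one task per package
compiled = String[]           # records which packages were "recompiled"

# Returns the task whose result is `was_recompiled::Bool` for `name`.
function precompile_task(name)
    get!(tasks, name) do
        @async begin
            dep_tasks = [precompile_task(d) for d in deps[name]]
            # Fetch every dep (no short-circuit) so all deps finish first.
            any_dep_recompiled = any(Bool[fetch(t) for t in dep_tasks])
            if any_dep_recompiled || stale[name]
                push!(compiled, name)   # stand-in for Base.compilecache
                true
            else
                false
            end
        end
    end
end

results = Dict(name => fetch(precompile_task(name)) for name in keys(deps))
```

Here the task itself plays the role of the future: `fetch` both waits for a dep and returns its `was_recompiled` flag, and it rethrows if the dep's task failed, so no separate event or flag dictionary is needed.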

@IanButterworth
Member Author

Thanks @staticfloat. The original example sped up from 87 seconds to 80, with far fewer allocations 👍🏻

80.721224 seconds (854.44 k allocations: 76.073 MiB, 0.11% gc time)

@IanButterworth
Member Author

IanButterworth commented Sep 13, 2020

Ping for review @KristofferC @StefanKarpinski

Member

@KristofferC KristofferC left a comment

This is cool. I don't really know enough about what compilecache does to know if this is fully sound.

Also, before complicating this function further it might be best to thoroughly investigate #1578 first.

printpkgstyle(ctx, :Precompiling, "project...")

num_tasks = parse(Int, get(ENV, "JULIA_NUM_PRECOMPILE_TASKS", string(Sys.CPU_THREADS + 1)))
Member

Seems kinda excessive to introduce an env variable for this. It's so specific.

Member

I'm not sure how else to gate this. Suggestions? There are some concerns that with a large core count this could accidentally OOM.

Member

@KristofferC KristofferC Sep 13, 2020

How much memory does each worker use approximately? Isn't this the case for every parallel workload that uses memory? Does this scale up to super high core counts? Perhaps just setting an upper cap is OK.

I guess we should look at nthreads but everyone runs with that equal to 1.

Member

> Isn't this the case for every parallel workload that uses memory?

Yes, but most other workloads allow tuning (e.g. via -t).

> I guess we should look at nthreads but everyone runs with that equal to 1.

There's that, and also Lyndon's comment above that this is more like a multiprocessing thing than a multithreading thing. I also agree that I shouldn't have to limit my computation's thread count to limit the precompilation, and vice versa.

Member

I'd rather have it as a normal argument to the precompile function then. This is exactly what we already have to limit parallelism in the asynchronous package downloader.

Looking at it, funnily enough we do have an env variable for the package downloader but that seems like it was added as a workaround for something:

Pkg.jl/src/Types.jl

Lines 329 to 331 in ede7b07

# NOTE: The JULIA_PKG_CONCURRENCY environment variable is likely to be removed in
# the future. It currently stands as an unofficial workaround for issue #795.
num_concurrent_downloads::Int = haskey(ENV, "JULIA_PKG_CONCURRENCY") ? parse(Int, ENV["JULIA_PKG_CONCURRENCY"]) : 8

Member

@KristofferC KristofferC Sep 14, 2020

Although if we at some point want to run this automatically when a package is updated, there is no chance to give this argument.

Perhaps there should be a .julia/config/PkgConfig.toml where things like this could be set?

Member

Anyway, let's go with this for now. Can always tweak it later.
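
For reference, a minimal sketch of reading such a limit defensively. The helper `precompile_task_limit` is hypothetical and not part of Pkg; it only assumes the `JULIA_NUM_PRECOMPILE_TASKS` variable name and the `Sys.CPU_THREADS + 1` default shown in the diff above. Unlike a bare `parse`, it falls back to the default when the value is missing or invalid.

```julia
# Hypothetical helper, not part of Pkg: read a task limit from the environment,
# falling back to a default when the variable is unset or not a positive integer.
function precompile_task_limit(env = ENV; default = Sys.CPU_THREADS + 1)
    raw = get(env, "JULIA_NUM_PRECOMPILE_TASKS", nothing)
    raw === nothing && return default
    n = tryparse(Int, raw)
    (n === nothing || n < 1) ? default : n
end

# Passing a plain Dict instead of ENV makes the behaviour easy to check.
limit = precompile_task_limit(Dict("JULIA_NUM_PRECOMPILE_TASKS" => "4"))
```

Taking the environment as an argument is a design choice for testability; the real code reads `ENV` directly.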

@IanButterworth
Member Author

IanButterworth commented Sep 14, 2020

@KristofferC Regarding #1578, I couldn't quite figure out why Pkg.precompile would've stopped recompiling stale dependents. I think it might be that part of the stale_cachefile heuristic only works if the package is loaded in Main, which doesn't happen during the Pkg.precompile process, but perhaps that changed? That's a bit of a guess.

However I did find a fix that I just added here.

Basically, I added a check for whether any dep was recompiled in this session; if so, the package is recompiled.

Behavior before the last commit:
I touch the ColorTypes source, then:

(dev) pkg> st
Status `~/Documents/dev/Project.toml`
  [3da002f7] ColorTypes v0.10.9 `~/.julia/dev/ColorTypes`
  [916415d5] Images v0.22.4

julia> Pkg.precompile()
Precompiling project...
[ Info: Precompiling ColorTypes [3da002f7-5984-5a60-b8a6-cbb66c0b333f]

julia> using Images
[ Info: Precompiling Images [916415d5-f1e6-5110-898d-aaa5f9f070e0]

julia> 

Now:

julia> ENV["JULIA_DEBUG"] = "Base"

julia> Pkg.precompile()
Precompiling project...
┌ Debug: Rejecting stale cache file /Users/ian/.julia/compiled/v1.6/ColorTypes/db21U_vWD4f.ji (mtime 1.60004722644721e9) because file /Users/ian/.julia/dev/ColorTypes/src/ColorTypes.jl (mtime 1.600047296964872e9) has changed
└ @ Base loading.jl:1431
[ Info: Precompiling ColorTypes [3da002f7-5984-5a60-b8a6-cbb66c0b333f]
[ Info: Precompiling Colors [5ae59095-9a9b-59fe-a467-6f913c188581]
[ Info: Precompiling Graphics [a2bd30eb-e257-5431-a919-1863eab51364]
[ Info: Precompiling ColorVectorSpace [c3611d14-8923-5661-9e6a-0046d554d3a4]
[ Info: Precompiling ImageCore [a09fc81d-aa75-5fe9-8630-4744c3626534]
[ Info: Precompiling ImageDistances [51556ac3-7006-55f5-8cb3-34580c88182d]
[ Info: Precompiling ImageAxes [2803e5a7-5153-5ecf-9a86-9b4c37f5f5ac]
[ Info: Precompiling ImageShow [4e3cecfd-b093-5904-9786-8bbb286a6a31]
[ Info: Precompiling ImageMorphology [787d08f9-d448-5407-9aad-5290dd7ab264]
[ Info: Precompiling ImageTransformations [02fcd773-0e25-5acc-982a-7f6622650795]
[ Info: Precompiling ImageMetadata [bc367c6b-8a6b-528e-b4bd-a4b897500b49]
[ Info: Precompiling ImageContrastAdjustment [f332f351-ec65-5f6a-b3d1-319c6670881a]
[ Info: Precompiling ImageFiltering [6a3955dd-da59-5b1f-98d4-e7296123deb5]
[ Info: Precompiling ImageQualityIndexes [2996bd0c-7a13-11e9-2da2-2f5ce47296a9]
[ Info: Precompiling Images [916415d5-f1e6-5110-898d-aaa5f9f070e0]

julia> using Images

julia> 

Note that precompilation of the packages that follow ColorTypes doesn't generate the debug info here, because they were forced to compile by the new check, not by stale_cachefile.

cc. @timholy given #1578

Member

@tkf tkf left a comment

I left some nitpicks but it LGTM! Not that I'm an expert on any of this, though. Anyway, thanks a lot for doing this!

src/API.jl Outdated
Comment on lines 932 to 938
for path_to_try in paths::Vector{String}
staledeps = Base.stale_cachefile(sourcepath, path_to_try, Base.TOMLCache()) #|| any(deps_recompiled)
staledeps === true && continue
# TODO: else, this returns a list of packages that may be loaded to make this valid (the topological list)
stale = false
break
end
Member

I feel it'd be better to have a function in Base that does this to reduce coupling to the implementation details in Base and Pkg.

src/API.jl Outdated
Comment on lines 939 to 947
if any_dep_recompiled || stale
Base.acquire(parallel_limiter)
Base.compilecache(pkg, sourcepath)
was_recompiled[pkg.uuid] = true
notify(precomp_events[pkg.uuid])
Base.release(parallel_limiter)
else
notify(precomp_events[pkg.uuid])
end
Member

IIUC, the only relevant logic here is Base.compilecache(pkg, sourcepath) and everything else is for implementing a future-like system. My personal preference for doing something like this is to create a task pool and future separately from the core logic. Dealing with semaphores and events at the application layer is way too low-level for me, as I'd need to worry about deadlocks all the time. Anyway, it's just a comment. If Pkg devs are cool with it I guess there is no need to do something else.
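
One way to reduce the deadlock risk while keeping the semaphore and event, sketched under the assumption that the work may throw: release the slot and notify dependents in `finally` blocks. The `with_slot` helper is hypothetical; `Base.Semaphore`, `Base.acquire`, `Base.release`, and `Base.Event` are the primitives already used in the diff.

```julia
# Hypothetical helper: run `f` while holding one slot of the semaphore,
# releasing the slot even if `f` throws.
function with_slot(f, sem::Base.Semaphore)
    Base.acquire(sem)
    try
        return f()
    finally
        Base.release(sem)
    end
end

limiter = Base.Semaphore(1)
finished = Base.Event()

# A job that fails must still free its slot and wake its dependents.
failing = @async try
    with_slot(() -> error("dummy error"), limiter)
finally
    notify(finished)
end

wait(finished)                         # would hang if notify were skipped
value = with_slot(() -> 42, limiter)   # would hang if the slot had leaked
```

Without the two `finally` blocks, a throwing job would leave the only slot held and the event never notified, which is the kind of hang the comment above worries about.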

@tkf
Member

tkf commented Sep 14, 2020

> why Pkg.precompile would've stopped re-compiling stale dependents

Indeed, old Pkg.precompile is not enough. Here is a repro that fools old Pkg.precompile: https://discourse.julialang.org/t/rfc-speeding-up-code-loading-when-using-multiple-processes-by-2x/37583/4

src/API.jl Outdated
for path_to_try in paths::Vector{String}
staledeps = Base.stale_cachefile(sourcepath, path_to_try, Base.TOMLCache()) #|| any(deps_recompiled)
staledeps === true && continue
# TODO: else, this returns a list of packages that may be loaded to make this valid (the topological list)
Member

IIUC, this TODO comment is not accurate anymore since by construction the dependencies are all in the non-stale state at this point.

cc @KristofferC

Member

I think Jameson added this comment.

Member Author

Missed this thread. I ran git blame on them and saw that they were added by you, @KristofferC, so I removed both.

Member

Yeah, I noticed that this comment is committed by @KristofferC in #626

https://github.com/JuliaLang/Pkg.jl/pull/626/files#diff-47db2d66769667be0982a7b56891b4a8R483

But maybe @KristofferC and @vtjnash discussed it in slack or sth? It'd be nice if @vtjnash can chime in so that we are not removing something important.

Member

I don't recall anything about this

src/API.jl Outdated

t = @async begin
length(pkg_dep_uuid_lists[i]) > 0 && wait.(map(x->precomp_events[x], pkg_dep_uuid_lists[i]))
any_dep_recompiled = any(map(x->was_recompiled[x], pkg_dep_uuid_lists[i]))
Member

I'm guessing you'd need any_dep_recompiled here since stale_cachefile treats packages that are already loaded as non-stale? If so, maybe adding some comments can be useful? (Though I think having a stale_cachefile-like function that ignores Base.loaded_modules would be much nicer.)

Member Author

Yeah, I don't quite understand why stale_cachefile isn't identifying packages that have recompiled deps as stale. Judging by #1578, I'm not sure anyone does yet. I just changed the code to skip the stale_cachefile check entirely if any_dep_recompiled == true. That will likely always be faster than calling stale_cachefile, but there is an edge case that would still warrant fixing stale_cachefile: a package that others depend on being recompiled outside this session.

Member Author

Turns out that edge case isn't an issue. More detail here #1578 (comment)

src/API.jl Outdated
Base.acquire(parallel_limiter)
Base.compilecache(pkg, sourcepath)
was_recompiled[pkg.uuid] = true
notify(precomp_events[pkg.uuid])
Member

This implementation of parallelism feels very low-level. Semaphore, Event, notify, acquire, release etc. Why not just start n tasks that take work from a channel?

Member Author

I tried out a channel-based approach, but it still requires the notify system. It works but seems slightly more complicated: master...ianshmean:ib/parallel_precomp_chanelbased

@IanButterworth
Member Author

IanButterworth commented Sep 14, 2020

Error handling

This wasn't done properly, so I fixed it. Now, if I throw an error in ZygoteRules (a middle-of-the-pack dep), the precompile session terminates early, but any packages currently finishing precompilation aren't forcibly terminated.

julia> Pkg.precompile()
Precompiling project...
[ Info: Precompiling Reexport [189a3867-3050-52da-a836-e630ba90ab69]
...
[ Info: Precompiling QuadGK [1fd47b50-473d-5c70-9696-f719f8f3bcdc]
[ Info: Precompiling ZygoteRules [700de1a5-db45-46bc-99cf-38207098b444]
[ Info: Precompiling ForwardDiff [f6369f11-7733-5829-9624-2563aa707210]
ERROR: LoadError: dummy error
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] top-level scope
   @ ~/.julia/dev/ZygoteRules/src/ZygoteRules.jl:3
 [3] include
   @ ./Base.jl:377 [inlined]
 [4] include_package_for_output(input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId,UInt64}}, uuid_tuple::Tuple{UInt64,UInt64}, source::Nothing)
   @ Base ./loading.jl:1109
 [5] top-level scope
   @ none:1
 [6] eval
   @ ./boot.jl:344 [inlined]
 [7] eval(x::Expr)
   @ Base.MainInclude ./client.jl:446
 [8] top-level scope
   @ none:1
in expression starting at /Users/ian/.julia/dev/ZygoteRules/src/ZygoteRules.jl:1
[ Info: Precompiling Latexify [23fbe1c1-3f47-55db-b15f-69d7ec21a316]
[ Info: Precompiling Bzip2_jll [6e34b625-4abd-537c-b88f-471c36dfa7a0]
[ Info: Precompiling SortingAlgorithms [a2af1166-a08f-5f64-846c-94a0d3cef48c]
[ Info: Precompiling SimpleTraits [699a6c99-e7fa-54fc-8d76-47d257e15c1d]
[ Info: Precompiling LabelledArrays [2ee39098-c373-598a-b85f-a56591580800]
ERROR: TaskFailedException

    nested task error: Failed to precompile ZygoteRules [700de1a5-db45-46bc-99cf-38207098b444] to /Users/ian/.julia/compiled/v1.6/ZygoteRules/LmjSI_vWD4f.ji.
    Stacktrace:
     [1] macro expansion
       @ ~/Documents/GitHub/Pkg.jl/src/API.jl:951 [inlined]
     [2] (::Pkg.API.var"#199#205"{Dict{Base.UUID,Bool},Dict{Base.UUID,Base.Event},Vector{Vector{Base.UUID}},Base.Semaphore,Base.PkgId,Int64,String,Vector{String}})()
       @ Pkg.API ./task.jl:389
    
    caused by: Failed to precompile ZygoteRules [700de1a5-db45-46bc-99cf-38207098b444] to /Users/ian/.julia/compiled/v1.6/ZygoteRules/LmjSI_vWD4f.ji.
    Stacktrace:
     [1] error(s::String)
       @ Base ./error.jl:33
     [2] compilecache(pkg::Base.PkgId, path::String)
       @ Base ./loading.jl:1240
     [3] macro expansion
       @ ~/Documents/GitHub/Pkg.jl/src/API.jl:947 [inlined]
     [4] (::Pkg.API.var"#199#205"{Dict{Base.UUID,Bool},Dict{Base.UUID,Base.Event},Vector{Vector{Base.UUID}},Base.Semaphore,Base.PkgId,Int64,String,Vector{String}})()
       @ Pkg.API ./task.jl:389
Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base ./task.jl:347
 [2] macro expansion
   @ ./task.jl:366 [inlined]
 [3] precompile(ctx::Pkg.Types.Context)
   @ Pkg.API ~/Documents/GitHub/Pkg.jl/src/API.jl:920
 [4] precompile
   @ ~/Documents/GitHub/Pkg.jl/src/API.jl:895 [inlined]
 [5] top-level scope
   @ ./timing.jl:174 [inlined]
 [6] top-level scope
   @ ./REPL[3]:0
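
The behavior in the transcript above can be sketched roughly as follows; this is an illustrative toy, not the code in src/API.jl. A shared `errored` flag (the name appears in the merged diff, its use here is an assumption) lets jobs that have not started yet bail out, while `@sync` still waits for every task and then rethrows the failure. The job names and sleeps are made up.

```julia
errored = Ref(false)     # set once any job fails
started = String[]       # jobs that actually did their work

jobs = ["ok1", "bad", "ok2"]

caught = try
    @sync for name in jobs
        @async begin
            sleep(name == "bad" ? 0.0 : 0.05)   # let "bad" fail first
            errored[] && return                  # skip: an earlier job failed
            push!(started, name)
            if name == "bad"
                errored[] = true
                error("dummy error in $name")
            end
        end
    end
    nothing
catch err
    err   # wrapper type depends on the Julia version
end
```

The exception that escapes `@sync` wraps the original error; depending on the Julia version it may be a `TaskFailedException` (as in the transcript) or a `CompositeException`, so callers should not rely on the exact wrapper type.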

Member

@KristofferC KristofferC left a comment

Can merge when ready. But I'm gonna need help with the future maintenance of this code.

@IanButterworth
Member Author

Just a small tweak to the error handling, but should be good to go now. Happy to help maintain once this starts getting used

@staticfloat staticfloat merged commit 0365938 into JuliaLang:master Sep 15, 2020
@staticfloat
Member

Thanks Ian! This is great!

Comment on lines +954 to +955
errored = true
throw(err)
Member

I couldn't bring JuliaLang/julia#37374 (comment) up again in this PR in time, but did somebody consider the case of OS-specific packages? That is to say, if we have

module DownstreamPkg
if Sys.iswindows()
    using SomeWindowsOnlyPackage
end
...
end

we don't need to compile SomeWindowsOnlyPackage (or propagate the error from compiling SomeWindowsOnlyPackage in non-Windows OS) when compiling DownstreamPkg in non-Windows OS.

I think we should report errors only from the importable packages (i.e., the ones in Project.toml). Of course, it'd be much nicer to integrate precompilation machinery in Base itself so that this kind of hack is not necessary.
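
A minimal sketch of that filtering idea, with hypothetical data: failures are collected for every package in the manifest, but only those belonging to direct dependencies (the ones listed in Project.toml) are reported. The names and error strings below are made up for illustration.

```julia
# Hypothetical inputs: names listed in Project.toml, and failures collected
# while precompiling every package in the manifest.
direct_deps = Set(["DownstreamPkg", "Plots"])
failures = Dict(
    "SomeWindowsOnlyPackage" => "cannot load on this OS",
    "Plots" => "dummy error",
)

# Split failures into those worth reporting and those to ignore quietly.
reported = filter(p -> first(p) in direct_deps, failures)
ignored  = setdiff(keys(failures), keys(reported))
```

With this split, a failure in SomeWindowsOnlyPackage on a non-Windows machine would be dropped, while a failure in a package the user can actually import would still surface.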

Member

Is that what makes it hard to parallelize in Base, because you don't know your actual dependencies up front?

Member Author

@IanButterworth IanButterworth Sep 15, 2020

I see. Perhaps the docstring for Pkg.precompile could be changed to

help?> Pkg.precompile
  Pkg.precompile()

  Parallelized precompilation of all the dependencies in the manifest of the project. 
  If you want to precompile only the dependencies that are actually used, instead skip 
  this and load the packages as per normal to trigger standard precompilation.
...

Member Author

I'll let others decide if this is necessary, but it proved simple to implement an option to revert behavior #2021

Member

> Is that what makes it hard to parallelize in Base, because you don't know your actual dependencies up front?

Yeah. But it's a problem in Pkg, too (even if you can parse the Manifest). I think the problem is that there is no database for the "true" dependency tree, since you can hide a using inside an if branch.

Ultimately, only the precompilation subprocess "knows" the dependencies while it's precompiling a package. So, I think the "correct" way of doing this is to use some kind of bidirectional RPC so that the precompilation subprocess can request the precompilation of its dependencies from the parent process. But that's very tedious. I think ignoring compilation errors from non-direct dependencies is a decent approach.

(I feel like I should try implementing it now that I complained so much 🤣)

Member Author

Neat idea on non-direct dependencies being allowed to fail.. I implemented it here #2021 (comment)

Member

Wouldn't it be enough to wrap in

redirect_stderr(devnull)

here? That should propagate to the subprocess, no?

Member Author

That's implemented in #2021 but there was a question about redirect_stderr concurrency here #2021 (comment) so JuliaLang/julia#37596 was opened

@fredrikekre
Member

Trying this out:

(@v1) pkg> precompile
Precompiling project...
[ Info: Precompiling MacroTools [1914dd2f-81c6-5fcd-8719-6d5c9610ff09]
[...]
[ Info: Precompiling PackageCompiler [9b87118b-4619-50d2-8e1e-99f35a4d4d9d]
ERROR: LoadError: LoadError: Cannot locate artifact 'x86_64-w64-mingw32' in '/home/fredrik/.julia/packages/PackageCompiler/vsMJE/Artifacts.toml'
Stacktrace:

Why did that happen? Is PackageCompiler doing something bad?

@IanButterworth
Member Author

Does the same thing happen if you just run using PackageCompiler?

@fredrikekre
Member

fredrikekre commented Sep 15, 2020

Yea, strange. Perhaps this is not connected to this PR but to @staticfloat's move to the Artifacts stdlib, which broke something?

@fredrikekre
Member

Looks like stdout/stderr is not suppressed though:

pkg> precompile
Precompiling project...
[...]
[ Info: Precompiling Pluto [c3e4b0f8-55cb-11ea-2926-15256bba5781]
┌ Info: 
│ 
│     Welcome to Pluto v0.11.14 🎈
│     Start a notebook server using:
│ 
│   julia> Pluto.run()
│ 
│     Have a look at the FAQ:
│     https://github.com/fonsp/Pluto.jl/wiki
└

@IanButterworth
Member Author

The same thing happened in the old Pkg.precompile. Pluto just puts a welcome message directly in the module rather than in __init__(). I already recommended removing it: fonsp/Pluto.jl#389 (comment)

@haampie

haampie commented Sep 16, 2020

I've tried this on a machine with 128 physical cores (AMD EPYC 7742 2.25GHz) and 128 GB memory, with everything in memory (so, julia master itself and JULIA_DEPOT_PATH too):

] add LightGraphs, Optim, JuMP, PyCall, Images, Flux, Plots, Pluto, DifferentialEquations, DataFrames, Turing

Before:

julia> @time Pkg.precompile()
442.760569 seconds (209.17 k allocations: 16.029 MiB)

After:

julia> @time Pkg.precompile()
61.723726 seconds (2.18 M allocations: 185.585 MiB, 0.06% gc time)

That's a >7x speedup. Unfortunately not a 128x speedup :p but I guess parallelism is limited.
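
One way to see why the speedup is capped regardless of core count: a package cannot start before its deps finish, so the longest dependency chain bounds the wall time. The sketch below computes that bound on a toy graph; the graph and the compile times are hypothetical, not measurements from this machine.

```julia
# Toy graph with hypothetical compile times in seconds.
deps = Dict("A" => String[], "B" => ["A"], "C" => ["A"], "D" => ["B", "C"])
cost = Dict("A" => 10.0, "B" => 30.0, "C" => 30.0, "D" => 10.0)

# Earliest finish time of `name` with unlimited workers: its own cost plus
# the latest finish time among its dependencies.
function finish_time!(memo, name)
    haskey(memo, name) && return memo[name]
    latest_dep = 0.0
    for d in deps[name]
        latest_dep = max(latest_dep, finish_time!(memo, d))
    end
    memo[name] = cost[name] + latest_dep
end

memo = Dict{String,Float64}()
serial   = sum(values(cost))                                    # one at a time
parallel = maximum(finish_time!(memo, n) for n in keys(deps))   # longest chain
max_speedup = serial / parallel
```

In this toy case only B and C can overlap, so even with unlimited workers the best possible speedup is `serial / parallel`, well below the number of packages.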

parallel_limiter = Base.Semaphore(num_tasks)

man = Pkg.Types.read_manifest(ctx.env.manifest_file)
pkgids = [Base.PkgId(first(dep), last(dep).name) for dep in man if !Pkg.Operations.is_stdlib(first(dep))]
Member

I know this filtering was here before, but why is it necessary to filter out stdlibs?

(tmp.9L2rmMdEXO) pkg> st
Status `/tmp/tmp.9L2rmMdEXO/Project.toml`
  [37e2e46d] LinearAlgebra

(tmp.9L2rmMdEXO) pkg> precompile
Precompiling project...

julia> using LinearAlgebra
[ Info: Precompiling LinearAlgebra [37e2e46d-f89d-539d-b4ee-838fcccc9c8e]

Member Author

Aren't stdlib's always going to be precompiled already, and if you're dev-ing them they'd need to have their uuid removed, so wouldn't identify as stdlibs in that check?

Member

Right, forgot to say that this is when you compile Julia without them in the sysimg. Perhaps we can instead filter based on if the package is already loaded? That should work for both regular packages and stdlibs. If it is a stdlib that is in the sysimg it doesn't need to precompile, and if it is a regular package that is already loaded in the session it is probably just precompiled from the using?

Member Author

But what if they're loaded, and in need of recompiling? Perhaps the filter just isn't needed at all?

Member

Yea, I am not sure what happens if you try to precompile stdlibs that are loaded, though. Since no precompile files exist, will it spend time precompiling them anyway? At least we can add the filter I suggested for the stdlibs.

Member Author

#2021 updated with this now (the PkgId version)

Member

I thought the stdlib check was for some kind of optimization (launching julia takes a few hundred ms even if you don't do anything). So, I think the correct predicate here is "is it in sysimage?" rather than "is it a stdlib?" since non-stdlib packages can be in the sysimage and there is no point in calling compilecache for them. This also covers the exotic situation where some stdlibs are not in the sysimage.

I think is_stdlib_and_loaded is better than nothing. But I feel it's a bit of a half-way solution if my assumption (that the stdlib check was an optimization) is correct.

Member

That's true, I didn't think about regular packages in the sysimg. But perhaps #2018 (comment) is a good enough approximation of that? It seems pretty strange to (i) load a dependency, (ii) update its version, (iii) pkg> precompile, (iv) restart Julia and expect everything to be precompiled?

Member Author

Although #2021 is looking good, I do like the properness of in_sysimage. It explains exactly why we'd always want to skip. I'll prepare a PR

Member

> It seems pretty strange to (i) load a dependency, (ii) update its version, (iii) pkg> precompile, (iv) restart Julia and expect everything to be precompiled?

@fredrikekre Hmm... That was my expectation, actually. I generally expect pkg> $cmd and shell> jlpkg $cmd to be identical (when a project is not activated). Anyway, what do you think about #2021 + JuliaLang/julia#37652? I think in_sysimage is simple enough and nice to have.

@staticfloat
Member

@haampie can you give us a feel for the memory requirements? Did the peak memory usage go up significantly?

@haampie

haampie commented Sep 17, 2020

I just checked it with htop. Initially it has 9.2G used, when I run Pkg.precompile it jumps to 16.7G, but not for long. Initially many processes spin up, I'm not sure if htop is showing the peak memory or an average.

@giordano
Contributor

How many threads do you see busy? The scheduler should use up to 257 parallel tasks on that machine, but I guess it won't use most of them

@haampie

haampie commented Sep 17, 2020

[Screenshot, 2020-09-17 02:14:57]

6s later
[Screenshot, 2020-09-17 02:15:03]

35s later
[Screenshot, 2020-09-17 02:15:32]

@IanButterworth
Member Author

Seems like a pretty extreme test, and reassuring results. Thanks @haampie!

aviatesk added a commit to aviatesk/Pkg.jl that referenced this pull request Sep 23, 2020
UUIDs are not in `keys` but `values` of `Project.deps`