-
-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallelize Pkg.precompile #2018
Parallelize Pkg.precompile #2018
Conversation
One more datapoint. Adding julia master
this PR
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome!
src/API.jl
Outdated
sleep(0.001) | ||
end | ||
end | ||
Base.compilecache(pkg, sourcepath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To limit parallelization, I suggest launching a million tasks like this, but then creating a Channel(num_tasks)
, then having each task put!()
something into it just before this call to compilecache
, then when it's finished, you take!()
something back out. This will create, essentially, an N
-parallel critical section, and allow N
tasks to be running that section at once, while all others are blocked, waiting for the channel to free up space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if this throws?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@staticfloat aren't you describing a (counting) Semaphore
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, pretty much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR now has a Semafore approach, and I tried out a channel-based approach here, which doesn't seem simpler master...ianshmean:ib/parallel_precomp_chanelbased
What should we move forward with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @tkf just to bring the conversation to a single thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd actually implement precompile
differently by using a recursive function that returns was_recompiled::Bool
and wrapping the recursive calls with tasks. This way, we don't need to implement a future-like construct (i.e., was_processed
+ was_recompiled
). The error handling would probably be more straightforward this way. Resource control is probably still easier with semaphore (unless we have an easy-to-use task pool and future in Base or stdlib) although I wish there were Base.acquire(f, semaphore)
.
But this is the kind of thing the trade-off is not completely clear until you have a concrete implementation. So, I think it's reasonable to defer this to future refactoring.
Thanks @staticfloat. Original example sped up from 87 seconds, to 80, with much fewer allocations 👍🏻
|
Ping for review @KristofferC @StefanKarpinski |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cool. I don't really know enough about what compilecache does to know if this is fully sound.
Also, before complicating this function further it might be best to thoroughly investigate #1578 first.
printpkgstyle(ctx, :Precompiling, "project...") | ||
|
||
num_tasks = parse(Int, get(ENV, "JULIA_NUM_PRECOMPILE_TASKS", string(Sys.CPU_THREADS + 1))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems kinda excessive to introduce an env variable for this. Its so specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how else to gate this. Suggestions? There are some concerns that with a large core count this could accidentally OOM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much memory does each worker use approximately? Isn't this the case for every parallel workload that uses memory? Does this scale up to super high core counts, perhaps just setting an upper cap is OK.
I guess we should look at nthreads but everyone runs with that equal to 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this the case for every parallel workload that uses memory?
Yes, but most other workloads allow tuning (e.g. via -t
).
I guess we should look at nthreads but everyone runs with that equal to 1.
There's that, and also Lyndon's comment above that this is more like a multiprocessing thing than a multithreading thing. I also agree that I shouldn't have to limit my computation's thread count to limit the precompilation, and vice versa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather have it as a normal argument to the precompile
function then. This is exactly what we already have to limit parallelism in the asynchronous package downloader.
Looking at it, funnily enough we do have an env variable for the package downloader but that seems like it was added as a workaround for something:
Lines 329 to 331 in ede7b07
# NOTE: The JULIA_PKG_CONCURRENCY environment variable is likely to be removed in | |
# the future. It currently stands as an unofficial workaround for issue #795. | |
num_concurrent_downloads::Int = haskey(ENV, "JULIA_PKG_CONCURRENCY") ? parse(Int, ENV["JULIA_PKG_CONCURRENCY"]) : 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although if we at some point want to run this automatically when a package is updated, there is no chance to give this argument.
Perhaps there should be a .julia/config/PkgConfig.toml
where things like this could be set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, let's go with this for now. Can always tweak it later.
src/API.jl
Outdated
sleep(0.001) | ||
end | ||
end | ||
Base.compilecache(pkg, sourcepath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if this throws?
- check stale_cachefile after deps finish
@KristofferC Regarding #1578, I couldn't quite figure out why However I did find a fix that I just added here. Basically, I add an additional check for if any dep was recompiled in this session, and if so, recompile the package Behavior before the last commit:
Now:
Note that the precompillation of the following packages doesn't generate the debug info here, because they were forced to compile by the new catch, not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some nitpicks but it LGTM! Not that I'm an expert on any of this, though. Anyway, thanks a lot for doing this!
src/API.jl
Outdated
for path_to_try in paths::Vector{String} | ||
staledeps = Base.stale_cachefile(sourcepath, path_to_try, Base.TOMLCache()) #|| any(deps_recompiled) | ||
staledeps === true && continue | ||
# TODO: else, this returns a list of packages that may be loaded to make this valid (the topological list) | ||
stale = false | ||
break | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it'd be better to have a function in Base
that does this to reduce coupling to the implementation details in Base
and Pkg
.
src/API.jl
Outdated
if any_dep_recompiled || stale | ||
Base.acquire(parallel_limiter) | ||
Base.compilecache(pkg, sourcepath) | ||
was_recompiled[pkg.uuid] = true | ||
notify(precomp_events[pkg.uuid]) | ||
Base.release(parallel_limiter) | ||
else | ||
notify(precomp_events[pkg.uuid]) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, the only relevant logic here is Base.compilecache(pkg, sourcepath)
and everything else is for implementing `future-like system. My personal preference for doing something like this is to create a task pool and future separately from the core logic. Dealing semaphore and event at the application layer is way too low-level for me as I'd need to worry about deadlocks all the time. Anyway, it's just a comment. If Pkg devs are cool with it I guess there is no need to do something else.
Indeed, old |
src/API.jl
Outdated
for path_to_try in paths::Vector{String} | ||
staledeps = Base.stale_cachefile(sourcepath, path_to_try, Base.TOMLCache()) #|| any(deps_recompiled) | ||
staledeps === true && continue | ||
# TODO: else, this returns a list of packages that may be loaded to make this valid (the topological list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this TODO comment is not accurate anymore since by construction the dependencies are all in the non-stale state at this point.
cc @KristofferC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Jameson added this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed this thread. I git blamed them and that they were added by you @KristofferC so I removed both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I noticed that this comment is committed by @KristofferC in #626
But maybe @KristofferC and @vtjnash discussed it in slack or sth? It'd be nice if @vtjnash can chime in so that we are not removing something important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall anything about this
src/API.jl
Outdated
|
||
t = @async begin | ||
length(pkg_dep_uuid_lists[i]) > 0 && wait.(map(x->precomp_events[x], pkg_dep_uuid_lists[i])) | ||
any_dep_recompiled = any(map(x->was_recompiled[x], pkg_dep_uuid_lists[i])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing you'd need any_dep_recompiled
here since stale_cachefile
treats packages that are already loaded as non-stale? If so, maybe adding some comments can be useful? (Though I think having a stale_cachefile
-like function that ignores Base.loaded_modules
would be much nicer.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't understand quite why stale_cachefile
isn't identifying packages that have recompiled deps as stale.. Judging by #1578 I'm not sure anyone does yet. I just changed the code to skip the stale_cachefile
checking entirely if any_dep_recompiled == true
. It would likely always be faster than using stale_cachefile
but there is an edge case that would still warrant fixing stale_cachefile
though; if a package that others depend on was recompiled externally from this session
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out that edge case isn't an issue. More detail here #1578 (comment)
src/API.jl
Outdated
Base.acquire(parallel_limiter) | ||
Base.compilecache(pkg, sourcepath) | ||
was_recompiled[pkg.uuid] = true | ||
notify(precomp_events[pkg.uuid]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation of parallelism feels very low-level. Semaphore
, Event
, notify
, acquire
, release
etc. Why not just start n tasks that take work from a channel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried out a channel based approach, but it still requires the notify system. It works but seems slightly more complicated master...ianshmean:ib/parallel_precomp_chanelbased
Error handlingThis wasn't done properly, so I fixed it and now if I throw an error into ZygoteRules (middle of the pack dep), the precompile session will terminate early, but any packages currently finishing precomp won't be forcibly terminated.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can merge when ready. But I'm gonna need help with the future maintenance of this code.
Just a small tweak to the error handling, but should be good to go now. Happy to help maintain once this starts getting used |
Thanks Ian! This is great! |
errored = true | ||
throw(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't bring JuliaLang/julia#37374 (comment) up again in this PR in time, but did somebody considered the case for OS-specific packages? That is to say, if we have
module DownstreamPkg
if Sys.iswindows()
using SomeWindowsOnlyPackage
end
...
end
we don't need to compile SomeWindowsOnlyPackage
(or propagate the error from compiling SomeWindowsOnlyPackage
in non-Windows OS) when compiling DownstreamPkg
in non-Windows OS.
I think we should report errors only from the importable packages (i.e., the ones in Project.toml). Of course, it'd be much nicer to integrate precompilation machinery in Base itself so that this kind of hack is not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that what makes it hard to parallelize in Base, because you don't know your actual dependencies up front?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Perhaps the docstring for Pkg.precompile
could be changed to
help?> Pkg.precompile
Pkg.precompile()
Parallelized precompilation of all the dependencies in the manifest of the project.
If you want to precompile only the dependencies that are actually used, instead skip
this and load the packages as per normal to trigger standard precompilation.
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll let others decide if this is necessary, but it proved simple to implement an option to revert behavior #2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that what makes it hard to parallelize in Base, because you don't know your actual dependencies up front?
Yeah. But it's a problem in Pkg, too (even if you can parse Manifest). I think the problem is that there is no database for "true" dependency tree since you can hide using
in if
branch.
Ultimately, only the precompilation subprocess "knows" the dependencies while it's precompiling a package. So, I think the "correct" way of doing this to use some kind of bidirectional RPC so that the precompilation subprocess can request the precompilation of its dependencies to the parent process. But that's very tedious. I think ignoring compilation error from non-direct dependencies is a decent approach.
(I feel like I should try implementing it now that I complained so much 🤣)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat idea on non-direct dependencies being allowed to fail.. I implemented it here #2021 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be enought to wrap in
redirect_stderr(devnull)
here? That should propagate to the subprocess, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's implemented in #2021 but there was a question about redirect_stderr
concurrency here #2021 (comment) so JuliaLang/julia#37596 was opened
Trying this out:
Why did that happen? PackageCompiler that does something bad? |
Does the same thing happen if you just |
Yea, strange. Perhaps this is not connected to this but to @staticfloat move to artifacts stdlib that broke? |
Looks like stdout/stderr is not supressed though:
|
The same thing happened in the old |
I've tried this on a machine with 128 physical cores (AMD EPYC 7742 2.25GHz), 128 GB memory, with everything in-memory (so, julia master version itself and JULIA_DEPOTH_PATH too):
Before: julia> @time Pkg.precompile()
442.760569 seconds (209.17 k allocations: 16.029 MiB) After: julia> @time Pkg.precompile()
61.723726 seconds (2.18 M allocations: 185.585 MiB, 0.06% gc time) that's >7x speedup. Unfortunately not a 128x speedup :p but I guess parallelism is limited. |
parallel_limiter = Base.Semaphore(num_tasks) | ||
|
||
man = Pkg.Types.read_manifest(ctx.env.manifest_file) | ||
pkgids = [Base.PkgId(first(dep), last(dep).name) for dep in man if !Pkg.Operations.is_stdlib(first(dep))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this filtering was here before, but why is it necessary to filter out stdlibs?
(tmp.9L2rmMdEXO) pkg> st
Status `/tmp/tmp.9L2rmMdEXO/Project.toml`
[37e2e46d] LinearAlgebra
(tmp.9L2rmMdEXO) pkg> precompile
Precompiling project...
julia> using LinearAlgebra
[ Info: Precompiling LinearAlgebra [37e2e46d-f89d-539d-b4ee-838fcccc9c8e]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't stdlib's always going to be precompiled already, and if you're dev-ing them they'd need to have their uuid removed, so wouldn't identify as stdlibs in that check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, forgot to say that this is when you compile Julia without them in the sysimg. Perhaps we can instead filter based on if the package is already loaded? That should work for both regular packages and stdlibs. If it is a stdlib that is in the sysimg it doesn't need to precompile, and if it is a regular package that is already loaded in the session it is probably just precompiled from the using
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what if they're loaded, and in need of recompiling? Perhaps the filter just isn't needed at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I am not sure what happens if you try to precompile stdlibs that are loaded though? Since no precompiles files exist, will that spend time on precompiling them anyway? At least we can add the filter I suggested to the stdlibs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#2021 updated with this now (the PkgId version)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the stdlib check was for some kind of optimization (launching julia
takes a few hundreds of ms even if you don't do anything). So, I think the correct predicate here is "is it in sysimage?" than "is it a stdlib?" since non-stdlib packages can be in sysimage and there is no point in calling compilecache
for them. This also covers the exotic situation where some stdlibs are not in sysimage.
I think is_stdlib_and_loaded
is better than nothing. But I feel it's a bit half-way solution if my assumption (the stdlib check was optimization) is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats true, I didn't think about regular packages in the sysimg. But perhaps #2018 (comment) is a good enough approximation of that? It seems pretty strange to (i) load a dependency, (ii) update its version, (iii) pkg> precompile, (iv) restart Julia and expect everything to be precompiled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although #2021 is looking good, I do like the properness of in_sysimage
. It explains exactly why we'd always want to skip. I'll prepare a PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems pretty strange to (i) load a dependency, (ii) update its version, (iii) pkg> precompile, (iv) restart Julia and expect everything to be precompiled?
@fredrikekre Hmm... That was my expectation, actually. I generally expect pkg> $cmd
and shell> jlpkg $cmd
to be identical (when a project is not activated). Anyway, what do you think about #2021 + JuliaLang/julia#37652? I think in_sysimage
is simple enough and nice to have.
@haampie can you give us a feel for the memory requirements? Did the peak memory usage go up significantly? |
I just checked it with |
How many threads do you see busy? The scheduler should use up to 257 parallel tasks on that machine, but I guess it won't use most of them |
Seems like a pretty extreme test, and reassuring results. Thanks @haampie! |
UUIDs are not in `keys` but `values` of `Project.deps`
Pkg on julia master:
This PR (same project, with
.julia/compiled
emptied out):The approach taken here async queues up all the precomp jobs for all deps in the manifest, and each watches for when its deps are all precomped before starting.
Thanks to @oxinabox for conceptualization of this approach.
Performance considerations
The process could launch an unlimited number of julia instances (viacompilecache
), so we might want to add a hard limit.