Parallelize Pkg.precompile #2018

Merged: 15 commits, Sep 15, 2020
Changes from 9 commits
59 changes: 44 additions & 15 deletions in src/API.jl

@@ -895,31 +895,60 @@ end
precompile() = precompile(Context())
function precompile(ctx::Context)
printpkgstyle(ctx, :Precompiling, "project...")

pkgids = [Base.PkgId(uuid, name) for (name, uuid) in ctx.env.project.deps if !is_stdlib(uuid)]

num_tasks = parse(Int, get(ENV, "JULIA_NUM_PRECOMPILE_TASKS", string(Sys.CPU_THREADS + 1)))
Member:

Seems kinda excessive to introduce an env variable for this. It's so specific.

Member:

I'm not sure how else to gate this. Suggestions? There are some concerns that with a large core count this could accidentally OOM.

@KristofferC (Member, Sep 13, 2020):

How much memory does each worker use, approximately? Isn't this the case for every parallel workload that uses memory? Does this scale up to super high core counts? Perhaps just setting an upper cap is OK.

I guess we should look at nthreads but everyone runs with that equal to 1.

Member:

> Isn't this the case for every parallel workload that uses memory?

Yes, but most other workloads allow tuning (e.g. via -t).

> I guess we should look at nthreads but everyone runs with that equal to 1.

There's that, and also Lyndon's comment above that this is more like a multiprocessing thing than a multithreading thing. I also agree that I shouldn't have to limit my computation's thread count to limit the precompilation, and vice versa.

Member:

I'd rather have it as a normal argument to the precompile function then. This is exactly what we already have to limit parallelism in the asynchronous package downloader.

Looking at it, funnily enough we do have an env variable for the package downloader but that seems like it was added as a workaround for something:

Pkg.jl/src/Types.jl, lines 329 to 331 (at ede7b07):

# NOTE: The JULIA_PKG_CONCURRENCY environment variable is likely to be removed in
# the future. It currently stands as an unofficial workaround for issue #795.
num_concurrent_downloads::Int = haskey(ENV, "JULIA_PKG_CONCURRENCY") ? parse(Int, ENV["JULIA_PKG_CONCURRENCY"]) : 8

@KristofferC (Member, Sep 14, 2020):

Although if we at some point want to run this automatically when a package is updated, there is no chance to give this argument.

Perhaps there should be a .julia/config/PkgConfig.toml where things like this could be set?

Member:

Anyway, let's go with this for now. Can always tweak it later.

parallel_limiter = Base.Semaphore(num_tasks)

man = Pkg.Types.read_manifest(ctx.env.manifest_file)
pkgids = [Base.PkgId(first(dep), last(dep).name) for dep in man if !Pkg.Operations.is_stdlib(first(dep))]
Member:

I know this filtering was here before, but why is it necessary to filter out stdlibs?

(tmp.9L2rmMdEXO) pkg> st
Status `/tmp/tmp.9L2rmMdEXO/Project.toml`
  [37e2e46d] LinearAlgebra

(tmp.9L2rmMdEXO) pkg> precompile
Precompiling project...

julia> using LinearAlgebra
[ Info: Precompiling LinearAlgebra [37e2e46d-f89d-539d-b4ee-838fcccc9c8e]

Member Author:

Aren't stdlibs always going to be precompiled already? And if you're dev-ing them, they'd need to have their UUID removed, so they wouldn't identify as stdlibs in that check?

Member:

Right, forgot to say that this is when you compile Julia without them in the sysimg. Perhaps we can instead filter based on whether the package is already loaded? That should work for both regular packages and stdlibs: if it is a stdlib that is in the sysimg it doesn't need to precompile, and if it is a regular package that is already loaded in the session it was probably just precompiled by the using?

Member Author:

But what if they're loaded, and in need of recompiling? Perhaps the filter just isn't needed at all?

Member:

Yeah, I'm not sure what happens if you try to precompile stdlibs that are loaded, though? Since no precompile files exist, will that spend time precompiling them anyway? At least we can add the filter I suggested for the stdlibs.

Member Author:

#2021 updated with this now (the PkgId version)

Member:

I thought the stdlib check was some kind of optimization (launching julia takes a few hundred ms even if you don't do anything). So I think the correct predicate here is "is it in the sysimage?" rather than "is it a stdlib?", since non-stdlib packages can be in the sysimage and there is no point in calling compilecache for them. This also covers the exotic situation where some stdlibs are not in the sysimage.

I think is_stdlib_and_loaded is better than nothing. But it feels like a half-way solution if my assumption (that the stdlib check was an optimization) is correct.

Member:

That's true, I didn't think about regular packages in the sysimg. But perhaps #2018 (comment) is a good enough approximation of that? It seems pretty strange to (i) load a dependency, (ii) update its version, (iii) pkg> precompile, (iv) restart Julia and expect everything to be precompiled?

Member Author:

Although #2021 is looking good, I do like the properness of in_sysimage. It explains exactly why we'd always want to skip. I'll prepare a PR

Member:

> It seems pretty strange to (i) load a dependency, (ii) update its version, (iii) pkg> precompile, (iv) restart Julia and expect everything to be precompiled?

@fredrikekre Hmm... That was my expectation, actually. I generally expect pkg> $cmd and shell> jlpkg $cmd to be identical (when a project is not activated). Anyway, what do you think about #2021 + JuliaLang/julia#37652? I think in_sysimage is simple enough and nice to have.
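The in_sysimage predicate discussed in this thread makes the filter direct. A minimal sketch, assuming a Julia build where Base.in_sysimage(::Base.PkgId) (the addition proposed in JuliaLang/julia#37652) is available; the pkgids vector here is illustrative:

```julia
# Sketch only: drop packages already baked into the sysimage, since
# calling compilecache for them is pointless. Assumes Base.in_sysimage
# from JuliaLang/julia#37652 exists (Julia 1.6+).
pkgids = [Base.PkgId(Base.UUID("37e2e46d-f89d-539d-b4ee-838fcccc9c8e"), "LinearAlgebra")]
filter!(pkg -> !Base.in_sysimage(pkg), pkgids)
```

This covers both stdlibs compiled into the sysimage and regular packages baked in by a custom sysimage build, which the is_stdlib check alone does not.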

pkg_dep_uuid_lists = [collect(values(last(dep).deps)) for dep in man if !Pkg.Operations.is_stdlib(first(dep))]
filter!.(!is_stdlib, pkg_dep_uuid_lists)

if ctx.env.pkg !== nothing && isfile(joinpath(dirname(ctx.env.project_file), "src", ctx.env.pkg.name * ".jl"))
push!(pkgids, Base.PkgId(ctx.env.pkg.uuid, ctx.env.pkg.name))
push!(pkg_dep_uuid_lists, collect(keys(ctx.env.project.deps)))
end

precomp_events = Dict{Base.UUID,Base.Event}()
was_recompiled = Dict{Base.UUID,Bool}()
for pkgid in pkgids
precomp_events[pkgid.uuid] = Base.Event()
was_recompiled[pkgid.uuid] = false
end


precomp_tasks = Task[]
# TODO: since we are a complete list, but not topologically sorted, handling of recursion will be completely at random
for pkg in pkgids
for (i, pkg) in pairs(pkgids)
paths = Base.find_all_in_cache_path(pkg)
sourcepath = Base.locate_package(pkg)
sourcepath === nothing && continue
# Heuristic for when precompilation is disabled
occursin(r"\b__precompile__\(\s*false\s*\)", read(sourcepath, String)) && continue
stale = true
for path_to_try in paths::Vector{String}
staledeps = Base.stale_cachefile(sourcepath, path_to_try)
staledeps === true && continue
# TODO: else, this returns a list of packages that may be loaded to make this valid (the topological list)
stale = false
break
end
if stale
Base.compilecache(pkg, sourcepath)
end

t = @async begin
length(pkg_dep_uuid_lists[i]) > 0 && wait.(map(x->precomp_events[x], pkg_dep_uuid_lists[i]))
any_dep_recompiled = any(map(x->was_recompiled[x], pkg_dep_uuid_lists[i]))
Member:

I'm guessing you'd need any_dep_recompiled here since stale_cachefile treats packages that are already loaded as non-stale? If so, maybe adding some comments can be useful? (Though I think having a stale_cachefile-like function that ignores Base.loaded_modules would be much nicer.)

Member Author:

Yeah, I don't quite understand why stale_cachefile isn't identifying packages that have recompiled deps as stale. Judging by #1578, I'm not sure anyone does yet. I just changed the code to skip the stale_cachefile check entirely if any_dep_recompiled == true. That would likely always be faster than using stale_cachefile, but there is an edge case that would still warrant fixing stale_cachefile: if a package that others depend on was recompiled externally to this session.

Member Author:

Turns out that edge case isn't an issue. More detail here #1578 (comment)

stale = true
for path_to_try in paths::Vector{String}
staledeps = Base.stale_cachefile(sourcepath, path_to_try, Base.TOMLCache()) #|| any(deps_recompiled)
staledeps === true && continue
# TODO: else, this returns a list of packages that may be loaded to make this valid (the topological list)
Member:

IIUC, this TODO comment is not accurate anymore since by construction the dependencies are all in the non-stale state at this point.

cc @KristofferC

Member:

I think Jameson added this comment.

Member Author:

Missed this thread. I git blamed them and they were added by you, @KristofferC, so I removed both.

Member:

Yeah, I noticed that this comment is committed by @KristofferC in #626

https://github.com/JuliaLang/Pkg.jl/pull/626/files#diff-47db2d66769667be0982a7b56891b4a8R483

But maybe @KristofferC and @vtjnash discussed it in Slack or something? It'd be nice if @vtjnash could chime in so that we are not removing something important.

Member:

I don't recall anything about this

stale = false
break
end
Member:

I feel it'd be better to have a function in Base that does this to reduce coupling to the implementation details in Base and Pkg.

if any_dep_recompiled || stale
Base.acquire(parallel_limiter)
Base.compilecache(pkg, sourcepath)
Member:

To limit parallelization, I suggest launching a million tasks like this, but then creating a Channel(num_tasks), then having each task put!() something into it just before this call to compilecache, then when it's finished, you take!() something back out. This will create, essentially, an N-parallel critical section, and allow N tasks to be running that section at once, while all others are blocked, waiting for the channel to free up space.
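A minimal sketch of that Channel-based limiter, with illustrative names (with_slot is not from the PR): the channel's capacity bounds how many tasks can be inside the compile section at once, and the try/finally ensures the slot is freed even if the body throws.

```julia
# Illustrative sketch, not the PR's code: a Channel used as an
# N-parallel critical section. put! blocks once `capacity` slots
# are taken; take! in the finally frees a slot even on error.
function with_slot(f, limiter::Channel{Nothing})
    put!(limiter, nothing)
    try
        return f()
    finally
        take!(limiter)
    end
end

limiter = Channel{Nothing}(2)   # allow 2 concurrent "compiles"
tasks = [@async with_slot(() -> sleep(0.01), limiter) for _ in 1:8]
foreach(wait, tasks)
```

Functionally this is equivalent to a counting semaphore, as noted further down in the thread.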

Member Author:

Nice. Ok

Member:

What happens if this throws?

Member:

@staticfloat aren't you describing a (counting) Semaphore?

Member:

Yep, pretty much.

Member Author:

This PR now has a Semaphore approach, and I tried out a channel-based approach here, which doesn't seem simpler: master...ianshmean:ib/parallel_precomp_chanelbased

What should we move forward with?

Member Author:

cc @tkf just to bring the conversation to a single thread

Member:

I'd actually implement precompile differently by using a recursive function that returns was_recompiled::Bool and wrapping the recursive calls with tasks. This way, we don't need to implement a future-like construct (i.e., was_processed + was_recompiled). The error handling would probably be more straightforward this way. Resource control is probably still easier with semaphore (unless we have an easy-to-use task pool and future in Base or stdlib) although I wish there were Base.acquire(f, semaphore).

But this is the kind of thing the trade-off is not completely clear until you have a concrete implementation. So, I think it's reasonable to defer this to future refactoring.
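A rough sketch of that recursive shape, under the assumption of an acyclic dependency graph. Here deps, stale, and compile are hypothetical stand-ins for the manifest graph, the stale_cachefile check, and Base.compilecache; memoizing each package's Task replaces the explicit Event/was_recompiled bookkeeping, and try/finally keeps the semaphore safe if compilation throws.

```julia
# Sketch of the recursive scheme: visit(pkg) returns whether pkg was
# (re)compiled. Each package gets exactly one Task, memoized in `tasks`,
# so waiting on a dependency is just fetch() on its Task.
function make_precompiler(deps::Dict{String,Vector{String}}, stale, compile,
                          limiter::Base.Semaphore)
    tasks = Dict{String,Task}()
    function visit(pkg::String)::Bool
        t = get!(tasks, pkg) do
            @async begin
                # Recurse into dependencies first; `any` short-circuits,
                # but memoization still processes each package only once.
                dep_changed = any(visit(d) for d in deps[pkg])
                if dep_changed || stale(pkg)
                    Base.acquire(limiter)   # bound concurrent compiles
                    try
                        compile(pkg)
                    finally
                        Base.release(limiter)
                    end
                    true
                else
                    false
                end
            end
        end
        fetch(t)::Bool
    end
    return visit
end
```

With @async tasks the Dict access is cooperative and needs no lock; a threaded variant would need one.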

was_recompiled[pkg.uuid] = true
notify(precomp_events[pkg.uuid])
Member:

This implementation of parallelism feels very low-level. Semaphore, Event, notify, acquire, release etc. Why not just start n tasks that take work from a channel?

Member Author:

I tried out a channel-based approach, but it still requires the notify system. It works but seems slightly more complicated: master...ianshmean:ib/parallel_precomp_chanelbased
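For comparison, the "n tasks that take work from a channel" shape suggested above might look like the sketch below (illustrative names, not the linked branch). As the comment notes, this alone does not respect dependency order, which is why the notify system is still needed on top.

```julia
# Illustrative sketch: a fixed pool of worker tasks draining a channel.
# Dependency ordering between work items is ignored here.
function run_pool(work, items, nworkers)
    jobs = Channel{eltype(items)}(length(items))
    foreach(i -> put!(jobs, i), items)
    close(jobs)                            # workers exit once drained
    workers = [@async foreach(work, jobs) for _ in 1:nworkers]
    foreach(wait, workers)
end
```

Iterating a closed Channel yields the buffered items and then terminates, so each worker loops until the queue is empty.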

Base.release(parallel_limiter)
else
notify(precomp_events[pkg.uuid])
end
Member:

IIUC, the only relevant logic here is Base.compilecache(pkg, sourcepath) and everything else is for implementing a future-like system. My personal preference for doing something like this is to create a task pool and future separately from the core logic. Dealing with semaphores and events at the application layer is way too low-level for me, as I'd need to worry about deadlocks all the time. Anyway, it's just a comment. If Pkg devs are cool with it, I guess there is no need to do something else.

end
push!(precomp_tasks, t)
end
wait.(precomp_tasks)
nothing
end
