Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the Cargo half of pipelined compilation #6864

Closed
wants to merge 2 commits into from

Conversation

alexcrichton
Copy link
Member

This implements what I hope is 90% ish of the Cargo side of the work of #6660. The goal here is to eventually execute rlib compilations in a more "pipelined fashion" which involves taking more advantage of the parallelism that most machines have nowadays.

There is a lot of comments and wording on each commit and in documentation, so I won't repeat it all here too much!

We probably don't want to land this too eagerly, but it should in theory pass all tests and be ready to land at any time (just not bring any benefit yet). I'd be fine holding off until the rustc side of pipelining is implemented and then we can polish it off here and get this ready to go!

@rust-highfive
Copy link

r? @ehuss

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 19, 2019
@alexcrichton
Copy link
Member Author

cc @nnethercote, you're likely very interested in this!

r? @ehuss you're probably the best to take a look at it

@ehuss I'm not 100% satisfied with how management of pipelined compilation ended up, but overall I think it turned out better than expected. If you've got ideas of how to better model this in Cargo please let me know!

Copy link
Contributor

@ehuss ehuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I like all the cleanup.

I might be a little uncomfortable landing this immediately without the rustc side being ready. I think the cleanup changes could be landed immediately, though. Would you want to post them in separate PRs? I don't have a sense of how long the rustc side will take.

I would also be a little uncomfortable landing this enabled by default. It seems like it might have a high risk of breaking some users.

One thing that surprised me is that the total disk space needed in target went down with this change. For example, cargo's target goes from 956MB to 950MB. Even though there are now extra .rmeta files, each .rlib file is overall smaller. Can you explain why that is?

There does seem to be a small performance penalty. Probably not enough to worry about, but something to be aware of. A build of cargo where everything is fresh is about 10% slower for me (240ms to 260ms). A from-scratch build is about 5% slower (103s to 109s).

From a high level, I was wondering if you considered using multiple edges in the graph instead of multiple nodes? Something like this where the orange edges are "rmeta" edges:

Untitled Diagram

That way, it might be a little simpler, and most of the logic could be focused in JobQueue. When the "rmeta ready" signal is received, JobQueue could just delete the corresponding orange edges. I haven't thought about this beyond just a basic concept, so maybe there are some complexities that would make this less desirable.

src/cargo/core/manifest.rs Outdated Show resolved Hide resolved
src/cargo/core/compiler/context/unit_dependencies.rs Outdated Show resolved Hide resolved
src/cargo/core/compiler/job_queue.rs Outdated Show resolved Hide resolved
tests/testsuite/dep_info.rs Outdated Show resolved Hide resolved
src/cargo/core/compiler/job_queue.rs Outdated Show resolved Hide resolved
tests/testsuite/build.rs Outdated Show resolved Hide resolved
tests/testsuite/build.rs Outdated Show resolved Hide resolved
tests/testsuite/build.rs Outdated Show resolved Hide resolved
src/cargo/core/compiler/build_config.rs Show resolved Hide resolved
src/cargo/core/compiler/mod.rs Outdated Show resolved Hide resolved
@alexcrichton
Copy link
Member Author

I might be a little uncomfortable landing this immediately without the rustc side being ready. I think the cleanup changes could be landed immediately, though. Would you want to post them in separate PRs? I don't have a sense of how long the rustc side will take.

I would also be a little uncomfortable landing this enabled by default. It seems like it might have a high risk of breaking some users.

I think I'd tend to agree as well. I don't mind splitting this up, but I think it may not be necessary in theory if we tun it off by default because I'm pretty confident at least that this is unlikely to regress anything out there in the wild if we default it to off.

Would you agree though? Don't mind splitting up, but wanted to double check :)

Even though there are now extra .rmeta files, each .rlib file is overall smaller. Can you explain why that is?

Uh...

witchcraft?

Are you sure that you were using before/after cargo against the exact same compiler? Were before/after cargo the tip of this PR and the merge-base with master?

I would also expect build directories to become much larger...

There does seem to be a small performance penalty

I'm gonna try to dig into this, I'll get back to you.

From a high level, I was wondering if you considered using multiple edges in the graph instead of multiple nodes? Something like this where the orange edges are "rmeta" edges:

I think that does conceptually make more sense yeah. I was talking with fitzgen over lunch and he recommended something similar where I think the translation is that instead of finishing a node in a dependency graph we finish an edge (or a set of edges). That would probably be easier to model than what is currently implemented and the job executor wouldn't need any change. I'll try to think more and see if it's reasonable to implement as part of this as well.

@nnethercote
Copy link
Contributor

@ehuss rust-lang/rust#60006 has my first attempt at the rustc side. It isn't right and more work is needed, but hopefully not too much more

@alexcrichton Is the code you've added assuming that the directive from rustc has a particular form?

@alexcrichton
Copy link
Member Author

@nnethercote nah this isn't tied to anything, and it's effectively "mocked out" in the sense that all we need to do now is update a few lines in Cargo with an implementation of reading a signal from rustc and Cargo should be good to go!

@ehuss
Copy link
Contributor

ehuss commented Apr 24, 2019

I'm fine with landing this if it is off by default.

Are you sure that you were using before/after cargo against the exact same compiler? Were before/after cargo the tip of this PR and the merge-base with master?

I was using CARGO_BUILD_PIPELINING=false to switch. Looking closer, the rlibs built with --emit=dep-info,metadata,link have only one CGU. Does that disable multiple cgu's (perhaps via is_compatible_with_codegen_units_and_single_output_file)?

@alexcrichton
Copy link
Member Author

Nothing like trading one form of parallelism for another! That definitely sounds like a rustc bug and would also explain the samller build directory (wow I had no idea one CGU had that much file savings). I'll investigate that on the rustc side of things soon.

Once #6867 lands I'll rebase this and clean up the history a bit and then we can land with it by default turned off?

bors added a commit that referenced this pull request Apr 25, 2019
Backend refactorings and perf improvements

This PR is extracted from #6864 and then also additionally adds some commits related to performance optimizations that I noticed while profiling #6864. Each commit is in theory standalone and should pass all the tests, as well as being descriptive about what it's doing.
@alexcrichton
Copy link
Member Author

Does that disable multiple cgu's (perhaps via is_compatible_with_codegen_units_and_single_output_file)?

You're spot on! That would also explain the slower overall compile as well as smaller output artifacts for sure

@bors
Copy link
Contributor

bors commented Apr 25, 2019

☔ The latest upstream changes (presumably #6867) made this pull request unmergeable. Please resolve the merge conflicts.

This commit starts to lay the groundwork for rust-lang#6660 where Cargo will
invoke rustc in a "pipelined" fashion. The goal here is to execute one
command to produce both an `*.rmeta` file as well as an `*.rlib` file
for candidate compilations. In that case if another rlib depends on that
compilation, then it can start as soon as the `*.rmeta` is ready and not
have to wait for the `*.rlib` compilation.

The major refactoring in this commit is to add a new form of
`CompileMode`: `BuildRmeta`. This mode is introduced to represent that a
dependency edge only depends on the metadata of a compilation rather
than the the entire linked artifact. After this is introduced the next
major change is to actually hook this up into the dependency graph.

The approach taken by this commit is to have a postprocessing pass over
the dependency graph. After we build a map of all dependencies between
units a "pipelining" pass runs and actually introduces the `BuildRmeta`
mode. This also makes it trivial to disable/enable pipelining which
we'll probably want to do for a preview period at least! The
`pipeline_compilations` function is intended to be extensively
documented with the graph that it creates as well as how it works in
terms of adding `BuildRmeta` nodes into the dependency graph.

This commit is not all that will be required for pieplining
compilations. It does, however, get the entire test suite passing with
this refactoring. The way this works is by ensuring that a pipelined
unit, one split from `Build` into both `Build` and `BuildRmeta`, to be a
unit that doesn't actually do any work. That way the `BuildRmeta`
actually does all the work currently and we should have a working Cargo
like we did before. Subsequent commits will work in updating the
`JobQueue` to account for pipelining...

Note that this commit itself doesn't really contain any tests because
there's no functional change to Cargo, only internal refactorings. This
does have a large impact on the test suite because the `--emit` flag has
now changed by default, so lots of test assertions needed updating.
This commit adds support to Cargo's internal `JobQueue` to execute
pipelined rlib compilations actually in a pipelined fashion. This
internally invovled some refactoring to ensure we juggled the right jobs
into the right places, ensuring that as soon as an rmeta file is produce
we can start subsequent compilations but still synchronize with the
finished result of a `Build` unit.

Internally this continues to still not actually pipeline anything in the
sense that rustc doesn't support pipelining, but it's in theory all the
groundwork necessary to read a signal from rustc that a metadata file is
ready to go and then plumb that into Cargo's job scheduler.
@ehuss
Copy link
Contributor

ehuss commented Apr 25, 2019

Once #6867 lands I'll rebase this and clean up the history a bit and then we can land with it by default turned off?

If you don't want to try to use multiple edges instead of multiple nodes (which I think could be a little simpler code-wise, but not required), then that seems reasonable.

@alexcrichton
Copy link
Member Author

Thinking about this I would like to test out that model and see how well it fits. It may be a bigger refactor but I suspect it's going to be more future proof. There's not a lot of time pressure right now since the rustc side of things still isn't ready yet, so no harm in exploring our options!

@alexcrichton
Copy link
Member Author

Ok I implemented the edge-based strategy and wow is it so much nicer. There's basically no comparison between them, so I'm going to close this.

I'll open a new PR once I've got everything cleaned up and documented. Thanks again for the suggestion @ehuss :)

@alexcrichton
Copy link
Member Author

An updated version (and much smaller) is posted at #6883

alexcrichton added a commit to alexcrichton/cargo that referenced this pull request Apr 30, 2019
This commit starts to lay the groundwork for rust-lang#6660 where Cargo will
invoke rustc in a "pipelined" fashion. The goal here is to execute one
command to produce both an `*.rmeta` file as well as an `*.rlib` file
for candidate compilations. In that case if another rlib depends on that
compilation, then it can start as soon as the `*.rmeta` is ready and not
have to wait for the `*.rlib` compilation.

Initially attempted in rust-lang#6864 with a pretty invasive refactoring this
iteration is much more lightweight and fits much more cleanly into
Cargo's backend. The approach taken here is to update the
`DependencyQueue` structure to carry a piece of data on each dependency
edge. This edge information represents the artifact that one node
requires from another, and then we a node has no outgoing edges it's
ready to build.

A dependency on a metadata file is modeled as just that, a dependency on
just the metadata and not the full build itself. Most of cargo's backend
doesn't really need to know about this edge information so it's
basically just calculated as we insert nodes into the `DependencyQueue`.
Once that's all in place it's just a few pieces here and there to
identify compilations that *can* be pipelined and then they're wired up
to depend on the rmeta file instead of the rlib file.
alexcrichton added a commit to alexcrichton/cargo that referenced this pull request May 8, 2019
This commit starts to lay the groundwork for rust-lang#6660 where Cargo will
invoke rustc in a "pipelined" fashion. The goal here is to execute one
command to produce both an `*.rmeta` file as well as an `*.rlib` file
for candidate compilations. In that case if another rlib depends on that
compilation, then it can start as soon as the `*.rmeta` is ready and not
have to wait for the `*.rlib` compilation.

Initially attempted in rust-lang#6864 with a pretty invasive refactoring this
iteration is much more lightweight and fits much more cleanly into
Cargo's backend. The approach taken here is to update the
`DependencyQueue` structure to carry a piece of data on each dependency
edge. This edge information represents the artifact that one node
requires from another, and then we a node has no outgoing edges it's
ready to build.

A dependency on a metadata file is modeled as just that, a dependency on
just the metadata and not the full build itself. Most of cargo's backend
doesn't really need to know about this edge information so it's
basically just calculated as we insert nodes into the `DependencyQueue`.
Once that's all in place it's just a few pieces here and there to
identify compilations that *can* be pipelined and then they're wired up
to depend on the rmeta file instead of the rlib file.
bors added a commit that referenced this pull request May 10, 2019
Implement the Cargo half of pipelined compilation (take 2)

This commit starts to lay the groundwork for #6660 where Cargo will
invoke rustc in a "pipelined" fashion. The goal here is to execute one
command to produce both an `*.rmeta` file as well as an `*.rlib` file
for candidate compilations. In that case if another rlib depends on that
compilation, then it can start as soon as the `*.rmeta` is ready and not
have to wait for the `*.rlib` compilation.

Initially attempted in #6864 with a pretty invasive refactoring this
iteration is much more lightweight and fits much more cleanly into
Cargo's backend. The approach taken here is to update the
`DependencyQueue` structure to carry a piece of data on each dependency
edge. This edge information represents the artifact that one node
requires from another, and then we a node has no outgoing edges it's
ready to build.

A dependency on a metadata file is modeled as just that, a dependency on
just the metadata and not the full build itself. Most of cargo's backend
doesn't really need to know about this edge information so it's
basically just calculated as we insert nodes into the `DependencyQueue`.
Once that's all in place it's just a few pieces here and there to
identify compilations that *can* be pipelined and then they're wired up
to depend on the rmeta file instead of the rlib file.

Closes #6660
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants