Implement the Cargo half of pipelined compilation #6864
r? @ehuss (rust_highfive has picked a reviewer for you, use r? to override) |
cc @nnethercote, you're likely very interested in this! r? @ehuss, you're probably the best person to take a look at it. @ehuss I'm not 100% satisfied with how management of pipelined compilation ended up, but overall I think it turned out better than expected. If you've got ideas for how to better model this in Cargo, please let me know! |
This looks great! I like all the cleanup.
I might be a little uncomfortable landing this immediately without the rustc side being ready. I think the cleanup changes could be landed immediately, though. Would you want to post them in separate PRs? I don't have a sense of how long the rustc side will take.
I would also be a little uncomfortable landing this enabled by default. It seems like it might have a high risk of breaking some users.
One thing that surprised me is that the total disk space needed in `target` went down with this change. For example, cargo's `target` goes from 956MB to 950MB. Even though there are now extra `.rmeta` files, each `.rlib` file is overall smaller. Can you explain why that is?
There does seem to be a small performance penalty. Probably not enough to worry about, but something to be aware of. A build of cargo where everything is fresh is about 10% slower for me (240ms to 260ms). A from-scratch build is about 5% slower (103s to 109s).
From a high level, I was wondering if you considered using multiple edges in the graph instead of multiple nodes? Something like this where the orange edges are "rmeta" edges:
That way, it might be a little simpler, and most of the logic could be focused in `JobQueue`. When the "rmeta ready" signal is received, `JobQueue` could just delete the corresponding orange edges. I haven't thought about this beyond just a basic concept, so maybe there are some complexities that would make this less desirable.
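To make the edge idea a bit more concrete, here is a minimal sketch of it. Everything in it (`EdgeKind`, `Graph`, `rmeta_ready`, `finished`) is a hypothetical stand-in made up for illustration, not Cargo's actual `JobQueue` code. Each node keeps the set of edges it is still waiting on, and a signal simply deletes the matching edges.

```rust
use std::collections::{HashMap, HashSet};

/// What a dependent needs from a dependency before it can start.
/// (Hypothetical names, not Cargo's real types.)
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum EdgeKind {
    /// Only the `.rmeta` file is needed (the "orange" edges).
    Rmeta,
    /// The fully built artifact is needed.
    Full,
}

/// Toy stand-in for the job queue: node -> edges it is still waiting on.
#[derive(Default)]
struct Graph {
    pending: HashMap<&'static str, HashSet<(&'static str, EdgeKind)>>,
}

impl Graph {
    fn add(&mut self, node: &'static str, deps: &[(&'static str, EdgeKind)]) {
        self.pending.entry(node).or_default().extend(deps.iter().copied());
    }

    /// Delete every edge to `dep` of the given kinds and return the nodes
    /// that lost their last edge, sorted by name.
    fn remove_edges(&mut self, dep: &'static str, kinds: &[EdgeKind]) -> Vec<&'static str> {
        let mut ready = Vec::new();
        for (node, edges) in self.pending.iter_mut() {
            let mut removed = false;
            for kind in kinds {
                removed |= edges.remove(&(dep, *kind));
            }
            if removed && edges.is_empty() {
                ready.push(*node);
            }
        }
        ready.sort();
        ready
    }

    /// "rmeta ready" signal: only the rmeta edges go away.
    fn rmeta_ready(&mut self, dep: &'static str) -> Vec<&'static str> {
        self.remove_edges(dep, &[EdgeKind::Rmeta])
    }

    /// The whole compilation finished: all edges to `dep` go away.
    fn finished(&mut self, dep: &'static str) -> Vec<&'static str> {
        self.remove_edges(dep, &[EdgeKind::Rmeta, EdgeKind::Full])
    }
}
```

With `b` waiting on an rmeta edge to `a` and `bin` waiting on full edges to both, `rmeta_ready("a")` frees `b` while `bin` stays blocked until `finished("b")`.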
52afc59 to 7728a13
I think I'd tend to agree as well. I don't mind splitting this up, but I think it may not be necessary in theory if we turn it off by default, because I'm pretty confident that this is unlikely to regress anything out there in the wild if we default it to off. Would you agree though? Don't mind splitting up, but wanted to double check :)
Uh... witchcraft? Are you sure that you were using before/after cargo against the exact same compiler? Were before/after cargo the tip of this PR and the merge-base with master? I would also expect build directories to become much larger...
I'm gonna try to dig into this, I'll get back to you.
I think that does conceptually make more sense yeah. I was talking with fitzgen over lunch and he recommended something similar where I think the translation is that instead of finishing a node in a dependency graph we finish an edge (or a set of edges). That would probably be easier to model than what is currently implemented and the job executor wouldn't need any change. I'll try to think more and see if it's reasonable to implement as part of this as well. |
@ehuss rust-lang/rust#60006 has my first attempt at the rustc side. It isn't right and more work is needed, but hopefully not too much more. @alexcrichton Is the code you've added assuming that the directive from rustc has a particular form? |
@nnethercote nah this isn't tied to anything, and it's effectively "mocked out" in the sense that all we need to do now is update a few lines in Cargo with an implementation of reading a signal from rustc and Cargo should be good to go! |
I'm fine with landing this if it is off by default.
I was using |
Nothing like trading one form of parallelism for another! That definitely sounds like a rustc bug and would also explain the smaller build directory (wow, I had no idea one CGU had that much file savings). I'll investigate that on the rustc side of things soon. Once #6867 lands I'll rebase this and clean up the history a bit, and then we can land with it turned off by default? |
Backend refactorings and perf improvements This PR is extracted from #6864 and then also additionally adds some commits related to performance optimizations that I noticed while profiling #6864. Each commit is in theory standalone and should pass all the tests, as well as being descriptive about what it's doing.
You're spot on! That would also explain the slower overall compile as well as smaller output artifacts for sure |
☔ The latest upstream changes (presumably #6867) made this pull request unmergeable. Please resolve the merge conflicts. |
This commit starts to lay the groundwork for rust-lang#6660 where Cargo will invoke rustc in a "pipelined" fashion. The goal here is to execute one command to produce both an `*.rmeta` file as well as an `*.rlib` file for candidate compilations. In that case if another rlib depends on that compilation, then it can start as soon as the `*.rmeta` is ready and not have to wait for the `*.rlib` compilation. The major refactoring in this commit is to add a new form of `CompileMode`: `BuildRmeta`. This mode is introduced to represent that a dependency edge only depends on the metadata of a compilation rather than the entire linked artifact. After this is introduced the next major change is to actually hook this up into the dependency graph. The approach taken by this commit is to have a postprocessing pass over the dependency graph. After we build a map of all dependencies between units a "pipelining" pass runs and actually introduces the `BuildRmeta` mode. This also makes it trivial to disable/enable pipelining which we'll probably want to do for a preview period at least! The `pipeline_compilations` function is intended to be extensively documented with the graph that it creates as well as how it works in terms of adding `BuildRmeta` nodes into the dependency graph. This commit is not all that will be required for pipelining compilations. It does, however, get the entire test suite passing with this refactoring. The way this works is by ensuring that a pipelined unit, one split from `Build` into both `Build` and `BuildRmeta`, is a unit that doesn't actually do any work. That way the `BuildRmeta` actually does all the work currently and we should have a working Cargo like we did before. Subsequent commits will work on updating the `JobQueue` to account for pipelining... Note that this commit itself doesn't really contain any tests because there's no functional change to Cargo, only internal refactorings.
This does have a large impact on the test suite because the `--emit` flag has now changed by default, so lots of test assertions needed updating.
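As a rough illustration of the node-splitting pass described in this commit message, the sketch below rewrites a unit graph so that each library gets an extra `BuildRmeta` node. It is not the real `pipeline_compilations` function. The `Unit` struct, the `is_lib` flag, and the rule that every library unit is split are assumptions made purely for the example.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the compile mode (hypothetical).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Mode {
    Build,
    BuildRmeta,
}

/// Simplified stand-in for a unit of work (hypothetical fields).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Unit {
    name: &'static str,
    is_lib: bool,
    mode: Mode,
}

/// unit -> the units it depends on
type DepGraph = BTreeMap<Unit, Vec<Unit>>;

/// Postprocessing pass over an already-built dependency map.
/// Assumption for this sketch: every library unit is split in two.
fn pipeline(graph: &DepGraph) -> DepGraph {
    let mut out = DepGraph::new();
    for (unit, deps) in graph {
        // A library only needs the metadata of its library dependencies;
        // anything that links (here: non-libraries) needs the full build.
        let new_deps: Vec<Unit> = deps
            .iter()
            .map(|dep| {
                if unit.is_lib && dep.is_lib {
                    Unit { mode: Mode::BuildRmeta, ..*dep }
                } else {
                    *dep
                }
            })
            .collect();

        if unit.is_lib {
            let rmeta = Unit { mode: Mode::BuildRmeta, ..*unit };
            // The `BuildRmeta` node takes over the dependencies (it does the work).
            out.insert(rmeta, new_deps);
            // The original `Build` node just waits for its `BuildRmeta` twin.
            out.insert(*unit, vec![rmeta]);
        } else {
            out.insert(*unit, new_deps);
        }
    }
    out
}
```

For a chain `bin -> b -> a` with two libraries, the pass yields five nodes: `b`'s rmeta node depends on `a`'s rmeta node, while `bin` still depends on the full `Build` of `b`. Disabling pipelining then amounts to skipping the pass.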
This commit adds support to Cargo's internal `JobQueue` to execute pipelined rlib compilations actually in a pipelined fashion. This internally involved some refactoring to ensure we juggled the right jobs into the right places, ensuring that as soon as an rmeta file is produced we can start subsequent compilations but still synchronize with the finished result of a `Build` unit. Internally this still does not actually pipeline anything in the sense that rustc doesn't support pipelining, but it's in theory all the groundwork necessary to read a signal from rustc that a metadata file is ready to go and then plumb that into Cargo's job scheduler.
If you don't want to try to use multiple edges instead of multiple nodes (which I think could be a little simpler code-wise, but not required), then that seems reasonable. |
7728a13 to 2114c5d
Thinking about this I would like to test out that model and see how well it fits. It may be a bigger refactor but I suspect it's going to be more future proof. There's not a lot of time pressure right now since the rustc side of things still isn't ready yet, so no harm in exploring our options! |
Ok I implemented the edge-based strategy and wow is it so much nicer. There's basically no comparison between them, so I'm going to close this. I'll open a new PR once I've got everything cleaned up and documented. Thanks again for the suggestion @ehuss :) |
An updated version (and much smaller) is posted at #6883 |
This commit starts to lay the groundwork for rust-lang#6660 where Cargo will invoke rustc in a "pipelined" fashion. The goal here is to execute one command to produce both an `*.rmeta` file as well as an `*.rlib` file for candidate compilations. In that case if another rlib depends on that compilation, then it can start as soon as the `*.rmeta` is ready and not have to wait for the `*.rlib` compilation. Initially attempted in rust-lang#6864 with a pretty invasive refactoring, this iteration is much more lightweight and fits much more cleanly into Cargo's backend. The approach taken here is to update the `DependencyQueue` structure to carry a piece of data on each dependency edge. This edge information represents the artifact that one node requires from another, and then when a node has no outgoing edges it's ready to build. A dependency on a metadata file is modeled as just that, a dependency on just the metadata and not the full build itself. Most of cargo's backend doesn't really need to know about this edge information so it's basically just calculated as we insert nodes into the `DependencyQueue`. Once that's all in place it's just a few pieces here and there to identify compilations that *can* be pipelined and then they're wired up to depend on the rmeta file instead of the rlib file.
Implement the Cargo half of pipelined compilation (take 2) This commit starts to lay the groundwork for #6660 where Cargo will invoke rustc in a "pipelined" fashion. The goal here is to execute one command to produce both an `*.rmeta` file as well as an `*.rlib` file for candidate compilations. In that case if another rlib depends on that compilation, then it can start as soon as the `*.rmeta` is ready and not have to wait for the `*.rlib` compilation. Initially attempted in #6864 with a pretty invasive refactoring, this iteration is much more lightweight and fits much more cleanly into Cargo's backend. The approach taken here is to update the `DependencyQueue` structure to carry a piece of data on each dependency edge. This edge information represents the artifact that one node requires from another, and then when a node has no outgoing edges it's ready to build. A dependency on a metadata file is modeled as just that, a dependency on just the metadata and not the full build itself. Most of cargo's backend doesn't really need to know about this edge information so it's basically just calculated as we insert nodes into the `DependencyQueue`. Once that's all in place it's just a few pieces here and there to identify compilations that *can* be pipelined and then they're wired up to depend on the rmeta file instead of the rlib file. Closes #6660
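The part about calculating edge information at insertion time can be sketched roughly as follows. This is not the real `DependencyQueue`. The `Artifact`, `Kind`, and `Queue` types and the `required_artifact` rule are hypothetical and simplified: the sketch assumes that only an rlib depending on another rlib can get by with metadata, and that anything which is linked or executed (a binary, a proc macro) needs the full build of its dependencies.

```rust
use std::collections::BTreeMap;

/// The artifact one node requires from another (data carried on an edge).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Artifact {
    /// Just the `*.rmeta` file.
    Metadata,
    /// The complete build output.
    All,
}

/// Simplified target kinds (hypothetical, for illustration only).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Kind {
    Rlib,
    Bin,
    ProcMacro,
}

/// Decide which artifact `parent` needs from `dep`.
/// Assumed rule: only rlib -> rlib edges can be pipelined.
fn required_artifact(parent: Kind, dep: Kind) -> Artifact {
    match (parent, dep) {
        (Kind::Rlib, Kind::Rlib) => Artifact::Metadata,
        _ => Artifact::All,
    }
}

/// Toy queue: node -> outgoing edges, each labeled with the needed artifact.
#[derive(Default)]
struct Queue {
    edges: BTreeMap<&'static str, Vec<(&'static str, Artifact)>>,
}

impl Queue {
    /// Edge data is computed here, as nodes are inserted, so the rest of the
    /// backend never has to know about it.
    fn insert(&mut self, name: &'static str, kind: Kind, deps: &[(&'static str, Kind)]) {
        let edges = deps
            .iter()
            .map(|&(dep, dep_kind)| (dep, required_artifact(kind, dep_kind)))
            .collect();
        self.edges.insert(name, edges);
    }

    /// A node with no outgoing edges is ready to build.
    fn ready(&self) -> Vec<&'static str> {
        self.edges
            .iter()
            .filter(|(_, edges)| edges.is_empty())
            .map(|(name, _)| *name)
            .collect()
    }
}
```

Keeping the decision in one small function means the scheduler only ever sees labeled edges, which is roughly why this version ends up so much less invasive than splitting nodes.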
This implements what I hope is 90% ish of the Cargo side of the work of #6660. The goal here is to eventually execute rlib compilations in a more "pipelined fashion" which involves taking more advantage of the parallelism that most machines have nowadays.
There are a lot of comments and wording on each commit and in the documentation, so I won't repeat it all here!
We probably don't want to land this too eagerly, but it should in theory pass all tests and be ready to land at any time (just not bring any benefit yet). I'd be fine holding off until the rustc side of pipelining is implemented and then we can polish it off here and get this ready to go!