os: run static and non-static Rust builds in parallel #1368
There's a long tail of crate builds for each that can be mitigated by running them in parallel, saving a fair amount of time.
In about half of the CI runs, with no obvious pattern, this seems to have had a bad interaction with the build.rs logic for models that creates a needed symlink. I didn't see this locally. Setting the PR back to draft while I figure it out.
[edit] In all 7 failure cases, the static build (which is now running earlier) failed to find the link created by models build.rs. It seems build.rs may be skipped every time in the static build (where it should only be skipped after the first time) and the static build hits that point before the non-static build can create the link, so it's failing. I'm going to check whether the CARGO_CFG_TARGET_VENDOR we're using to determine if we should run build.rs is different in static builds.
[edit] That's basically it. For non-static runs I see vendor "bottlerocket" and "unknown", and for static it's only "bottlerocket". The build.rs exits early when it sees "bottlerocket" so we don't run it twice. It worked before because static always ran afterward, when the link already existed from non-static builds. Half of the builds succeeding is almost surely due to chance, i.e. the non-static builds racing to models build.rs first.
This is a partial revert of 79465cd; the nicer structure for main() is kept, but the actual early-exit is removed.
We don't actually need the early-exit if it's safe to run this build.rs in parallel, which it is since 3943a32 introduced safe link swapping. The other action it takes, README generation, is skipped during production builds, and would be obvious if it went wrong during local builds due to version controlled changes, though it should also be safe there as long as our READMEs don't balloon and become unsafe to write in parallel.
The reason for removing this is that it effectively worked by chance, because we were running non-static builds before static builds. We'd like to run both sets of builds in parallel to save a significant amount of time. Static builds, as run in os.spec, only run build.rs once, with a vendor of "bottlerocket". This means build.rs was skipped, and unless non-static builds won the race and ran their build.rs before static builds got to that crate, the link wouldn't be created and the build would fail.
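The "safe link swapping" from 3943a32 isn't shown in this thread. As a rough illustration of the general idea only (not the actual build.rs code, which is Rust), an atomic symlink replacement can be sketched in shell. This assumes GNU coreutils `mv -T`; the `swap_link` helper and its paths are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: make LINK point at TARGET with no moment where LINK is missing.
# Assumes GNU coreutils (`mv -T` never descends into an existing destination).
swap_link() {
    target=$1
    link=$2
    # Private scratch directory next to the link, so the rename stays on one
    # filesystem and parallel callers never share a temporary name.
    scratch=$(mktemp -d "$link.XXXXXX") || return 1
    ln -s "$target" "$scratch/new" || return 1
    # rename(2) replaces the old link in one step; if two builds race,
    # the last rename wins and readers always see a valid link.
    mv -T "$scratch/new" "$link"
    status=$?
    rm -rf -- "$scratch"
    return "$status"
}
```

Because the link is replaced rather than removed and recreated, two builds calling something like `swap_link` at the same time should both leave a usable link behind, which is why skipping the second run is no longer needed for safety.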
^ I added a commit that addresses the build issue mentioned above.
I tested locally to confirm that build.rs was run for the static build.
[edit] CI passed all variants. Going to rerun it all to get a little more confidence.
Passed two full CI runs after the fix above, so I'm setting this as ready to review again.
Neat!
📸
Nice.
A basic question: I am trying to understand what makes cargo build run sequentially vs. in parallel. Does `%{nil}` at the end of the `%cargo_build_static` command make cargo build sequential and wait for the build to complete?
@srgothi92 The What made it sequential was that they were shell commands run one after the other, without either of them running in the background or anything. With this change we're explicitly asking the shell to run the first job in the background with |
Ahh, I see so |
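The reply above is cut off before the syntax it names, so the exact lines in os.spec aren't visible here. As a minimal sketch of the general shell pattern only, assuming `&` plus `wait` and with hypothetical `job`, `run_sequential`, and `run_parallel` helpers standing in for the cargo invocations:

```shell
#!/bin/sh
# Hypothetical stand-in for a cargo invocation: log when it starts and finishes.
LOG=${LOG:-$(mktemp)}
job() {
    echo "$1 start" >> "$LOG"
    sleep 1
    echo "$1 done" >> "$LOG"
}

# Plain commands, one after the other: the shell waits for each to exit.
run_sequential() {
    job static
    job non-static
}

# `&` puts the first job in the background, so the second starts right away.
run_parallel() {
    job static &
    static_pid=$!
    job non-static
    wait "$static_pid"   # do not continue until the background job has exited
}
```

After `run_sequential`, the log alternates start/done per job; after `run_parallel`, both "start" lines appear before either "done" line.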
Description of changes:
This saves ~1.5 minutes on every build! (On a fast machine.) 🎉
(Parallel cargo has been safe for a while now, for reference.)
Other options considered:
Testing done:
With the change, I see static and non-static rustc processes at the same time. If I `touch sources/**/*.rs`, building the `os` package is pretty consistent at 4 minutes 18 seconds.
Without the change, I see non-static and then static processes serially, and it's pretty consistent at around 5 minutes 48 seconds.
I tested the failure cases by adding a nonexistent crate name to the build list for static and non-static (in separate runs). With a fake package in non-static list: quick failure, clear error. With a fake package in static list: still get a clear error, though after a few minutes, when the non-static build has finished. (Since static builds used to run afterward, it's about the same delay.)
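The difference in timing between the two failure cases follows from which job runs in the foreground. A minimal sketch, assuming the static build is the backgrounded job (as the discussion above suggests) and using a hypothetical `run_both` helper rather than the real spec file contents:

```shell
#!/bin/sh
# Hypothetical helper: run "$1" in the background and "$2" in the foreground.
run_both() {
    "$1" &
    bg_pid=$!
    # A foreground failure is seen immediately and returned right away.
    "$2" || return $?
    # A background failure is only collected here, after the foreground
    # job has finished; `wait PID` returns that job's exit status.
    wait "$bg_pid"
}
```

With a failing first argument, `run_both` still reports the failure, but only once the second command has completed, which matches the delayed error seen with a fake package in the static list.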
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.