Fix off-by-one-error in stage numbering between rustc-guide and rustbuild #57963

Closed
dwijnand opened this issue Jan 29, 2019 · 21 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.

Comments

@dwijnand
Member

dwijnand commented Jan 29, 2019

The rustc guide states that:

  • Stage 0: the stage0 compiler is usually the current beta compiler
    (x.py will download it for you); you can configure x.py to use something
    else, though.
  • Stage 1: the code in your clone (for new version) is then
    compiled with the stage0 compiler to produce the stage1 compiler.

However a run of rustbuild like so ./x.py build --stage 0 outputs:

Building stage0 std artifacts (x86_64-apple-darwin -> x86_64-apple-darwin)
Building stage0 test artifacts (x86_64-apple-darwin -> x86_64-apple-darwin)
Building stage0 compiler artifacts (x86_64-apple-darwin -> x86_64-apple-darwin)
[etc]

Personally, I think the guide makes more sense. However, fixing rustbuild would mean that --stage 1 would become --stage 2 (and so forth).

@dwijnand
Member Author

(I think this misunderstanding has been at the root of all my rustc feedback loop problems, so I'd be happy to resolve it or attempt to.)

@Mark-Simulacrum
Member

FWIW, the stages printed in rustc output aren't wrong -- the stage0 std artifacts there are going to be used for anything built by the stage0 compiler: for example, rustc, test artifacts. This pattern continues in later stages.

Specifically, "Stage 0: the stage0 compiler is usually the current beta compiler" is true, but that's (mostly, modulo build scripts in stage0 std compilation) only true for the rustc binary itself (and associated dynamic libraries). The compiler we download is never used for anything beyond compiling rustbuild, std, test, and rustc, but when we're compiling test and rustc, that compiler is using the freshly produced std.

I'm not sure if any of that made sense -- I myself still struggle with this pretty much every time I confront any sort of staging issue or work with rustbuild. If you have suggestions, I'd love to hear them; I think the problem is that there are two concepts at play here: a compiler (with its set of dependencies) and its "target" libraries (std, test, and rustc-ish). Both are staged, but in sort of a staggered manner. That makes talking about any of this quite hard.
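
A minimal sketch of that staggering, in Python (the language x.py is written in). It only models stages 0 and 1 of a default build, and every name in it (`Build`, `builds_in_stage`, the string labels) is made up for illustration and is not a rustbuild API. It shows that the label carries the stage of the compiler doing the building, while test and rustc link against the std produced moments earlier in that same stage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Build:
    label: str                # label in the style rustbuild prints
    compiled_by: str          # compiler binary doing the work
    links_std: Optional[str]  # std the result links against (None for std itself)


def builds_in_stage(n: int) -> List[Build]:
    """Hypothetical model: the three main artifacts built during stage n."""
    # Assumption for illustration: stage 0 uses the downloaded beta compiler,
    # later stages use the compiler sitting in build/HOST/stageN/bin.
    compiler = "downloaded beta rustc" if n == 0 else f"stage{n}/bin/rustc"
    fresh_std = f"stage{n} std"
    return [
        Build(f"stage{n} std artifacts", compiler, None),
        Build(f"stage{n} test artifacts", compiler, fresh_std),
        Build(f"stage{n} compiler artifacts", compiler, fresh_std),
    ]


for build in builds_in_stage(0):
    print(build)
```

Read this way, "stage0 compiler artifacts" is the new rustc, compiled by beta but linked against the freshly built stage0 std rather than the std that shipped with beta.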

@Centril changed the title from "Fix OBOE in stage numbering between rustc-guide and rustbuild" to "Fix off-by-one-error in stage numbering between rustc-guide and rustbuild" on Jan 29, 2019
@Centril
Contributor

Centril commented Jan 29, 2019

I also found the situation quite confusing.

@Mark-Simulacrum Perhaps it would be easier to understand and give words to things if we had a sort of sequence/activity diagram or flowchart for stages and dependencies that exist. A picture says...

@Centril added the T-bootstrap, C-enhancement, T-compiler, and T-infra labels on Jan 29, 2019
@ehuss
Contributor

ehuss commented Jan 29, 2019

I have found it to be confusing, too. I had some notes lying around about how it works. Perhaps one way to think of it is:

stage 0 uses stage 0 compiler to create stage 0 artifacts which will later be uplifted to stage1

That's a bit convoluted. I would expect it to be more along the lines of "stage 0 builds stage 1", but alas it is not.

It's also confusing because building HOST std and TARGET std are different based on the stage (notice below how stage2 only builds non-host std targets — I don't know why). And --keep-stage still seems a bit confusing to me.

| Stage 0 Action | Output |
| --- | --- |
| beta extracted | build/HOST/stage0 |
| stage0(beta) builds bootstrap | build/bootstrap |
| stage0(beta) builds libstd | build/HOST/stage0-std/TARGET |
| copy stage0-std (HOST only) | build/HOST/stage0-sysroot/lib/rustlib/HOST |
| stage0(beta) (sysroot stage0-sysroot) builds libtest | build/HOST/stage0-test/TARGET |
| copy stage0-test (HOST only) | build/HOST/stage0-sysroot/lib/rustlib/HOST |
| stage0(beta) (sysroot stage0-sysroot) builds rustc | build/HOST/stage0-rustc/HOST |
| copy stage0-rustc (except executable) | build/HOST/stage0-sysroot/lib/rustlib/HOST |
| build llvm | build/HOST/llvm |
| stage0(beta) (sysroot stage0-sysroot) builds codegen | build/HOST/stage0-codegen/HOST |
| stage0(beta) (sysroot stage0-sysroot) builds rustdoc | build/HOST/stage0-tools/HOST |

--stage=0 stops here

| Stage 1 Action | Output |
| --- | --- |
| copy (uplift) stage0-rustc executable | build/HOST/stage1/bin |
| copy (uplift) stage0-sysroot | build/HOST/stage1/lib |
| stage1 (sysroot stage1) builds libstd | build/HOST/stage1-std/TARGET |
| copy stage1-std (HOST only) | build/HOST/stage1/lib/rustlib/HOST |
| stage1 (sysroot stage1) builds libtest | build/HOST/stage1-test/TARGET |
| copy stage1-test (HOST only) | build/HOST/stage1/lib/rustlib/HOST |
| stage1 (sysroot stage1) builds rustc | build/HOST/stage1-rustc/HOST |
| copy stage1-rustc (except executable) | build/HOST/stage1/lib/rustlib/HOST |
| stage1 (sysroot stage1) builds codegen | build/HOST/stage1-codegen/HOST |

--stage=1 stops here

| Stage 2 Action | Output |
| --- | --- |
| copy (uplift) stage1-rustc executable | build/HOST/stage2/bin |
| copy (uplift) stage1-sysroot | build/HOST/stage2/lib and build/HOST/stage2/lib/rustlib/HOST |
| stage2 (sysroot stage2) builds libstd (except HOST?) | build/HOST/stage2-std/TARGET |
| copy stage2-std (not HOST targets) | build/HOST/stage2/lib/rustlib/TARGET |
| stage2 (sysroot stage2) builds libtest (except HOST?) | build/HOST/stage2-test/TARGET |
| copy stage2-test (not HOST targets) | build/HOST/stage2/lib/rustlib/TARGET |
| stage2 (sysroot stage2) builds rustdoc | build/HOST/stage2-tools/HOST |
| copy rustdoc | build/HOST/stage2/bin |

--stage=2 stops here

Notes:

  • Build scripts always use stage0(beta) and stage0(beta) sysroot (not stage0-sysroot).
  • This does not include optional things like lld, stuff required for tests, etc.
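
The "stops here" markers can be read as a cutoff over one ordered list of actions. Below is a rough Python sketch of that reading. The action strings are transcribed from the tables above (outputs omitted), while `STAGE_ACTIONS` and `actions_through` are hypothetical names invented for this illustration and do not exist in rustbuild.

```python
from typing import Dict, List, Tuple

# Actions transcribed from the tables above; outputs are omitted for brevity.
STAGE_ACTIONS: Dict[int, List[str]] = {
    0: [
        "beta extracted",
        "stage0(beta) builds bootstrap",
        "stage0(beta) builds libstd",
        "copy stage0-std (HOST only)",
        "stage0(beta) (sysroot stage0-sysroot) builds libtest",
        "copy stage0-test (HOST only)",
        "stage0(beta) (sysroot stage0-sysroot) builds rustc",
        "copy stage0-rustc (except executable)",
        "build llvm",
        "stage0(beta) (sysroot stage0-sysroot) builds codegen",
        "stage0(beta) (sysroot stage0-sysroot) builds rustdoc",
    ],
    1: [
        "copy (uplift) stage0-rustc executable",
        "copy (uplift) stage0-sysroot",
        "stage1 (sysroot stage1) builds libstd",
        "copy stage1-std (HOST only)",
        "stage1 (sysroot stage1) builds libtest",
        "copy stage1-test (HOST only)",
        "stage1 (sysroot stage1) builds rustc",
        "copy stage1-rustc (except executable)",
        "stage1 (sysroot stage1) builds codegen",
    ],
    2: [
        "copy (uplift) stage1-rustc executable",
        "copy (uplift) stage1-sysroot",
        "stage2 (sysroot stage2) builds libstd (except HOST?)",
        "copy stage2-std (not HOST targets)",
        "stage2 (sysroot stage2) builds libtest (except HOST?)",
        "copy stage2-test (not HOST targets)",
        "stage2 (sysroot stage2) builds rustdoc",
        "copy rustdoc",
    ],
}


def actions_through(stage_flag: int) -> List[Tuple[int, str]]:
    """Every (stage, action) pair performed for --stage=<stage_flag>, in order."""
    return [
        (stage, action)
        for stage in sorted(STAGE_ACTIONS)
        if stage <= stage_flag
        for action in STAGE_ACTIONS[stage]
    ]


for stage, action in actions_through(1):
    print(f"[stage {stage}] {action}")
```

With this model, `actions_through(0)` already contains "builds rustc": the code in your clone is compiled during stage 0, which is the part the guide's wording obscures.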

Maybe adding a little extra detail to rustc-guide would help? Does any of that help, or is it more confusing now?

@dwijnand
Member Author

Thank you all for your comments.

I understand now that stage0 is a full-fledged stage and no one feels like that's wrong, which is a good thing because it means we just need to fix the guide, not change the semantics of rustbuild.

I think there are several misleading things in that section of the rustc guide:

  • "the stage0 compiler is usually the current beta compiler" is ambiguous about whether "the stage0 compiler" means "the compiler built at stage0" or "the compiler used to build stage0 [things]"
  • "Stage 1: the code in your clone (for new version) is then compiled" implies (to me) that the code in my clone isn't compiled in stage 0

I'll come back to this and use your info and insights to propose some changes to the rustc-guide repo.

@dwijnand
Member Author

dwijnand commented Jan 29, 2019

Here's another part that has me doubting everything I think I know about how rustc is built, the guide says:

In particular, the newer version of the compiler, libstd, and other tooling may use some unstable features internally.

and in a few more places it makes reference to libstd being the compiler. Is that true?? I thought libstd was the standard library. Isn't src/rustc the new compiler? Or maybe for consumption by rustup or for rustc's test suite it's something a step down, like src/librustc_driver or so.

@dwijnand
Member Author

dwijnand commented Jan 29, 2019

Also, in https://rust-lang.github.io/rustc-guide/how-to-build-and-run.html#workflow about the command ./x.py build -i --stage 1 src/libstd --keep-stage 1 it says:

The effect of --keep-stage 1 is that we just assume that the old standard library can be re-used.

Which "old standard library" does it mean there, stage 0 or stage 1? Also, it's confusing to target stage 1 and keep stage 1. Assuming it's not equivalent to ./x.py build -i --stage 0 src/rustc, I think the guide should detail why it's not.

@jonas-schievink
Contributor

@dwijnand the first bit is just a bit confusingly worded. rustc is the compiler and libstd is the standard library (which includes libcore, liballoc, etc.). It just means "rustc, and libstd, and other tooling may use unstable features internally".

@dwijnand
Member Author

@jonas-schievink I'm not sure. You're right, that could be the interpretation, but in https://rust-lang.github.io/rustc-guide/how-to-build-and-run.html#build-flags it says from running ./x.py build -i --stage 1 src/libstd:

This final product (stage1 compiler + libs built using that compiler) is what you need to build other rust programs.

And again in https://rust-lang.github.io/rustc-guide/how-to-build-and-run.html#workflow:

  • Initial build: ./x.py build -i --stage 1 src/libstd
    • As documented above, this will build a functional stage1 compiler

@jonas-schievink
Contributor

Yes, you need both the Rust compiler rustc and a libstd compatible with that compiler in order to build Rust programs (unless you use #![no_std] or #![no_core]). The second thing you linked is pretty misleading though - passing std/libstd to x.py won't just build a compiler, AFAIK it will perform all "Stage 0" actions described in @ehuss' comment above, which builds the stage 1 rustc, and the first few steps of the "Stage 1" actions up to "stage1 (sysroot stage1) builds libstd" (and perhaps the copy operation following that?).

...this really is hairy

@mark-i-m
Member

A PR to the rustc guide would be much appreciated. A better naming scheme in the build system would be helpful too: if it is built with stage N, it should be called stage N+1, even if that stage contains only a libstd or only a compiler...
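
To make the proposed renaming concrete, here is a tiny Python sketch. Both function names are hypothetical; the first mirrors the labels seen in the rustbuild output quoted at the top of this issue, the second is the N+1 scheme suggested here.

```python
def rustbuild_label(built_by_stage: int) -> str:
    """Label as printed in this thread: artifacts built by the stage N compiler are 'stageN'."""
    return f"stage{built_by_stage}"


def proposed_label(built_by_stage: int) -> str:
    """Suggested scheme: anything built with stage N is called stage N+1."""
    return f"stage{built_by_stage + 1}"


for n in range(3):
    print(f"built by stage {n}: currently {rustbuild_label(n)}, proposed {proposed_label(n)}")
```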

@dwijnand
Member Author

@jonas-schievink I'm concerned that it's instructing to compile stage 1 libstd just so it does the two steps before: copy rustc and sysroot into build/HOST/stage1/bin and build/HOST/stage1/lib... That would be very wasteful.

@mark-i-m

A better naming scheme in the build system would be helpful too: if it is built with stage N, it should be called stage N+1, even if that stage contains only a libstd or only a compiler...

Yeah, perhaps that can be achieved without any or too much disruption. I'll try and send a PR soon fixing some of the explanation and probably including Eric's breakdown.

@dwijnand
Member Author

Actually... @jonas-schievink

you need both the Rust compiler rustc and a libstd compatible with that compiler in order to build Rust programs (unless you use #![no_std] or #![no_core]).

At stage 0 beta rust compiles first libstd then rustc. Are those two artefacts compatible with one another?

@jonas-schievink
Contributor

(take everything I say with a grain of salt, I'm not really sure how the entire bootstrap process works)

At stage 0 beta rust compiles first libstd then rustc. Are those two artefacts compatible with one another?

I think this step is needed to ensure the compiler is always built against its own libstd, not the libstd shipped with beta (downloaded as stage0). This is where the #[cfg(stage0)] annotations in libstd are used, since they make libstd build on the beta compiler. The resulting libstd is compatible with the beta compiler.

The compiler is then built and links against this libstd we just built. Now we want to build libstd again, using that compiler, which now has all features that libstd might want, so we can turn off #[cfg(stage0)]. The libstd created by that is up-to-date and compatible with the built compiler (as in, you can use the resulting rustc+libstd pair to build any Rust program using the latest Rust features).

(I'm not sure how the process around proc macros or compiler plugins works, I think those have a few more caveats)

@dwijnand
Member Author

Thanks, @jonas-schievink, your explanation is looking more and more likely.

The question I have about that being how it works is: if a proper, usable distribution requires a fresh compiler and a fresh library built with that compiler, why is stage0 first libstd and then rustc? It's like staging is delimited by compiler builds which is only half of a distribution build. Why not use beta rustc + beta std to build the first iteration of the distribution?

In a proper release of Rust, what's at the tail of the build? Is there a final-final-final compiler that builds libstd and together they ship?

@ehuss
Contributor

ehuss commented Jan 29, 2019

In a proper release of Rust, what's at the tail of the build? Is there a final-final-final compiler that builds libstd and together they ship?

I believe the compiler and libs that end up in the "stage2" directory are what is typically distributed.

I made a simplified diagram of what I wrote above: https://gist.github.com/ehuss/e40c18e1678fec0aa5861fd0d1653a87

@jonas-schievink
Contributor

if a proper, usable distribution requires a fresh compiler and a fresh library built with that compiler, why is stage0 first libstd and then rustc?

I'm not entirely sure, but I think this is because it's convenient if we only have to make rustc build against the libstd inside the same repository instead of having to make it compatible with the downloaded beta libstd.

The libstd in this repo often has quite a few #[cfg(stage0)] annotations that make it build against the beta libstd. With this setup, we don't need those for rustc as well.

Or I'm completely wrong, we do sometimes need #[cfg(stage0)] in rustc and the first libstd build is unnecessary or done for other complicated reasons.

@Mark-Simulacrum
Member

I'm just going to leave a semi-quick comment here (well, it turned out longer than expected) -- I want to more fully flesh out this documentation, unfortunately I just don't have the time right now; probably after the all hands I will have the time to respond more thoroughly. I'm hopeful that we can get some good documentation/understanding here for all parties though!

I've excerpted a few bits from above with some answers, hopefully this at least helps for now:

notice below how stage2 only builds non-host std targets — I don't know why (#57963 (comment))

This is because during stage 2, the host std is uplifted from the "stage 1" std -- specifically, when you see "Building stage 1 artifacts" that's later copied into stage 2 as well (both the compiler's libdir and the sysroot).

I'm concerned that it's instructing to compile stage 1 libstd just so it does the two steps before: copy rustc and sysroot into build/HOST/stage1/bin and build/HOST/stage1/lib... That would be very wasteful. #57963 (comment)

This is not wasteful -- that std is pretty much necessary for any useful work with the compiler. Specifically, it's used as the std for programs compiled by that compiler (so when you compile fn main() { } that links to the std compiled last with x.py build --stage 1 src/libstd).

A better naming scheme in the build system would be helpful too: if it is built with stage N, it should be called stage N+1, even if that stage contains only a libstd or only a compiler...

Yeah, perhaps that can be achieved without any or too much disruption. I'll try and send a PR soon fixing some of the explanation and probably including Eric's breakdown. #57963 (comment)

I've considered doing this in the past but it's actually somewhat misleading. Every time we compile any of the main artifacts ("std", "test", "rustc") we're actually performing two steps. I'll call the compiler which compiles these libraries A and the 'next' compiler B. When we compile std, for example, that std will be linked to programs built by A (including test and rustc built later on). It will also be used for compiler B to link against itself. This is somewhat intuitive if you think of compiler B as "just" a program that we're building with A. In some ways, rustc (the binary, not the rustbuild step) could be thought of as one of the only no_core binaries out there.

At stage 0 beta rust compiles first libstd then rustc. Are those two artefacts compatible with one another? #57963 (comment)

The rustc that's built is linked to the freshly-built libstd. So not only are they compatible, that's the only way rustc is guaranteed to build. As @jonas-schievink discusses in #57963 (comment), "the compiler is always built against its own libstd" is the correct reason for this. That means that for the most part only std needs to be cfg-gated. This means that rustc can use features added to std immediately after their addition, there's no need to wait until they get into beta.

However, in "The libstd created by that is up-to-date and compatible with the built compiler (as in, you can use the resulting rustc+libstd pair to build any Rust program using the latest Rust features)." (#57963 (comment)), that's perhaps a bit misleading. The libstd built by the stage1/bin/rustc compiler, also known as stage 1 std artifacts, is not necessarily ABI-compatible with that compiler. That is, the rustc binary most likely could not use this std itself. It is however ABI-compatible with any programs that the stage1/bin/rustc binary builds (including itself), so in that sense they're paired.

Perhaps worth noting -- when I say "ABI" I most likely mean more than that. I've not received any concrete answers from rustc devs as to what actually needs to stay the same -- but loosely I believe this is broadly metadata encoding and the ABI itself.

This is also where --keep-stage 1 src/libstd comes into play. Because most changes to the compiler don't actually change the ABI, once you've produced a libstd in stage 1, you can probably just reuse it with a different compiler. If the ABI hasn't changed, you're good to go, no need to spend the time recompiling that std. --keep-stage simply assumes the previous compile is fine and copies those artifacts into the appropriate place, skipping the cargo invocation.
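
A minimal sketch of that shortcut in Python. This is not rustbuild's actual code: `std_for_stage`, the `cached` dictionary, and the `run_cargo` callback are all hypothetical stand-ins, and the only behaviour modelled is "if the stage is kept and a previous result exists, reuse it instead of invoking cargo".

```python
from typing import Callable, Dict, Optional


def std_for_stage(
    stage: int,
    keep_stage: Optional[int],
    cached: Dict[int, str],
    run_cargo: Callable[[int], str],
) -> str:
    """Return the std artifacts for `stage`, reusing old ones when the stage is kept."""
    if keep_stage == stage and stage in cached:
        # Assume the previous compile is still fine; no cargo invocation.
        return cached[stage]
    cached[stage] = run_cargo(stage)
    return cached[stage]


calls = []


def fake_cargo(stage: int) -> str:
    # Stand-in for the real (slow) std build.
    calls.append(stage)
    return f"std-build-{len(calls)}"


cache: Dict[int, str] = {}
print(std_for_stage(1, None, cache, fake_cargo))  # first build runs cargo
print(std_for_stage(1, 1, cache, fake_cargo))     # kept: reuses the first result
print(f"cargo invocations: {len(calls)}")
```

The reuse is only an assumption about compatibility, which is why it can go wrong after a compiler change that does alter the ABI or metadata format.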

The question I have about that being how it works is: if a proper, usable distribution requires a fresh compiler and a fresh library built with that compiler, why is stage0 first libstd and then rustc? It's like staging is delimited by compiler builds which is only half of a distribution build. Why not use beta rustc + beta std to build the first iteration of the distribution? #57963 (comment)

In a proper release of Rust, what's at the tail of the build? Is there a final-final-final compiler that builds libstd and together they ship?

The reason we first build std, then test, then rustc, is largely just because we want to minimize cfg(stage0) in the code for rustc. Currently rustc is always linked against a "new" std/test so it doesn't ever need to be concerned with differences in std; it can assume that the std is as fresh as possible.

The reason we need to build twice is because of ABI compatibility. The beta compiler has its own ABI, then the stage1/bin/rustc compiler will produce programs/libraries with the new ABI. We actually used to build three times, but because we assume that the ABI is constant within a codebase, we presume that the libraries produced by the "stage2" compiler (produced by the stage1/bin/rustc compiler) are ABI-compatible with the stage1/bin/rustc compiler's produced libraries. What this means is that we can skip that final compilation -- and simply use the same libraries as the stage2/bin/rustc compiler uses itself for programs it links against.

This stage2/bin/rustc compiler is shipped to end-users, along with the "stage 1 {std,test,rustc}" artifacts.
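
The ABI argument can be sketched as a small lookup in Python. Everything here is illustrative: the two ABI labels, the tables, and the helper names are invented for the example, and "beta ABI" versus "new ABI" models the conservative case where the two really do differ (above I only say "not necessarily" compatible).

```python
# ABI of the code each compiler emits. The stage2 entry encodes the
# assumption that the ABI is constant within one codebase.
ABI_OF_OUTPUT = {
    "beta rustc": "beta ABI",
    "stage1/bin/rustc": "new ABI",
    "stage2/bin/rustc": "new ABI",
}

# Which compiler produced each artifact.
BUILT_BY = {
    "stage0 std": "beta rustc",
    "stage1/bin/rustc": "beta rustc",
    "stage1 std": "stage1/bin/rustc",
    "stage2/bin/rustc": "stage1/bin/rustc",
    "stage2 std (hypothetical third build)": "stage2/bin/rustc",
}


def abi(artifact: str) -> str:
    """ABI an artifact was compiled with, i.e. the ABI its builder emits."""
    return ABI_OF_OUTPUT[BUILT_BY[artifact]]


def compatible(a: str, b: str) -> bool:
    return abi(a) == abi(b)


# The stage1 compiler binary cannot be assumed to work with the std it produces...
print(compatible("stage1/bin/rustc", "stage1 std"))
# ...but the stage2 compiler binary can, so the two ship together.
print(compatible("stage2/bin/rustc", "stage1 std"))
# A third std build would match the stage 1 std, so it is skipped.
print(compatible("stage1 std", "stage2 std (hypothetical third build)"))
```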


I hope this was helpful -- I'll try to make some time to leave another response tomorrow, maybe even put together that graphic @Centril suggested, but not sure if that'll happen yet.

@dwijnand
Member Author

Thank you, @Mark-Simulacrum, for that. I must confess that while I did understand parts of it, large parts flew over my head. 😕 However...

I want to more fully flesh out this documentation, unfortunately I just don't have the time right now; probably after the all hands I will have the time to respond more thoroughly. I'm hopeful that we can get some good documentation/understanding here for all parties though!

I'll hold off on making any changes myself, then. If you send changes to the rustc-guide I could help review them and perhaps add some of my own.

@jyn514
Member

jyn514 commented Sep 1, 2020

I have a proposal for explaining this better in rust-lang/rustc-dev-guide#843. Please let me know if that helps at all, and if not, what I could do to make it better :)

@jyn514
Member

jyn514 commented Jan 1, 2021

I don't think the model of bootstrapping is going to change - I tried that in rust-lang/rustc-dev-guide#843 and there was a lot of resistance to having to change mental models or the arguments to pass to x.py. So I think that means this is a duplicate of #59864 and it doesn't make sense to track this in both places. Feel free to re-open if you disagree :)

@jyn514 closed this as completed on Jan 1, 2021
7 participants