
Consider integrating compilation pipelining #228

Open · GregBowyer opened this issue May 21, 2019 · 10 comments

@GregBowyer (Contributor)

We don't have to do this right away, but as https://internals.rust-lang.org/t/evaluating-pipelined-rustc-compilation/10199 comes to fruition, we could plug this into Bazel pretty easily.

@acmcarther (Collaborator)

The underlying notes are most relevant for reimplementing this:
https://github.com/rust-lang/compiler-team/blob/master/working-groups/pipelining/NOTES.md

I skimmed it briefly and it mostly made sense, but one snippet caught me off guard:

An alternative solution proposed by @ehuss is that the compiler could print a message on stdout/stderr to Cargo whenever a file has been produced. Cargo already does this, for example, when invoked with --message-format=json. The compiler already emits errors as JSON blobs with --error-format=json, although the compiler doesn't emit other information via JSON right now.

Contrary to the "pro":

  • Should be somewhat usable by other build systems as it's pretty standard to listen to stderr/stdout from spawned processes.

I suspect that it might be more difficult from Bazel's perspective to initiate subsequent compilation steps based on messages on stdout.
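For a sense of what listening on stdout/stderr would involve, here is a small sketch of the parsing side. It assumes a Cargo-style JSON artifact notification of the shape `{"artifact": ..., "emit": ...}` (the exact field names and the flag that enables them are assumptions here, not taken from this thread):

```python
import json

def rmeta_ready(line: str) -> "str | None":
    """Return the artifact path if this rustc output line announces a
    freshly written metadata file, else None.
    The {"artifact": ..., "emit": ...} shape mirrors Cargo-style
    artifact notifications; treat it as an assumption, not a spec."""
    try:
        msg = json.loads(line)
    except ValueError:
        return None  # ordinary diagnostic output, not JSON
    if msg.get("emit") == "metadata":
        return msg.get("artifact")
    return None

# A driver would feed each line from a spawned rustc through
# rmeta_ready() and start dependent compilations as soon as it fires.
line = '{"artifact": "target/libfoo.rmeta", "emit": "metadata"}'
print(rmeta_ready(line))
```

This is exactly the part that maps awkwardly onto Bazel: the driver must react mid-action, before the spawned process exits.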

@acmcarther (Collaborator)

@damienmg:

It looks like the implementation of compilation pipelining here works by having rustc inform the caller (via stdout) when artifacts are generated during execution. Cargo seems to have been updated to use that additional information to dispatch subsequent invocations to rustc while prior invocations are still in progress. My very superficial understanding of the internal model of Bazel tells me that this would be equivalent to an action becoming "partially complete" -- something not well supported by Bazel.

Do you know of any similar prior art where Bazel needed to work with tools which run for long durations and incrementally produce artifacts?


More detail:

Based on the document, the two kinds of artifacts being produced are

  1. rmeta: build metadata
  2. rlib: linkable libraries

(1) is required in order to initiate subsequent compilation steps and produce dependent rlibs, while (2) is required for final linking. The optimization at play takes advantage of the fact that (1) is generally much faster to produce than (2), and (2) is only required at the very end (IIUC).
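The speedup from (1) being cheap can be shown with a toy schedule. This sketch assumes a dependency chain a → b → c with made-up costs (1 time unit for rmeta, 4 for the full rlib); the numbers and crate names are purely illustrative:

```python
# Toy schedule for a chain a -> b -> c (b depends on a, c on b),
# assuming rmeta takes 1 time unit and the full rlib takes 4.
RMETA, RLIB = 1, 4

def finish_times(deps):
    """deps: crate -> list of crates it depends on (topological order).
    Under pipelining, a crate may start as soon as its deps' rmeta
    exists; without pipelining it would wait for their full rlibs."""
    rmeta_done, rlib_done = {}, {}
    for crate in deps:
        start = max((rmeta_done[d] for d in deps[crate]), default=0)
        rmeta_done[crate] = start + RMETA
        rlib_done[crate] = start + RLIB
    return rmeta_done, rlib_done

rmeta, rlib = finish_times({"a": [], "b": ["a"], "c": ["b"]})
# Final linking still waits for every rlib: the build finishes at
# max(rlib.values()) == 6, versus 12 if each crate waited for the
# full rlib of its dependencies before starting.
print(max(rlib.values()))
```

The critical path shrinks because only the last crate's codegen sits on it; everything upstream overlaps.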

@mfarrugi (Collaborator)

mfarrugi commented May 25, 2019

The two rustc invocation approach already mostly works, but currently --emit metadata by itself is not enough to produce the correct metadata.

There's some conversation around that @ https://rust-lang.zulipchat.com/#narrow/stream/195180-t-compiler.2Fwg-pipelining and rust-lang/rust#58465 (comment)

@mfarrugi (Collaborator)

Persistent Workers appears to be the Bazel feature closest to having multiple outputs from an action, but it's not a clean mapping and would probably be somewhat hacky to implement as it stands.

@UebelAndre (Collaborator)

I wonder if #1207 is related to the request here.

@jsgf

jsgf commented Apr 17, 2022

@UebelAndre That's a strange (but interesting) way to deal with this.

Ideally --emit metadata would produce .rmeta files hash-compatible with those from --emit link,metadata, but that is not the case today.

I found that rather than using --emit metadata, you can use --emit link -Zno-codegen to achieve the same effect - you get a .rlib, but it only contains metadata, so it's functionally equivalent to a .rmeta (I refer to this as a "hollow rlib"). One has to be very careful to keep everything else constant so that the crate hash stays consistent (and I had to put up PRs to fix a number of issues where changes to options were spuriously changing the hash), but the two do end up having the same crate hash, so they can be used interchangeably when you don't need the generated code (i.e., as an input for a non-linking dependency).

I have successfully implemented pipelining in Buck v2 using this technique, and it works pretty well - it can unlock quite a lot of build-time concurrency, and it can take good advantage of pervasive caching. I believe Bazel has similar constraints to Buck, so I think it can use the same mechanism.

The main downside is the need to rely on the unstable -Zno-codegen option, so it would be nice to have a good set of use-cases to stabilize it.
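The "keep everything else constant" constraint is the part a build rule has to enforce. A minimal sketch of that invariant, with hypothetical paths and crate names (only the command-line discipline is the point):

```python
# Sketch of the "hollow rlib" scheme described above: the metadata
# pass and the full pass must use identical rustc flags except for
# -Zno-codegen, or the crate hashes diverge. Paths are hypothetical.
BASE = [
    "rustc", "--crate-name", "foo", "--crate-type", "lib",
    "--emit", "link", "-o", "libfoo.rlib", "src/lib.rs",
]

metadata_pass = BASE[:1] + ["-Zno-codegen"] + BASE[1:]  # hollow rlib
full_pass = list(BASE)                                  # real rlib

# Sanity check a build rule might enforce: stripping the one extra
# flag from the metadata pass must reproduce the full pass exactly.
assert [f for f in metadata_pass if f != "-Zno-codegen"] == full_pass
print(" ".join(metadata_pass))
```

Any flag that differs between the two passes (including --emit itself, per the hash discussion later in this thread) risks producing incompatible crate hashes.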

@hlopko (Member)

hlopko commented Apr 27, 2022

Thanks @jsgf, that's super useful to know. We will definitely try both approaches and let you know what we find. If I had to guess, I'd say the two are equivalent performance-wise, so the only real difference is that the abort-after-metadata-is-produced approach is "stable" today. If that is the case, let's try to stabilize -Zno-codegen, since it's far simpler to support in Bazel.

@bjorn3

bjorn3 commented Apr 29, 2022

The main downside is the need to rely on the unstable -Zno-codegen option, so it would be nice to have a good set of use-cases to stabilize it.

It also depends on macro expansion being deterministic. Several proc macros are not: for example, they may depend on hash-map iteration order or intentionally generate random numbers.

If that is the case, let's try to stabilize -Zno-codegen, since it's far simpler to support in Bazel.

I think we should rather have a -Cmetadata-for-codegen flag which defaults to no in case of only --emit metadata and yes in case of --emit link. So you would have -Cmetadata-for-codegen=yes --emit metadata for the initial compilation rather than -Zno-codegen --emit metadata,link.

@jsgf

jsgf commented Apr 30, 2022

@bjorn3

-Cmetadata-for-codegen --emit metadata

Yeah, I'm fine with that so long as, you know, it actually works. Right now --emit affects the crate hash, so --emit metadata, --emit link and --emit link,metadata will all generate different hashes. The argument was that they are different outputs, so they should hash differently, and we should be conservative about changing the hash criteria because incremental is so error-prone. What you're proposing here is basically a special case where --emit metadata -Cmetadata-for-codegen will generate the same hash as --emit link, whereas plain --emit metadata will not.

I'll agree that -Zno-codegen --emit link is a definite hack, but it's a useful proof of concept, and conceptually simpler (do --emit link but leave out all that expensive codegen). It's functionally equivalent to the Bazel approach of just killing rustc once the metadata has been emitted.

It also depends on macro expansion being deterministic. There are several proc macros which are not deterministic. For example they depend on hashmap iteration order or intentionally generate random numbers.

That's already a given, independent of pipelining. The Buck and Bazel caching machinery assumes that Rust builds are deterministic (builds in general, but Rust actually checks that this is the case and fails if it isn't). Non-deterministic proc macros are not all that common, but very painful when they turn up. Hash order is a common example. Less common, but harder to fix, is "go off to some other data source and generate code from it" (e.g., a SQL schema). Compile-time randomness is super annoying, but we changed it to be deterministic (the "random" number is now a function of source file + line).
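The usual fix for the hash-order case is to sort before emitting. A minimal sketch of a deterministic code generator (the emitted constant names and values are made up for illustration):

```python
# The hash-order fix in miniature: a code generator that sorts its
# map keys before emitting, so output is byte-identical no matter
# what order the map happens to iterate in.
def emit_consts(values):
    lines = [f"pub const {name.upper()}: u32 = {v};"
             for name, v in sorted(values.items())]
    return "\n".join(lines)

# Two maps with the same entries in different insertion order
# produce identical output:
a = emit_consts({"retries": 3, "timeout": 30})
b = emit_consts({"timeout": 30, "retries": 3})
assert a == b
print(a)
```

Without the `sorted()`, the generator's output would track the map's iteration order, and a content-addressed cache would see spurious changes.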

@googleson78 (Contributor)

This is now implemented, right?
