
Consider integrating compilation pipelining #228

Open · GregBowyer opened this issue May 21, 2019 · 10 comments

@GregBowyer (Contributor)

We don't have to do this right away, but as https://internals.rust-lang.org/t/evaluating-pipelined-rustc-compilation/10199 comes to fruition, we could plug this into Bazel pretty easily.

@acmcarther (Collaborator)

The underlying notes are most relevant for reimplementing this:
https://github.com/rust-lang/compiler-team/blob/master/working-groups/pipelining/NOTES.md

I skimmed it briefly and it mostly made sense, but one snippet caught me off guard:

An alternative solution proposed by @ehuss is that the compiler could print a message on stdout/stderr to Cargo whenever a file has been produced. Cargo already does this, for example, when invoked with --message-format=json. The compiler already emits errors as JSON blobs with --error-format=json, although the compiler doesn't emit other information via JSON right now.

Contrary to the "pro":

  • Should be somewhat usable by other build systems as it's pretty standard to listen to stderr/stdout from spawned processes.

I suspect that it might be more difficult from Bazel's perspective to initiate subsequent compilation steps based on messages on stdout.
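For a sense of what listening on stdout/stderr would involve, here is a small sketch of the parsing side. It assumes a Cargo-style JSON artifact notification of the shape `{"artifact": ..., "emit": ...}` (the exact field names and the flag that enables them are assumptions here, not taken from this thread):

```python
import json

def rmeta_ready(line: str) -> "str | None":
    """Return the artifact path if this rustc output line announces a
    freshly written metadata file, else None.
    The {"artifact": ..., "emit": ...} shape mirrors Cargo-style
    artifact notifications; treat it as an assumption, not a spec."""
    try:
        msg = json.loads(line)
    except ValueError:
        return None  # ordinary diagnostic output, not JSON
    if msg.get("emit") == "metadata":
        return msg.get("artifact")
    return None

# A driver would feed each line from a spawned rustc through
# rmeta_ready() and start dependent compilations as soon as it fires.
line = '{"artifact": "target/libfoo.rmeta", "emit": "metadata"}'
print(rmeta_ready(line))
```

This is exactly the part that maps awkwardly onto Bazel: the driver must react mid-action, before the spawned process exits.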

@acmcarther (Collaborator)

@damienmg:

It looks like the implementation of compilation pipelining here works by having rustc inform the caller (via stdout) when artifacts are generated during execution. Cargo seems to have been updated to use that additional information to dispatch subsequent invocations to rustc while prior invocations are still in progress. My very superficial understanding of the internal model of Bazel tells me that this would be equivalent to an action becoming "partially complete" -- something not well supported by Bazel.

Do you know of any similar prior art where Bazel needed to work with tools which run for long durations and incrementally produce artifacts?


More detail:

Based on the document, the two kinds of artifacts being produced are

  1. rmeta: build metadata
  2. rlib: linkable libraries

(1) is required in order to initiate subsequent compilation steps and produce dependent rlibs, while (2) is required for final linking. The optimization at play takes advantage of the fact that (1) is generally much faster to produce than (2), and (2) is only required at the very end (IIUC).
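The speedup from (1) being cheap can be shown with a toy schedule. This sketch assumes a dependency chain a → b → c with made-up costs (1 time unit for rmeta, 4 for the full rlib); the numbers and crate names are purely illustrative:

```python
# Toy schedule for a chain a -> b -> c (b depends on a, c on b),
# assuming rmeta takes 1 time unit and the full rlib takes 4.
RMETA, RLIB = 1, 4

def finish_times(deps):
    """deps: crate -> list of crates it depends on (topological order).
    Under pipelining, a crate may start as soon as its deps' rmeta
    exists; without pipelining it would wait for their full rlibs."""
    rmeta_done, rlib_done = {}, {}
    for crate in deps:
        start = max((rmeta_done[d] for d in deps[crate]), default=0)
        rmeta_done[crate] = start + RMETA
        rlib_done[crate] = start + RLIB
    return rmeta_done, rlib_done

rmeta, rlib = finish_times({"a": [], "b": ["a"], "c": ["b"]})
# Final linking still waits for every rlib: the build finishes at
# max(rlib.values()) == 6, versus 12 if each crate waited for the
# full rlib of its dependencies before starting.
print(max(rlib.values()))
```

The critical path shrinks because only the last crate's codegen sits on it; everything upstream overlaps.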

@mfarrugi (Collaborator)

mfarrugi commented May 25, 2019

The two rustc invocation approach already mostly works, but currently --emit metadata by itself is not enough to produce the correct metadata.

There's some conversation around that @ https://rust-lang.zulipchat.com/#narrow/stream/195180-t-compiler.2Fwg-pipelining and rust-lang/rust#58465 (comment)

@mfarrugi (Collaborator)

Persistent Workers appears to be the Bazel feature closest to having multiple outputs from an action, but it's not a clean mapping and would probably be somewhat hacky to implement as it stands.

@UebelAndre (Collaborator)

I wonder if #1207 is related to the request here.

@jsgf

jsgf commented Apr 17, 2022

@UebelAndre That's a strange (but interesting) way to deal with this.

Ideally --emit metadata would produce .rmeta files hash-compatible with those from --emit link,metadata, but that is not the case today.

I found that rather than using --emit metadata, you can use --emit link -Zno-codegen to achieve the same effect - you get a .rlib, but it only contains metadata, so it's functionally equivalent to a .rmeta (I refer to this as a "hollow rlib"). One has to be very careful to keep everything else constant so that the crate hash stays consistent (and I had to put up PRs to fix a number of issues where changes to options were spuriously changing the hash), but the two do end up having the same crate hash, so they can be used interchangeably when you don't need the generated code (i.e., as an input for a non-linking dependency).

I have successfully implemented pipelining in Buck v2 using this technique, and it works pretty well - it can unlock quite a lot of build-time concurrency, and it can take good advantage of pervasive caching. I believe Bazel has similar constraints to Buck, so I think it can use the same mechanism.

The main downside is the need to rely on the unstable -Zno-codegen option, so it would be nice to have a good set of use-cases to stabilize it.
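The "keep everything else constant" constraint is the part a build rule has to enforce. A minimal sketch of that invariant, with hypothetical paths and crate names (only the command-line discipline is the point):

```python
# Sketch of the "hollow rlib" scheme described above: the metadata
# pass and the full pass must use identical rustc flags except for
# -Zno-codegen, or the crate hashes diverge. Paths are hypothetical.
BASE = [
    "rustc", "--crate-name", "foo", "--crate-type", "lib",
    "--emit", "link", "-o", "libfoo.rlib", "src/lib.rs",
]

metadata_pass = BASE[:1] + ["-Zno-codegen"] + BASE[1:]  # hollow rlib
full_pass = list(BASE)                                  # real rlib

# Sanity check a build rule might enforce: stripping the one extra
# flag from the metadata pass must reproduce the full pass exactly.
assert [f for f in metadata_pass if f != "-Zno-codegen"] == full_pass
print(" ".join(metadata_pass))
```

Any flag that differs between the two passes (including --emit itself, per the hash discussion later in this thread) risks producing incompatible crate hashes.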

@hlopko (Member)

hlopko commented Apr 27, 2022

Thanks @jsgf, that's super useful to know. We will definitely try both approaches and let you know what we find. If I had to guess, I'd say the two are equivalent performance-wise, so the only real difference is that the abort-after-metadata-is-produced approach is "stable" today. If that is the case, let's try to stabilize -Zno-codegen, since it's far simpler to support in Bazel.

@bjorn3

bjorn3 commented Apr 29, 2022

The main downside is the need to rely on the unstable -Zno-codegen option, so it would be nice to have a good set of use-cases to stabilize it.

It also depends on macro expansion being deterministic. Several proc macros are not: for example, they may depend on hash-map iteration order or intentionally generate random numbers.

If that is the case, let's try to stabilize -Zno-codegen, since it's far simpler to support in Bazel.

I think we should rather have a -Cmetadata-for-codegen flag which defaults to no in case of only --emit metadata and yes in case of --emit link. So you would have -Cmetadata-for-codegen=yes --emit metadata for the initial compilation rather than -Zno-codegen --emit metadata,link.

@jsgf

jsgf commented Apr 30, 2022

@bjorn3

-Cmetadata-for-codegen --emit metadata

Yeah, I'm fine with that so long as, you know, it actually works. Right now --emit affects the crate hash, so --emit metadata, --emit link and --emit link,metadata will all generate different hashes. The argument was that they are different outputs, so they should hash differently, and we should be conservative about changing the hash criteria because incremental is so error-prone. What you're proposing here is basically a special case where --emit metadata -Cmetadata-for-codegen will generate the same hash as --emit link, whereas plain --emit metadata will not.

I'll agree that -Zno-codegen --emit link is a definite hack, but it's a useful proof of concept, and conceptually simpler (do --emit link but leave out all that expensive codegen). It's functionally equivalent to the Bazel approach of just killing rustc once the metadata has been emitted.

It also depends on macro expansion being deterministic. There are several proc macros which are not deterministic. For example they depend on hashmap iteration order or intentionally generate random numbers.

That's already a given, independent of pipelining. The Buck and Bazel caching machinery assumes that Rust builds are deterministic (builds in general, but Rust actually checks that this is the case and fails if it isn't). Non-deterministic proc macros are not all that common, but very painful when they turn up. Hash order is a common example. Less common, but harder to fix, is "go off to some other data source and generate code from it" (e.g., a SQL schema). Compile-time randomness is super annoying, but we changed it to be deterministic (the "random" number is now a function of source file + line).
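The usual fix for the hash-order case is to sort before emitting. A minimal sketch of a deterministic code generator (the emitted constant names and values are made up for illustration):

```python
# The hash-order fix in miniature: a code generator that sorts its
# map keys before emitting, so output is byte-identical no matter
# what order the map happens to iterate in.
def emit_consts(values):
    lines = [f"pub const {name.upper()}: u32 = {v};"
             for name, v in sorted(values.items())]
    return "\n".join(lines)

# Two maps with the same entries in different insertion order
# produce identical output:
a = emit_consts({"retries": 3, "timeout": 30})
b = emit_consts({"timeout": 30, "retries": 3})
assert a == b
print(a)
```

Without the `sorted()`, the generator's output would track the map's iteration order, and a content-addressed cache would see spurious changes.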

@googleson78 (Contributor)

This is now implemented, right?
