metadata-link: add -Zmetadata-link flag #85793

jsgf · 2021-05-28T20:18:36Z

This adds the -Zmetadata-link flag, whose intent is to make the output
of --emit metadata equivalent to --emit metadata,link. The goal is
to allow pipelined builds with separate invocations of rustc. This is
desireable for two reasons:

The "artifact notification" system that rustc and cargo use to
communicate is very cargo-specific - it doesn't fit well into other
builds systems such as Buck or Bazel. In general its incompatible with
any build system which only recognizes output artifacts once the
compiler process has terminated. This is a particular problem when
compilation is distributed.
The rmeta file could be cached independently from the rlib. For
example, if one generates compilation-ready rmeta files as part
of "check" build, then those can be directly used for a full
compilation from cache, without having to regenerate them. This
means the build turns into a highly parallelizable invocation of
rustc --emit link commands using cached rmeta files as input.

Initially this flag -Zmetadata-link, but ultimately I'd expect to
stabilize it as -Cmetadata-link (or a better name).

There's some outstanding problems/questions:

No tests. It's not clear to me that there are existing tests which cover the pipelined
build mode specifically; src/tests/ui/rmeta somewhat touches on it. Added tests for
plain pipelined builds and with -Zmetadata-link.
The generated rmetas from --emit metadata -Zmetadata-link are much larger
than those from --emit metadata,link and I don't know why. While they seem to work
in local ad-hoc testing, I'm concerned they may have different performance/compilation time/something else
problems. I'm not sure what I was seeing before, but this looks fine - they're within a couple of bytes of each other.
This is very similar to -Zalways-encode-mir but I think they're logically distinct (eg I see
this new flag as being on stabilization track).
- @bjorn3 suggests this might be equivalent to --emit metadata,link -Zno-codegen which
  sounds plausible but quite roundabout.
Another approach might be to make --emit metadata always work this way, and maybe
add --emit check which emits a subset of rmeta for check builds. This would either mean
that there could be a mixture of different kinds of rmeta files which are usable for different uses
(ie, the confusion of two different --emit options producing files with the same name but different
content. Alternatively --emit check could generate (say) .rcheck files, but that adds complexity
to locating files, since we'd end up with a 3 level heirarchy of rlib > rmeta > rcheck in terms of
what the files are useful for. I experimented with this briefly, but its effects ended up being overwhelmingly
sprawling and complex (eg affecting Cargo as well).
-Zmetadata-link is much more narrowly scoped and opt-in.

rust-highfive · 2021-05-28T20:18:39Z

r? @varkor

(rust-highfive has picked a reviewer for you, use r? to override)

jsgf · 2021-05-28T20:19:57Z

cc @oli-obk because @RalfJung mentioned you've got context for this.

bjorn3 · 2021-05-28T20:30:16Z

compiler/rustc_codegen_ssa/src/back/write.rs

@@ -1037,7 +1037,8 @@ fn start_executing_work<B: ExtraBackendMethods>(
    }));

    let ol = if tcx.sess.opts.debugging_opts.no_codegen
-        || !tcx.sess.opts.output_types.should_codegen()
+        || (!tcx.sess.opts.output_types.should_codegen()
+            && !tcx.sess.opts.debugging_opts.metadata_link)


This shouldn't be necessary. This is only relevant for the actual codegen, not for encoding metadata.

ehuss · 2021-05-28T22:33:05Z

Can you clarify how this would work? Like how would be the sequence of rustc calls that a build system would use? It's not clear to me how it would run rustc -Zmetadata-link to generate the rmeta, and then run rustc again to generate the rlib. Wouldn't the second call to rustc redo all the work of the first one? Or are you considering that an acceptable cost? Or are you relying on incremental?

jsgf · 2021-05-28T23:47:05Z

@ehuss It's exactly analogous to pipelined builds in Cargo, except that rustc is invoked on each crate twice (at least those crates which have downstream non-linking dependents). The front-end parsing/typecheck/etc work is duplicated, but the (presumably more expensive) llvm codegen is not.

The goal is the shorten the critical path for the dependencies by using metadata-only builds, so only the final linking target (ie executable) is dependent on codegen in order to increase build parallelism. In this scenario we can assume there's unlimited computing resources, so we're trying to remove as coupling in the dependency graph as possible to unlock parallelism and reduce the overall build latency.

So if you have the simple case of main depends on bar depends on foo, without pipelining, we'd need to wait for libfoo.rlib to compile before starting on libbar.rlib, and then main. But with this PR we can generate libfoo.rmeta with low latency, and in parallel generate libfoo.rlib and libbar.rlib (which only depends on libfoo.rmeta), before going on to main.

A: rustc --emit metadata -Zmetadata-link --crate-type rlib foo.rs & # generate libfoo.rmeta
B: rustc --emit metadata -Zmetadata-link --crate-type rlib bar.rs --extern foo=libfoo.rmeta & # generate libbar.rmeta for dependencies (not needed here)
C: wait A; rustc --emit link --crate-type rlib bar.rs --extern foo=libfoo.rmeta & # generate libbar.rlib for linking
D: rustc --emit link --crate-type rlib foo.rs & # libfoo.rlib (could be started at the same time as A)
E: wait C D; rustc --emit link --crate-type bin main.rs --extern bar=libbar.rlib -Ldependency=. # final executable

If we already have cached artifacts for any of these steps - say from a previous check build (assuming check uses -Zmetadata-link for this purpose) - then we can just skip the step and use the artifact from cache. (And conversely a check build can use metadata generated by one of these build steps.)

(Ideally we could decouple the compilation of main from linking, but that's a whole other conversation.)

I have a prototype of this in a build system, but rustc doesn't currently support it without this PR. I originally tried using this with -Zalways-encode-mir but that's not equivalent to --emit metadata,link and it didn't work (#85401).

I'm curious to see how well this works, especially with the decoupled caching for rlib and rmeta.

bjorn3 · 2021-05-29T10:27:16Z

(Ideally we could decouple the compilation of main from linking, but that's a whole other conversation.)

That would be -Zno-link/-Zlink-only.

jsgf · 2021-05-29T15:55:24Z

(Ideally we could decouple the compilation of main from linking, but that's a whole other conversation.)

That would be -Zno-link/-Zlink-only.

Yeah, as I said, a whole other conversation. Unless they've changed since I last looked at them, they're not (quite) suitable for what I have in mind.

bjorn3 · 2021-05-29T20:27:27Z

Wait a minute. Isn't this supposed to be equivalent to whaf --emit metadata,link in combination with -Zno-codegen is?

jsgf · 2021-05-29T21:12:04Z

@bjorn3 I'll check it out but it sounds like a roundabout way of getting the desired results.

Edit: Hm, --emit metadata,link -Zno-codegen still emits a .rlib file which only contains the .rmeta. That seems like reasonable behaviour given what's requested of the compiler. It gets in the way and can be confused with the "real" r.lib, but I guess it can be discarded with --emit metadata,link=/dev/null or something.

Edit2: Looking at -Ztime-passes, using -Zno-codegen and -Zmetadata-link are more or less the same, except the former still has the link_rlib pass. It looks like that's due to:

rust/compiler/rustc_codegen_ssa/src/back/link.rs

Line 59 in ff5522f

&& !output_metadata

jsgf · 2021-05-29T21:12:52Z

Tests added.

jsgf · 2021-05-30T05:34:52Z

@bjorn3 Actually after further testing, I don't think -Zno-codegen does the job. Because link is present, the compiler is requiring dependencies to be specified as rlib (ie "crate foo required to be available in rlib format, but was not found in this form"), which undermines the whole point - the goal is to make the all the dependencies rmeta rather than rlib.

Given -Zno-codegen's (not quite accurate) description "run all passes except codegen; no output", it doesn't seem like it should change how location resolution works.

RalfJung · 2021-05-30T12:21:33Z

Cc @hyd-dev who also looked into issues around should_codegen() for Miri recently

This makes sure that building a crate with an rmeta dependency is viable as a linkable crate.

This adds the -Zmetadata-link flag, whose intent is to make the output of `--emit metadata` equivalent to `--emit metadata,link`. The goal is to allow pipelined builds with separate invocations of rustc. This is desireable for two reasons: 1. The "artifact notification" system that rustc and cargo use to communicate is very cargo-specific - it doesn't fit well into other builds systems such as Buck or Bazel. In general its incompatible with any build system which only recognizes output artifacts once the compiler process has terminated. This is a particular problem when compilation is distributed. 2. The rmeta file could be cached independently from the rlib. For example, if one generates compilation-ready rmeta files as part of "check" build, then those can be directly used for a full compilation from cache, without having to regenerate them. This means the build turns into a highly parallelizable invocation of `rustc --emit link` commands using cached `rmeta` files as input. Initially this flag `-Zmetadata-link`, but ultimately I'd expect to stabilize it as `-Cmetadata-link` (or a better name).

Make sure the resulting executable is 1) linkable, and 2) the same as built with `--emit metadata,link`.

jsgf · 2021-06-01T01:20:11Z

Because link is present, the compiler is requiring dependencies to be specified as rlib

Hm, that doesn't make sense. Something else is amiss.

jsgf · 2021-06-06T00:17:29Z

@bjorn3 OK, I've done some more experiments with -Zno-codegen. I found a couple of problems:

using --emit link=hollow/libfoo.rlib (in combination with -Zno-codegen to prevent the "real" rlib from being confused with the no-codegen one), causes the dep hash to change (--emit KIND=PATH affects crate hash vs --emit KIND #86044, PR to fix Fix emit path hashing #86045)
-Zno-codegen is tracked, so the rlib/rmeta it generates has a different dep hash and is not interoperable with the codegen version. I made it TRACKED_NO_CRATE_HASH which does make it work in little tests (I haven't tried anything non-trivial yet), but I don't know what the full implications of this are.

The fact that OutputTypes is part of the dep hash means that this approach of --emit metadata -Zmetadata-link isn't viable, unless we do some hackery around how the hashing is done.

Edit: I just verified that -Zno-codegen as untracked can build a non-trivial target.

varkor · 2021-06-06T12:12:08Z

@bjorn3: I'm going to reassign to you, because it looks like you're already familiar with the PR, but feel free to assign back to me if you're not happy reviewing.

r? @bjorn3

jsgf · 2021-06-07T18:50:14Z

I'm closing this in favour of using --emit link -Zno-codegen. I'll put up a PR to make -Zno-codegen TRACKED_BUT_NO_CRATE_HASH.

rust-highfive assigned varkor May 28, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 28, 2021

jsgf mentioned this pull request May 28, 2021

CTFE mir is sufficient for statics and consts #85430

Closed

This comment has been minimized.

Sign in to view

bjorn3 reviewed May 28, 2021

View reviewed changes

jsgf force-pushed the metadata-link branch from b47309c to 313928c Compare May 28, 2021 20:48

This was referenced May 28, 2021

Use command line metadata path if provided #85362

Merged

-Zalways-encode-mir=yes misses MIR for statics #85401

Closed

jsgf force-pushed the metadata-link branch from 313928c to b3ceab7 Compare May 29, 2021 20:06

jsgf force-pushed the metadata-link branch from b3ceab7 to 326751e Compare May 29, 2021 21:10

jsgf added 3 commits May 30, 2021 15:44

Add test for pipelined builds

3222439

This makes sure that building a crate with an rmeta dependency is viable as a linkable crate.

Add test for separate pipelined builds

83b4d4d

Make sure the resulting executable is 1) linkable, and 2) the same as built with `--emit metadata,link`.

jsgf force-pushed the metadata-link branch from 326751e to 83b4d4d Compare May 30, 2021 22:45

rust-highfive assigned bjorn3 and unassigned varkor Jun 6, 2021

jsgf closed this Jun 7, 2021

metadata-link: add -Zmetadata-link flag #85793

metadata-link: add -Zmetadata-link flag #85793

Uh oh!

Conversation

jsgf commented May 28, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rust-highfive commented May 28, 2021

Uh oh!

jsgf commented May 28, 2021

Uh oh!

This comment has been minimized.

bjorn3 May 28, 2021

Choose a reason for hiding this comment

Uh oh!

ehuss commented May 28, 2021

Uh oh!

jsgf commented May 28, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bjorn3 commented May 29, 2021

Uh oh!

jsgf commented May 29, 2021

Uh oh!

bjorn3 commented May 29, 2021

Uh oh!

jsgf commented May 29, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jsgf commented May 29, 2021

Uh oh!

jsgf commented May 30, 2021

Uh oh!

RalfJung commented May 30, 2021

Uh oh!

jsgf commented Jun 1, 2021

Uh oh!

jsgf commented Jun 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

varkor commented Jun 6, 2021

Uh oh!

jsgf commented Jun 7, 2021

Uh oh!

Uh oh!

jsgf commented May 28, 2021 •

edited

Loading

jsgf commented May 28, 2021 •

edited

Loading

jsgf commented May 29, 2021 •

edited

Loading

jsgf commented Jun 6, 2021 •

edited

Loading