Skip to content

metadata-link: add -Zmetadata-link flag #85793

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

jsgf
Copy link
Contributor

@jsgf jsgf commented May 28, 2021

This adds the -Zmetadata-link flag, whose intent is to make the output
of --emit metadata equivalent to --emit metadata,link. The goal is
to allow pipelined builds with separate invocations of rustc. This is
desireable for two reasons:

  1. The "artifact notification" system that rustc and cargo use to
    communicate is very cargo-specific - it doesn't fit well into other
    builds systems such as Buck or Bazel. In general its incompatible with
    any build system which only recognizes output artifacts once the
    compiler process has terminated. This is a particular problem when
    compilation is distributed.

  2. The rmeta file could be cached independently from the rlib. For
    example, if one generates compilation-ready rmeta files as part
    of "check" build, then those can be directly used for a full
    compilation from cache, without having to regenerate them. This
    means the build turns into a highly parallelizable invocation of
    rustc --emit link commands using cached rmeta files as input.

Initially this flag -Zmetadata-link, but ultimately I'd expect to
stabilize it as -Cmetadata-link (or a better name).

There's some outstanding problems/questions:

  • No tests. It's not clear to me that there are existing tests which cover the pipelined
    build mode specifically; src/tests/ui/rmeta somewhat touches on it.
    Added tests for
    plain pipelined builds and with -Zmetadata-link.
  • The generated rmetas from --emit metadata -Zmetadata-link are much larger
    than those from --emit metadata,link and I don't know why. While they seem to work
    in local ad-hoc testing, I'm concerned they may have different performance/compilation time/something else
    problems.
    I'm not sure what I was seeing before, but this looks fine - they're within a couple of bytes of each other.
  • This is very similar to -Zalways-encode-mir but I think they're logically distinct (eg I see
    this new flag as being on stabilization track).
    • @bjorn3 suggests this might be equivalent to --emit metadata,link -Zno-codegen which
      sounds plausible but quite roundabout.
  • Another approach might be to make --emit metadata always work this way, and maybe
    add --emit check which emits a subset of rmeta for check builds. This would either mean
    that there could be a mixture of different kinds of rmeta files which are usable for different uses
    (ie, the confusion of two different --emit options producing files with the same name but different
    content. Alternatively --emit check could generate (say) .rcheck files, but that adds complexity
    to locating files, since we'd end up with a 3 level heirarchy of rlib > rmeta > rcheck in terms of
    what the files are useful for. I experimented with this briefly, but its effects ended up being overwhelmingly
    sprawling and complex (eg affecting Cargo as well).
    -Zmetadata-link is much more narrowly scoped and opt-in.

@rust-highfive
Copy link
Contributor

r? @varkor

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 28, 2021
@jsgf
Copy link
Contributor Author

jsgf commented May 28, 2021

cc @oli-obk because @RalfJung mentioned you've got context for this.

@rust-log-analyzer

This comment has been minimized.

@@ -1037,7 +1037,8 @@ fn start_executing_work<B: ExtraBackendMethods>(
}));

let ol = if tcx.sess.opts.debugging_opts.no_codegen
|| !tcx.sess.opts.output_types.should_codegen()
|| (!tcx.sess.opts.output_types.should_codegen()
&& !tcx.sess.opts.debugging_opts.metadata_link)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary. This is only relevant for the actual codegen, not for encoding metadata.

@ehuss
Copy link
Contributor

ehuss commented May 28, 2021

Can you clarify how this would work? Like how would be the sequence of rustc calls that a build system would use? It's not clear to me how it would run rustc -Zmetadata-link to generate the rmeta, and then run rustc again to generate the rlib. Wouldn't the second call to rustc redo all the work of the first one? Or are you considering that an acceptable cost? Or are you relying on incremental?

@jsgf
Copy link
Contributor Author

jsgf commented May 28, 2021

@ehuss It's exactly analogous to pipelined builds in Cargo, except that rustc is invoked on each crate twice (at least those crates which have downstream non-linking dependents). The front-end parsing/typecheck/etc work is duplicated, but the (presumably more expensive) llvm codegen is not.

The goal is the shorten the critical path for the dependencies by using metadata-only builds, so only the final linking target (ie executable) is dependent on codegen in order to increase build parallelism. In this scenario we can assume there's unlimited computing resources, so we're trying to remove as coupling in the dependency graph as possible to unlock parallelism and reduce the overall build latency.

So if you have the simple case of main depends on bar depends on foo, without pipelining, we'd need to wait for libfoo.rlib to compile before starting on libbar.rlib, and then main. But with this PR we can generate libfoo.rmeta with low latency, and in parallel generate libfoo.rlib and libbar.rlib (which only depends on libfoo.rmeta), before going on to main.

A: rustc --emit metadata -Zmetadata-link --crate-type rlib foo.rs & # generate libfoo.rmeta
B: rustc --emit metadata -Zmetadata-link --crate-type rlib bar.rs --extern foo=libfoo.rmeta & # generate libbar.rmeta for dependencies (not needed here)
C: wait A; rustc --emit link --crate-type rlib bar.rs --extern foo=libfoo.rmeta & # generate libbar.rlib for linking
D: rustc --emit link --crate-type rlib foo.rs & # libfoo.rlib (could be started at the same time as A)
E: wait C D; rustc --emit link --crate-type bin main.rs --extern bar=libbar.rlib -Ldependency=. # final executable

If we already have cached artifacts for any of these steps - say from a previous check build (assuming check uses -Zmetadata-link for this purpose) - then we can just skip the step and use the artifact from cache. (And conversely a check build can use metadata generated by one of these build steps.)

(Ideally we could decouple the compilation of main from linking, but that's a whole other conversation.)

I have a prototype of this in a build system, but rustc doesn't currently support it without this PR. I originally tried using this with -Zalways-encode-mir but that's not equivalent to --emit metadata,link and it didn't work (#85401).

I'm curious to see how well this works, especially with the decoupled caching for rlib and rmeta.

@bjorn3
Copy link
Member

bjorn3 commented May 29, 2021

(Ideally we could decouple the compilation of main from linking, but that's a whole other conversation.)

That would be -Zno-link/-Zlink-only.

@jsgf
Copy link
Contributor Author

jsgf commented May 29, 2021

(Ideally we could decouple the compilation of main from linking, but that's a whole other conversation.)

That would be -Zno-link/-Zlink-only.

Yeah, as I said, a whole other conversation. Unless they've changed since I last looked at them, they're not (quite) suitable for what I have in mind.

@jsgf jsgf force-pushed the metadata-link branch from 313928c to b3ceab7 Compare May 29, 2021 20:06
@bjorn3
Copy link
Member

bjorn3 commented May 29, 2021

Wait a minute. Isn't this supposed to be equivalent to whaf --emit metadata,link in combination with -Zno-codegen is?

@jsgf jsgf force-pushed the metadata-link branch from b3ceab7 to 326751e Compare May 29, 2021 21:10
@jsgf
Copy link
Contributor Author

jsgf commented May 29, 2021

@bjorn3 I'll check it out but it sounds like a roundabout way of getting the desired results.

Edit: Hm, --emit metadata,link -Zno-codegen still emits a .rlib file which only contains the .rmeta. That seems like reasonable behaviour given what's requested of the compiler. It gets in the way and can be confused with the "real" r.lib, but I guess it can be discarded with --emit metadata,link=/dev/null or something.

Edit2: Looking at -Ztime-passes, using -Zno-codegen and -Zmetadata-link are more or less the same, except the former still has the link_rlib pass. It looks like that's due to:

@jsgf
Copy link
Contributor Author

jsgf commented May 29, 2021

Tests added.

@jsgf
Copy link
Contributor Author

jsgf commented May 30, 2021

@bjorn3 Actually after further testing, I don't think -Zno-codegen does the job. Because link is present, the compiler is requiring dependencies to be specified as rlib (ie "crate foo required to be available in rlib format, but was not found in this form"), which undermines the whole point - the goal is to make the all the dependencies rmeta rather than rlib.

Given -Zno-codegen's (not quite accurate) description "run all passes except codegen; no output", it doesn't seem like it should change how location resolution works.

@RalfJung
Copy link
Member

Cc @hyd-dev who also looked into issues around should_codegen() for Miri recently

jsgf added 3 commits May 30, 2021 15:44
This makes sure that building a crate with an rmeta dependency is viable as a
linkable crate.
This adds the -Zmetadata-link flag, whose intent is to make the output
of `--emit metadata` equivalent to `--emit metadata,link`. The goal is
to allow pipelined builds with separate invocations of rustc. This is
desireable for two reasons:

1. The "artifact notification" system that rustc and cargo use to
  communicate is very cargo-specific - it doesn't fit well into other
  builds systems such as Buck or Bazel. In general its incompatible with
  any build system which only recognizes output artifacts once the
  compiler process has terminated. This is a particular problem when
  compilation is distributed.

2. The rmeta file could be cached independently from the rlib. For
  example, if one generates compilation-ready rmeta files as part
  of "check" build, then those can be directly used for a full
  compilation from cache, without having to regenerate them. This
  means the build turns into a highly parallelizable invocation of
  `rustc --emit link` commands using cached `rmeta` files as input.

Initially this flag `-Zmetadata-link`, but ultimately I'd expect to
stabilize it as `-Cmetadata-link` (or a better name).
Make sure the resulting executable is 1) linkable, and 2) the same
as built with `--emit metadata,link`.
@jsgf jsgf force-pushed the metadata-link branch from 326751e to 83b4d4d Compare May 30, 2021 22:45
@jsgf
Copy link
Contributor Author

jsgf commented Jun 1, 2021

Because link is present, the compiler is requiring dependencies to be specified as rlib

Hm, that doesn't make sense. Something else is amiss.

@jsgf
Copy link
Contributor Author

jsgf commented Jun 6, 2021

@bjorn3 OK, I've done some more experiments with -Zno-codegen. I found a couple of problems:

  • using --emit link=hollow/libfoo.rlib (in combination with -Zno-codegen to prevent the "real" rlib from being confused with the no-codegen one), causes the dep hash to change (--emit KIND=PATH affects crate hash vs --emit KIND #86044, PR to fix Fix emit path hashing #86045)
  • -Zno-codegen is tracked, so the rlib/rmeta it generates has a different dep hash and is not interoperable with the codegen version. I made it TRACKED_NO_CRATE_HASH which does make it work in little tests (I haven't tried anything non-trivial yet), but I don't know what the full implications of this are.

The fact that OutputTypes is part of the dep hash means that this approach of --emit metadata -Zmetadata-link isn't viable, unless we do some hackery around how the hashing is done.

Edit: I just verified that -Zno-codegen as untracked can build a non-trivial target.

@varkor
Copy link
Member

varkor commented Jun 6, 2021

@bjorn3: I'm going to reassign to you, because it looks like you're already familiar with the PR, but feel free to assign back to me if you're not happy reviewing.

r? @bjorn3

@rust-highfive rust-highfive assigned bjorn3 and unassigned varkor Jun 6, 2021
@jsgf
Copy link
Contributor Author

jsgf commented Jun 7, 2021

I'm closing this in favour of using --emit link -Zno-codegen. I'll put up a PR to make -Zno-codegen TRACKED_BUT_NO_CRATE_HASH.

@jsgf jsgf closed this Jun 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants