Fix emit path hashing #86045

jsgf · 2021-06-05T23:31:13Z

With --emit KIND=PATH, the PATH should not affect hashes used for dependency tracking. It does not with other ways of specifying output paths (-o or --out-dir).

Also updates rustc -Zls to print more info about crates, which is used here to implement a run-make test.

It seems there was already a test explicitly checking that OutputTypes hash is affected by the path. I think this behaviour is wrong, so I updated the test.

rust-highfive · 2021-06-05T23:31:16Z

r? @jackh726

(rust-highfive has picked a reviewer for you, use r? to override)

bjorn3 · 2021-06-06T04:40:39Z

compiler/rustc_metadata/src/rmeta/decoder.rs

+                dep.hash,
+                dep.host_hash,
+                dep.kind
+            )?;


jackh726

Seems simple enough to me

compiler/rustc_session/src/config.rs

jsgf · 2021-06-06T19:32:25Z

Switched back to using impl_stable_hash_via_hash!(OutputTypes);

Aaron1011 · 2021-06-07T00:16:40Z

@jsgf: Can you elaborate on the motivation for this change? While it doesn't appear that any queries are depending on the path in OutputTypes, nothing prevents someone from adding such a dependency in the future.

Absent a strong reason (e.g. the option is changed frequently, and therefore impacts incremental compilation time), I think we should not be removing information from the dep-tracking hash.

jsgf · 2021-06-07T01:22:26Z

@Aaron1011 There's no reason to include the path in the hash. If you were to consider it important, then the -o and --out-dir options should also be tracked. You would also have to have a rationale for why rustc --emit link=path1 ... and rustc --emit link=path2 ...; mv path2 path1 should generate different artifacts.

The specific motivation is that I'm trying to use --emit link=path1 and --emit link=path2 -Zno-codegen to approximate --emit link,metadata - that is, pipelined builds but using two invocations of rustc. If the two different --emit link=path operations generate different hashes, then this doesn't work.

Fundamentally, dep tracking is trying to track the content of the emitted artifacts, and specifically, distinguish incompatible artifacts to prevent them from being used together. The actual pathname used to write the artifact to doesn't have any role in this.

Aaron1011 · 2021-06-07T02:04:07Z

@jsgf: In addition to being used as input for the crate hash, the DepTrackingHash is also used when determining whether or not to re-use the previous incremental compilation state.

I agree that the crate hash shouldn't depend on the path passed to --emit - however, the incremental hash should still depend on it. I think we'll need to modify the options! macro and/or the DepTrackingHash trait - we'll want to hash just the keys when computing the crate hash, but both the keys and values when computing the incremental hash (this is controlled by the for_crate_hash parameter).

I'm sorry about the extra complexity here. However, we've recently uncovered a large number of ICEs due to improper tracking of global state during incremental compilation, and I'd like to avoid regressing our current handling.

Feel free to message me on Zulip if you'd like to discuss the changes further.

bjorn3 · 2021-06-07T05:50:23Z

I agree that the crate hash shouldn't depend on the path passed to --emit - however, the incremental hash should still depend on it.

No, it only depends on which artifacts are requested, not where they are written. Even with incr comp the artifact destination is not assumed to already have the artifact when nothing has changed, so the artifacts are copied every time to the current artifact destination.

Aaron1011 · 2021-06-07T12:51:28Z

@bjorn3: Nothing actually enforces that requirement - we've previously ran into issues where queries ended up depending on the value/order of something that "shouldn't" have mattered , but did (-l) flags.

I strongly believe that in order to avoid re-introducingrhe same type of bugs that caused us to disable incremental compilation recently, we need to remove/restrict untracked global state. If we aren't going to hash the filename, then we should somehow ensure that it can't be accessed via tcx.

jsgf · 2021-06-07T17:18:12Z

@Aaron1011 It's not clear to me whether you're pointing out a specific problem with this PR, or raising an issue of a general class of problems that overlap with this PR's concerns.

From my POV, the intent is to make all the ways of naming output files functionally equivalent; it seems absurd to me that you'd get different bitwise output for --emit link -o libfoo.rlib and --emit link=libfoo.rlib, and that the resulting .rlib files are not interchangable. Do you have an objection to fixing this discrepency now, and then addressing your broader issues separately?

jsgf · 2021-06-07T17:40:18Z

Alternatively, we could make OutputTypes TRACKED_BUT_NO_CRATE_HASH, which I think addresses both my and @Aaron1011's concerns. It would also have the effect of making OutputType not affect the crate hashes at all, which I think is even better. After all, why should --emit link,dep-info and two invocations of --emit link and --emit dep-info generate different crate hashes? The dep-info is immaterial to the rlib. The only awkward thing here is that --emit link,metadata generates different .rmeta files from --emit metadata. But from a dependency tracking POV I think that's fine - it's strictly a superset, and the content that's common is (should be?) semantically identical.

(By the same argument, -Zno-codegen should also be TRACKED_BUT_NO_CRATE_HASH.)

bjorn3 · 2021-06-07T17:49:04Z

it's strictly a superset, and the content that's common is (should be?) semantically identical.

dependency_format::calculate_type returns an empty array for --emit metadata. AFAICT the result of this method depends on the result of this function encoded in the metadata of dylibs. I wouldn't be surprised if this leads to problems when mixing --emit metadata metadata with --emit link dylibs.

jsgf · 2021-06-07T18:22:02Z

@bjorn3 Hm, I'll look into that. But in general I've found --emit link -Zno-codegen a much more reliable way to emit metadata than --emit metadata.

michaelwoerister · 2021-06-08T13:26:50Z

Making this TRACKED_BUT_NO_CRATE_HASH might be a good idea -- but please be careful with changes here. Let me know if I should take a closer look.

jackh726 · 2021-06-08T14:11:22Z

Let's r? @Aaron1011 since they're in a better position to review this than I am

jsgf · 2021-06-10T00:15:34Z

@Aaron1011 @michaelwoerister I updated this to use TRACK_NO_CRATE_HASH for OutputTypes. I left the tests I added unchanged, and there's no regressions in those or ~~any other tests~~ I could see (some other failures, fixing).

@bjorn3 I also made -Zno-codegen TRACK_NO_CRATE_HASH since it shouldn't affect the crate content with respect to dependency tracking.

bjorn3 · 2021-06-10T08:15:02Z

compiler/rustc_session/src/options.rs

@@ -132,7 +132,7 @@ top_level_options!(
        lint_cap: Option<lint::Level> [TRACKED],
        force_warns: Vec<String> [TRACKED],
        describe_lints: bool [UNTRACKED],
-        output_types: OutputTypes [TRACKED],
+        output_types: OutputTypes [TRACKED_NO_CRATE_HASH],


I am concerned about the fact that changing which output types are requested doesn't change the crate hash due to if output_types.should_codegen() occurring in several locations that can affect the crate metadata like the list of reachable non-generics, the list of exported symbols and dependency_format::calculate_type. In principle these should_codegen() fast-paths could be removed, but it will likely regress cargo check performance.

In https://internals.rust-lang.org/t/rfc-emit-check/14711/5?u=jsgf I measured it as ~8% difference on cargo check for Cargo itself. I don't know how that relates to anything else.

I experimented with making a distinction between --emit metadata and --emit check and corresponding .rmeta and .rcheck files, but the addition of a new file type in the mix made things complex in location resolution in rustc and in Cargo itself. It would be much easier to just make .rmeta invariant.

One advantage, in principle, is that if cargo check's emitted metadata is "full fat", then it can be used immediately from cache in cargo build which would effectively make the dependency graph flat and allow the rlibs to be generated completely in parallel - though I expect that would only really be interesting for builds which are closer to build from scratch vs incremental rebuild.

I also wonder what the practical effect of this will be. I presume the risk is that some build system - like cargo will intermix invocations of --emit metadata (cargo check) and --emit link,metadata (cargo build) and get failures if check metadata gets used for builds.

But what's the actual outcome? In the worst case, it could appear to succeed but generate subtly broken output. But I don't think that's the case here, is it? It will fail to find " list of reachable non-generics, the list of exported symbols and dependency_format::calculate_type", and fail as a result. Vs if the hashes are different, it will fail for that reason ("found possibly newer version of crate ...").

(If need be, could we extend the .rmeta format to include a type to explicitly flag "for check use only"?)

I haven't looked into how cargo specifically managed check vs build .rmetas, but I think it avoids the problem by always regenerating .rmetas as part of a build. In my own experiments with Buck, I make sure to put them in separate dirs (which is the motivation for this PR, since currently doing that changes the crate hash independent of anything else).

Either way, any failure is a bug in the build system, not something that an end-user would expect to see with a correctly implemented build system.

I also wonder what the practical effect of this will be. I presume the risk is that some build system - like cargo will intermix invocations of --emit metadata (cargo check) and --emit link,metadata (cargo build) and get failures if check metadata gets used for builds.

That is exactly what you want to do, right? Using the --emit metadata output of dependencies to compile --emit link crates.

But what's the actual outcome? In the worst case, it could appear to succeed but generate subtly broken output. But I don't think that's the case here, is it? It will fail to find " list of reachable non-generics, the list of exported symbols and dependency_format::calculate_type", and fail as a result. Vs if the hashes are different, it will fail for that reason ("found possibly newer version of crate ...").

A wrong result of dependency_format::calculate_type will cause linking to silently fail due to mixing up static and dynamic linking of crates. For example trying to statically link a crate that is already linked into a dynamic library dependency.

A wrong result for the list of reachable non-generics and exported symbols would likely result in the mono item collector creating mono items that are already included in dependencies (or maybe omitting necessary ones), causing duplicate linker errors.

I haven't looked into how cargo specifically managed check vs build .rmetas, but I think it avoids the problem by always regenerating .rmetas as part of a build.

It currently handles check and regular builds independently, but we want cargo check to reuse cargo build rmeta files in the future.

Anyway, @bjorn3, would you prefer the previous variant where just the OutputTypes keys are hashed?

And what are your thoughts about making -Zno-codgen TRACKED_NO_CRATE_HASH?

My current opinion is that using --emit metadata which always generates the same metadata regardless of other emits which is crate-hash compatible with --emit link would be ideal

I agree, but probably not everyone may agree with the associated perf loss unfortunately.

the associated perf loss

Just a question, is it possible to reuse these fat rmeta generated in checking (seems it contains MIR and type information) when building binaries? If so, we can save 100% of check time (the time populating rmeta) during the build.
But sadly we currently just re-work on analysis and re-generate rmeta again when --emit link. The check and build intermediate files are completely not cross-reused.

Yes, build -> check reusing is already possible, but simply not implemented in cargo. There should be an issue open for that. check -> build reusing won't work even with this PR.

Is check -> build reusing technically possible? I'm not sure if everything needed for codegen is already available in rmeta.

In theory it would be possible to never exclude anything from the rmeta even when no codegen is required. This would allow using the check rmeta for builds, but still require all crates to be compiled with codegen. The benefit of this would be more parallelization as codegen for all crates can immediately start without having to wait for the compilation of dependencies to get metadata. The downside is that the check metadata becomes larger which slows down check runs.

jsgf · 2021-06-11T19:30:32Z

What's the next step for this? It's approved but still tagged "waiting on review". Does @Aaron1011 also need to approve?

bjorn3 · 2021-06-11T19:37:21Z

Yes, he is assigned to this PR and I would like someone else to look at it anyway just in case.

jsgf · 2021-06-20T21:57:31Z

Rebased, and added a commit from @Aaron1011 which makes OutputTypes a TRACKED/TRACKED_NO_CRATE_HASH hybrid (keys are tracked, paths are no crate hash). @bjorn3 could you review this; in conversation @Aaron1011 said he's fine with this PR with this addition but wanted another reviewer for it.

bjorn3

r=me with the extra test

src/test/run-make/emit-path-unhashed/Makefile

Useful for debugging crate hash and resolution issues.

The PATH has no material effect on the emitted artifact, and setting the patch via `-o` or `--out-dir` does not affect the hash. Closes rust-lang#86044

This effectively turns OutputTypes into a hybrid where keys (OutputType) are TRACKED and the values (optional paths) are TRACKED_NO_CRATE_HASH.

bjorn3 · 2021-06-22T05:20:16Z

@bors r+

bors · 2021-06-22T05:20:18Z

📌 Commit 4514697 has been approved by bjorn3

bors · 2021-06-22T17:35:00Z

⌛ Testing commit 4514697 with merge b8be316...

bors · 2021-06-22T20:15:50Z

☀️ Test successful - checks-actions
Approved by: bjorn3
Pushing b8be316 to master...

rust-highfive assigned jackh726 Jun 5, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 5, 2021

jsgf force-pushed the fix-emit-path-hashing branch from e3ac7ff to 4701575 Compare June 5, 2021 23:48

jsgf mentioned this pull request Jun 6, 2021

metadata-link: add -Zmetadata-link flag #85793

Closed

bjorn3 reviewed Jun 6, 2021

View reviewed changes

compiler/rustc_metadata/src/rmeta/decoder.rs

dep.hash,

dep.host_hash,

dep.kind

)?;

Copy link

Member

bjorn3 Jun 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

jackh726 approved these changes Jun 6, 2021

View reviewed changes

compiler/rustc_session/src/config.rs Show resolved Hide resolved

jsgf force-pushed the fix-emit-path-hashing branch from 4701575 to 06ca647 Compare June 6, 2021 19:32

rust-highfive assigned Aaron1011 and unassigned jackh726 Jun 8, 2021

jsgf force-pushed the fix-emit-path-hashing branch from 06ca647 to 2728ba0 Compare June 10, 2021 00:12

This comment has been minimized.

Sign in to view

jsgf force-pushed the fix-emit-path-hashing branch from 2728ba0 to 2e9eadd Compare June 10, 2021 03:28

bjorn3 requested changes Jun 10, 2021

View reviewed changes

jsgf force-pushed the fix-emit-path-hashing branch from 2e9eadd to 245ff44 Compare June 10, 2021 21:03

bjorn3 approved these changes Jun 10, 2021

View reviewed changes

jsgf force-pushed the fix-emit-path-hashing branch from 245ff44 to f5df9be Compare June 20, 2021 21:55

bjorn3 reviewed Jun 21, 2021

View reviewed changes

src/test/run-make/emit-path-unhashed/Makefile Show resolved Hide resolved

jsgf force-pushed the fix-emit-path-hashing branch from 4e27973 to 79e023e Compare June 22, 2021 00:22

jsgf and others added 6 commits June 21, 2021 17:22

Print more crate details in -Zls

cef3ab7

Useful for debugging crate hash and resolution issues.

In --emit KIND=PATH options, only hash KIND

a26d99f

The PATH has no material effect on the emitted artifact, and setting the patch via `-o` or `--out-dir` does not affect the hash. Closes rust-lang#86044

Implement assert_non_crate_hash_different for tests

48921ce

make -Zno-codegen TRACKED_NO_CRATE_HASH

f1f7f2f

Only hash OutputTypes keys in non-crate-hash mode

99f652f

This effectively turns OutputTypes into a hybrid where keys (OutputType) are TRACKED and the values (optional paths) are TRACKED_NO_CRATE_HASH.

Add test showing different KIND parameters change hash

4514697

jsgf force-pushed the fix-emit-path-hashing branch from 79e023e to 4514697 Compare June 22, 2021 00:24

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 22, 2021

bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 22, 2021

bors merged commit b8be316 into rust-lang:master Jun 22, 2021

rustbot added this to the 1.55.0 milestone Jun 22, 2021

jsgf deleted the fix-emit-path-hashing branch June 22, 2021 21:48

jsgf mentioned this pull request Jul 14, 2021

--cap-lints and related affect crate hash #87144

Closed

montekki mentioned this pull request Nov 10, 2021

Updates rustc -Z ls output format paritytech/cachepot#115

Merged

Fix emit path hashing #86045

Fix emit path hashing #86045

Uh oh!

Conversation

jsgf commented Jun 5, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rust-highfive commented Jun 5, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jackh726 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jsgf commented Jun 6, 2021

Uh oh!

Aaron1011 commented Jun 7, 2021

Uh oh!

jsgf commented Jun 7, 2021

Uh oh!

Aaron1011 commented Jun 7, 2021

Uh oh!

bjorn3 commented Jun 7, 2021

Uh oh!

Aaron1011 commented Jun 7, 2021

Uh oh!

jsgf commented Jun 7, 2021

Uh oh!

jsgf commented Jun 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bjorn3 commented Jun 7, 2021

Uh oh!

jsgf commented Jun 7, 2021

Uh oh!

michaelwoerister commented Jun 8, 2021

Uh oh!

jackh726 commented Jun 8, 2021

Uh oh!

jsgf commented Jun 10, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment has been minimized.

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

oxalica Jun 21, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jsgf commented Jun 11, 2021

Uh oh!

bjorn3 commented Jun 11, 2021

Uh oh!

jsgf commented Jun 20, 2021

Uh oh!

bjorn3 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

bjorn3 commented Jun 22, 2021

Uh oh!

bors commented Jun 22, 2021

Uh oh!

jsgf commented Jun 5, 2021 •

edited

Loading

jsgf commented Jun 7, 2021 •

edited

Loading

jsgf commented Jun 10, 2021 •

edited

Loading

oxalica Jun 21, 2021 •

edited

Loading