Skip to content

Fix emit path hashing #86045

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 22, 2021
Merged

Fix emit path hashing #86045

merged 6 commits into from
Jun 22, 2021

Conversation

jsgf
Copy link
Contributor

@jsgf jsgf commented Jun 5, 2021

With --emit KIND=PATH, the PATH should not affect hashes used for dependency tracking. It does not with other ways of specifying output paths (-o or --out-dir).

Also updates rustc -Zls to print more info about crates, which is used here to implement a run-make test.

It seems there was already a test explicitly checking that OutputTypes hash is affected by the path. I think this behaviour is wrong, so I updated the test.

@rust-highfive
Copy link
Contributor

r? @jackh726

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 5, 2021
@jsgf jsgf force-pushed the fix-emit-path-hashing branch from e3ac7ff to 4701575 Compare June 5, 2021 23:48
dep.hash,
dep.host_hash,
dep.kind
)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Member

@jackh726 jackh726 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems simple enough to me

@jsgf jsgf force-pushed the fix-emit-path-hashing branch from 4701575 to 06ca647 Compare June 6, 2021 19:32
@jsgf
Copy link
Contributor Author

jsgf commented Jun 6, 2021

Switched back to using impl_stable_hash_via_hash!(OutputTypes);

@Aaron1011
Copy link
Member

@jsgf: Can you elaborate on the motivation for this change? While it doesn't appear that any queries are depending on the path in OutputTypes, nothing prevents someone from adding such a dependency in the future.

Absent a strong reason (e.g. the option is changed frequently, and therefore impacts incremental compilation time), I think we should not be removing information from the dep-tracking hash.

@jsgf
Copy link
Contributor Author

jsgf commented Jun 7, 2021

@Aaron1011 There's no reason to include the path in the hash. If you were to consider it important, then the -o and --out-dir options should also be tracked. You would also have to have a rationale for why rustc --emit link=path1 ... and rustc --emit link=path2 ...; mv path2 path1 should generate different artifacts.

The specific motivation is that I'm trying to use --emit link=path1 and --emit link=path2 -Zno-codegen to approximate --emit link,metadata - that is, pipelined builds but using two invocations of rustc. If the two different --emit link=path operations generate different hashes, then this doesn't work.

Fundamentally, dep tracking is trying to track the content of the emitted artifacts, and specifically, distinguish incompatible artifacts to prevent them from being used together. The actual pathname used to write the artifact to doesn't have any role in this.

@Aaron1011
Copy link
Member

@jsgf: In addition to being used as input for the crate hash, the DepTrackingHash is also used when determining whether or not to re-use the previous incremental compilation state.

I agree that the crate hash shouldn't depend on the path passed to --emit - however, the incremental hash should still depend on it. I think we'll need to modify the options! macro and/or the DepTrackingHash trait - we'll want to hash just the keys when computing the crate hash, but both the keys and values when computing the incremental hash (this is controlled by the for_crate_hash parameter).

I'm sorry about the extra complexity here. However, we've recently uncovered a large number of ICEs due to improper tracking of global state during incremental compilation, and I'd like to avoid regressing our current handling.

Feel free to message me on Zulip if you'd like to discuss the changes further.

@bjorn3
Copy link
Member

bjorn3 commented Jun 7, 2021

I agree that the crate hash shouldn't depend on the path passed to --emit - however, the incremental hash should still depend on it.

No, it only depends on which artifacts are requested, not where they are written. Even with incr comp the artifact destination is not assumed to already have the artifact when nothing has changed, so the artifacts are copied every time to the current artifact destination.

@Aaron1011
Copy link
Member

@bjorn3: Nothing actually enforces that requirement - we've previously ran into issues where queries ended up depending on the value/order of something that "shouldn't" have mattered , but did (-l) flags.

I strongly believe that in order to avoid re-introducingrhe same type of bugs that caused us to disable incremental compilation recently, we need to remove/restrict untracked global state. If we aren't going to hash the filename, then we should somehow ensure that it can't be accessed via tcx.

@jsgf
Copy link
Contributor Author

jsgf commented Jun 7, 2021

@Aaron1011 It's not clear to me whether you're pointing out a specific problem with this PR, or raising an issue of a general class of problems that overlap with this PR's concerns.

From my POV, the intent is to make all the ways of naming output files functionally equivalent; it seems absurd to me that you'd get different bitwise output for --emit link -o libfoo.rlib and --emit link=libfoo.rlib, and that the resulting .rlib files are not interchangable. Do you have an objection to fixing this discrepency now, and then addressing your broader issues separately?

@jsgf
Copy link
Contributor Author

jsgf commented Jun 7, 2021

Alternatively, we could make OutputTypes TRACKED_BUT_NO_CRATE_HASH, which I think addresses both my and @Aaron1011's concerns. It would also have the effect of making OutputType not affect the crate hashes at all, which I think is even better. After all, why should --emit link,dep-info and two invocations of --emit link and --emit dep-info generate different crate hashes? The dep-info is immaterial to the rlib. The only awkward thing here is that --emit link,metadata generates different .rmeta files from --emit metadata. But from a dependency tracking POV I think that's fine - it's strictly a superset, and the content that's common is (should be?) semantically identical.

(By the same argument, -Zno-codegen should also be TRACKED_BUT_NO_CRATE_HASH.)

@bjorn3
Copy link
Member

bjorn3 commented Jun 7, 2021

it's strictly a superset, and the content that's common is (should be?) semantically identical.

dependency_format::calculate_type returns an empty array for --emit metadata. AFAICT the result of this method depends on the result of this function encoded in the metadata of dylibs. I wouldn't be surprised if this leads to problems when mixing --emit metadata metadata with --emit link dylibs.

@jsgf
Copy link
Contributor Author

jsgf commented Jun 7, 2021

@bjorn3 Hm, I'll look into that. But in general I've found --emit link -Zno-codegen a much more reliable way to emit metadata than --emit metadata.

@michaelwoerister
Copy link
Member

Making this TRACKED_BUT_NO_CRATE_HASH might be a good idea -- but please be careful with changes here. Let me know if I should take a closer look.

@jackh726
Copy link
Member

jackh726 commented Jun 8, 2021

Let's r? @Aaron1011 since they're in a better position to review this than I am

@rust-highfive rust-highfive assigned Aaron1011 and unassigned jackh726 Jun 8, 2021
@jsgf jsgf force-pushed the fix-emit-path-hashing branch from 06ca647 to 2728ba0 Compare June 10, 2021 00:12
@jsgf
Copy link
Contributor Author

jsgf commented Jun 10, 2021

@Aaron1011 @michaelwoerister I updated this to use TRACK_NO_CRATE_HASH for OutputTypes. I left the tests I added unchanged, and there's no regressions in those or any other tests I could see (some other failures, fixing).

@bjorn3 I also made -Zno-codegen TRACK_NO_CRATE_HASH since it shouldn't affect the crate content with respect to dependency tracking.

@rust-log-analyzer

This comment has been minimized.

@jsgf jsgf force-pushed the fix-emit-path-hashing branch from 2728ba0 to 2e9eadd Compare June 10, 2021 03:28
@@ -132,7 +132,7 @@ top_level_options!(
lint_cap: Option<lint::Level> [TRACKED],
force_warns: Vec<String> [TRACKED],
describe_lints: bool [UNTRACKED],
output_types: OutputTypes [TRACKED],
output_types: OutputTypes [TRACKED_NO_CRATE_HASH],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned about the fact that changing which output types are requested doesn't change the crate hash due to if output_types.should_codegen() occurring in several locations that can affect the crate metadata like the list of reachable non-generics, the list of exported symbols and dependency_format::calculate_type. In principle these should_codegen() fast-paths could be removed, but it will likely regress cargo check performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In https://internals.rust-lang.org/t/rfc-emit-check/14711/5?u=jsgf I measured it as ~8% difference on cargo check for Cargo itself. I don't know how that relates to anything else.

I experimented with making a distinction between --emit metadata and --emit check and corresponding .rmeta and .rcheck files, but the addition of a new file type in the mix made things complex in location resolution in rustc and in Cargo itself. It would be much easier to just make .rmeta invariant.

One advantage, in principle, is that if cargo check's emitted metadata is "full fat", then it can be used immediately from cache in cargo build which would effectively make the dependency graph flat and allow the rlibs to be generated completely in parallel - though I expect that would only really be interesting for builds which are closer to build from scratch vs incremental rebuild.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder what the practical effect of this will be. I presume the risk is that some build system - like cargo will intermix invocations of --emit metadata (cargo check) and --emit link,metadata (cargo build) and get failures if check metadata gets used for builds.

But what's the actual outcome? In the worst case, it could appear to succeed but generate subtly broken output. But I don't think that's the case here, is it? It will fail to find " list of reachable non-generics, the list of exported symbols and dependency_format::calculate_type", and fail as a result. Vs if the hashes are different, it will fail for that reason ("found possibly newer version of crate ...").

(If need be, could we extend the .rmeta format to include a type to explicitly flag "for check use only"?)

I haven't looked into how cargo specifically managed check vs build .rmetas, but I think it avoids the problem by always regenerating .rmetas as part of a build. In my own experiments with Buck, I make sure to put them in separate dirs (which is the motivation for this PR, since currently doing that changes the crate hash independent of anything else).

Either way, any failure is a bug in the build system, not something that an end-user would expect to see with a correctly implemented build system.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder what the practical effect of this will be. I presume the risk is that some build system - like cargo will intermix invocations of --emit metadata (cargo check) and --emit link,metadata (cargo build) and get failures if check metadata gets used for builds.

That is exactly what you want to do, right? Using the --emit metadata output of dependencies to compile --emit link crates.

But what's the actual outcome? In the worst case, it could appear to succeed but generate subtly broken output. But I don't think that's the case here, is it? It will fail to find " list of reachable non-generics, the list of exported symbols and dependency_format::calculate_type", and fail as a result. Vs if the hashes are different, it will fail for that reason ("found possibly newer version of crate ...").

A wrong result of dependency_format::calculate_type will cause linking to silently fail due to mixing up static and dynamic linking of crates. For example trying to statically link a crate that is already linked into a dynamic library dependency.

A wrong result for the list of reachable non-generics and exported symbols would likely result in the mono item collector creating mono items that are already included in dependencies (or maybe omitting necessary ones), causing duplicate linker errors.

I haven't looked into how cargo specifically managed check vs build .rmetas, but I think it avoids the problem by always regenerating .rmetas as part of a build.

It currently handles check and regular builds independently, but we want cargo check to reuse cargo build rmeta files in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, @bjorn3, would you prefer the previous variant where just the OutputTypes keys are hashed?

And what are your thoughts about making -Zno-codgen TRACKED_NO_CRATE_HASH?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My current opinion is that using --emit metadata which always generates the same metadata regardless of other emits which is crate-hash compatible with --emit link would be ideal

I agree, but probably not everyone may agree with the associated perf loss unfortunately.

Copy link
Contributor

@oxalica oxalica Jun 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the associated perf loss

Just a question, is it possible to reuse these fat rmeta generated in checking (seems it contains MIR and type information) when building binaries? If so, we can save 100% of check time (the time populating rmeta) during the build.
But sadly we currently just re-work on analysis and re-generate rmeta again when --emit link. The check and build intermediate files are completely not cross-reused.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, build -> check reusing is already possible, but simply not implemented in cargo. There should be an issue open for that. check -> build reusing won't work even with this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is check -> build reusing technically possible? I'm not sure if everything needed for codegen is already available in rmeta.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory it would be possible to never exclude anything from the rmeta even when no codegen is required. This would allow using the check rmeta for builds, but still require all crates to be compiled with codegen. The benefit of this would be more parallelization as codegen for all crates can immediately start without having to wait for the compilation of dependencies to get metadata. The downside is that the check metadata becomes larger which slows down check runs.

@jsgf jsgf force-pushed the fix-emit-path-hashing branch from 2e9eadd to 245ff44 Compare June 10, 2021 21:03
@jsgf
Copy link
Contributor Author

jsgf commented Jun 11, 2021

What's the next step for this? It's approved but still tagged "waiting on review". Does @Aaron1011 also need to approve?

@bjorn3
Copy link
Member

bjorn3 commented Jun 11, 2021

Yes, he is assigned to this PR and I would like someone else to look at it anyway just in case.

@jsgf jsgf force-pushed the fix-emit-path-hashing branch from 245ff44 to f5df9be Compare June 20, 2021 21:55
@jsgf
Copy link
Contributor Author

jsgf commented Jun 20, 2021

Rebased, and added a commit from @Aaron1011 which makes OutputTypes a TRACKED/TRACKED_NO_CRATE_HASH hybrid (keys are tracked, paths are no crate hash). @bjorn3 could you review this; in conversation @Aaron1011 said he's fine with this PR with this addition but wanted another reviewer for it.

Copy link
Member

@bjorn3 bjorn3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me with the extra test

@jsgf jsgf force-pushed the fix-emit-path-hashing branch from 4e27973 to 79e023e Compare June 22, 2021 00:22
jsgf and others added 6 commits June 21, 2021 17:22
Useful for debugging crate hash and resolution issues.
The PATH has no material effect on the emitted artifact, and setting
the patch via `-o` or `--out-dir` does not affect the hash.

Closes rust-lang#86044
This effectively turns OutputTypes into a hybrid where keys (OutputType)
are TRACKED and the values (optional paths) are TRACKED_NO_CRATE_HASH.
@jsgf jsgf force-pushed the fix-emit-path-hashing branch from 79e023e to 4514697 Compare June 22, 2021 00:24
@bjorn3
Copy link
Member

bjorn3 commented Jun 22, 2021

@bors r+

@bors
Copy link
Collaborator

bors commented Jun 22, 2021

📌 Commit 4514697 has been approved by bjorn3

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 22, 2021
@bors
Copy link
Collaborator

bors commented Jun 22, 2021

⌛ Testing commit 4514697 with merge b8be316...

@bors
Copy link
Collaborator

bors commented Jun 22, 2021

☀️ Test successful - checks-actions
Approved by: bjorn3
Pushing b8be316 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 22, 2021
@bors bors merged commit b8be316 into rust-lang:master Jun 22, 2021
@rustbot rustbot added this to the 1.55.0 milestone Jun 22, 2021
@jsgf jsgf deleted the fix-emit-path-hashing branch June 22, 2021 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants