Fix emit path hashing #86045
r? @jackh726 (rust-highfive has picked a reviewer for you, use r? to override)
Updated from e3ac7ff to 4701575
dep.hash,
dep.host_hash,
dep.kind
)?;
Nice!
Seems simple enough to me
Updated from 4701575 to 06ca647
Switched back to using …
@jsgf: Can you elaborate on the motivation for this change? While it doesn't appear that any queries are depending on the path in … Absent a strong reason (e.g. the option is changed frequently, and therefore impacts incremental compilation time), I think we should not be removing information from the dep-tracking hash.
@Aaron1011 There's no reason to include the path in the hash. If you were to consider it important, then the … The specific motivation is that I'm trying to use … Fundamentally, dep tracking is trying to track the content of the emitted artifacts, and specifically, to distinguish incompatible artifacts so that they are not used together. The actual pathname the artifact is written to plays no role in this.
@jsgf: In addition to being used as input for the crate hash, the … I agree that the crate hash shouldn't depend on the path passed to … I'm sorry about the extra complexity here. However, we've recently uncovered a large number of ICEs due to improper tracking of global state during incremental compilation, and I'd like to avoid regressing our current handling. Feel free to message me on Zulip if you'd like to discuss the changes further.
No, it only depends on which artifacts are requested, not where they are written. Even with incremental compilation, the artifact destination is not assumed to already contain the artifact when nothing has changed, so the artifacts are copied to the current destination every time.
@bjorn3: Nothing actually enforces that requirement. We've previously run into issues where queries ended up depending on the value/order of something that "shouldn't" have mattered, but did (…). I strongly believe that in order to avoid re-introducing the same type of bugs that caused us to disable incremental compilation recently, we need to remove/restrict untracked global state. If we aren't going to hash the filename, then we should somehow ensure that it can't be accessed via …
@Aaron1011 It's not clear to me whether you're pointing out a specific problem with this PR, or raising an issue of a general class of problems that overlap with this PR's concerns. From my POV, the intent is to make all the ways of naming output files functionally equivalent; it seems absurd to me that you'd get different bitwise output for …
Alternatively, we could make OutputTypes TRACKED_BUT_NO_CRATE_HASH, which I think addresses both my and @Aaron1011's concerns. It would also have the effect of making OutputType not affect the crate hashes at all, which I think is even better. After all, why should … (By the same argument, …)
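For context on the tracking levels being discussed, here is a minimal sketch assuming a simplified model rather than rustc's actual `top_level_options!` machinery. In this model, `UNTRACKED` options feed neither hash, `TRACKED` options feed both the incremental dep-tracking hash and the crate hash, and `TRACKED_NO_CRATE_HASH` options feed only the dep-tracking hash. The `Tracking`, `Opt`, `options_hash` and `sample_opts` names, and the `output_path` option, are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the [UNTRACKED] / [TRACKED] / [TRACKED_NO_CRATE_HASH] markers.
enum Tracking {
    Untracked,
    Tracked,
    TrackedNoCrateHash,
}

/// One compiler option with its tracking level (simplified: values are strings).
struct Opt {
    name: &'static str,
    value: String,
    tracking: Tracking,
}

/// Computes the crate-hash contribution (`for_crate_hash = true`)
/// or the incremental dep-tracking hash (`for_crate_hash = false`).
fn options_hash(opts: &[Opt], for_crate_hash: bool) -> u64 {
    // All DefaultHasher::new() instances hash identically, so results are comparable.
    let mut hasher = DefaultHasher::new();
    for opt in opts {
        let include = match opt.tracking {
            Tracking::Untracked => false,
            Tracking::Tracked => true,
            // Invalidates incremental state, but never changes the crate hash.
            Tracking::TrackedNoCrateHash => !for_crate_hash,
        };
        if include {
            opt.name.hash(&mut hasher);
            opt.value.hash(&mut hasher);
        }
    }
    hasher.finish()
}

/// Builds a sample option set that differs only in a hypothetical output path.
fn sample_opts(output_path: &str) -> Vec<Opt> {
    vec![
        Opt { name: "lint_cap", value: "warn".into(), tracking: Tracking::Tracked },
        Opt { name: "describe_lints", value: "false".into(), tracking: Tracking::Untracked },
        Opt { name: "output_path", value: output_path.into(), tracking: Tracking::TrackedNoCrateHash },
    ]
}

fn main() {
    let a = sample_opts("check/libfoo.rmeta");
    let b = sample_opts("build/libfoo.rmeta");
    // Same crate hash: the path is excluded from it.
    assert_eq!(options_hash(&a, true), options_hash(&b, true));
    // Different dep-tracking hash: incremental state is still invalidated.
    assert_ne!(options_hash(&a, false), options_hash(&b, false));
}
```

Hashing the option name next to each value keeps two options that happen to share a value from being confused.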
@bjorn3 Hm, I'll look into that. But in general I've found …
Making this …
Let's r? @Aaron1011 since they're in a better position to review this than I am
Updated from 06ca647 to 2728ba0
@Aaron1011 @michaelwoerister I updated this to use … @bjorn3 I also made …
Updated from 2728ba0 to 2e9eadd
@@ -132,7 +132,7 @@ top_level_options!(
     lint_cap: Option<lint::Level> [TRACKED],
     force_warns: Vec<String> [TRACKED],
     describe_lints: bool [UNTRACKED],
-    output_types: OutputTypes [TRACKED],
+    output_types: OutputTypes [TRACKED_NO_CRATE_HASH],
I am concerned that changing which output types are requested doesn't change the crate hash, because `if output_types.should_codegen()` occurs in several locations that can affect the crate metadata, like the list of reachable non-generics, the list of exported symbols and `dependency_format::calculate_type`. In principle these `should_codegen()` fast-paths could be removed, but that would likely regress `cargo check` performance.
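To make the concern concrete, here is a minimal sketch using hypothetical, simplified types rather than rustc's real `OutputTypes` and metadata encoder. It shows how a `should_codegen()` fast path can change the encoded metadata while a crate hash that ignores output types stays the same. `encode_metadata`, `crate_hash` and the two-variant `OutputType` are illustrative assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};
use std::path::PathBuf;

/// Simplified set of output kinds (the real enum has more variants).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum OutputType {
    Metadata,
    Exe,
}

/// Hypothetical simplified OutputTypes: requested kinds mapped to optional paths.
struct OutputTypes(BTreeMap<OutputType, Option<PathBuf>>);

impl OutputTypes {
    fn new(kinds: &[OutputType]) -> Self {
        OutputTypes(kinds.iter().map(|k| (*k, None)).collect())
    }

    /// Simplified: codegen is needed if anything other than metadata is requested.
    fn should_codegen(&self) -> bool {
        self.0.keys().any(|k| *k != OutputType::Metadata)
    }
}

/// Hypothetical encoder with a `should_codegen()` fast path:
/// the exported-symbol table is only computed when codegen will run.
fn encode_metadata(source: &str, outputs: &OutputTypes) -> Vec<String> {
    let mut tables = vec![format!("source:{source}")];
    if outputs.should_codegen() {
        tables.push("exported_symbols:[foo]".to_string());
    }
    tables
}

/// A crate hash that, like a TRACKED_NO_CRATE_HASH option, ignores the output types.
fn crate_hash(source: &str, _outputs: &OutputTypes) -> u64 {
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let check = OutputTypes::new(&[OutputType::Metadata]);
    let build = OutputTypes::new(&[OutputType::Metadata, OutputType::Exe]);
    let src = "pub fn foo() {}";
    // Same crate hash...
    assert_eq!(crate_hash(src, &check), crate_hash(src, &build));
    // ...but different metadata contents behind it.
    assert_ne!(encode_metadata(src, &check), encode_metadata(src, &build));
}
```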
In https://internals.rust-lang.org/t/rfc-emit-check/14711/5?u=jsgf I measured it as a ~8% difference on `cargo check` for Cargo itself. I don't know how that relates to anything else.
I experimented with making a distinction between `--emit metadata` and `--emit check`, with corresponding `.rmeta` and `.rcheck` files, but adding a new file type to the mix complicated file location resolution, both in rustc and in Cargo itself. It would be much easier to just make `.rmeta` invariant.
One advantage, in principle, is that if `cargo check`'s emitted metadata is "full fat", then it can be used immediately from cache in `cargo build`, which would effectively make the dependency graph flat and allow the rlibs to be generated completely in parallel, though I expect that would only really be interesting for builds that are closer to a from-scratch build than an incremental rebuild.
I also wonder what the practical effect of this will be. I presume the risk is that some build system, like Cargo, will intermix invocations of `--emit metadata` (`cargo check`) and `--emit link,metadata` (`cargo build`) and get failures if check metadata gets used for builds.
But what's the actual outcome? In the worst case, it could appear to succeed but generate subtly broken output. But I don't think that's the case here, is it? It will fail to find the "list of reachable non-generics, the list of exported symbols and dependency_format::calculate_type", and fail as a result. Whereas if the hashes are different, it will fail for that reason ("found possibly newer version of crate ...").
(If need be, could we extend the .rmeta format to include a type to explicitly flag "for check use only"?)
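The "for check use only" idea could look roughly like this. It is a hypothetical sketch: `MetadataHeader`, its `check_only` field, `LoadError` and `validate` are invented names and are not part of the real `.rmeta` format or rustc's crate loader.

```rust
/// Hypothetical header that could be stored at the start of an .rmeta file.
#[derive(Debug, Clone, Copy)]
struct MetadataHeader {
    crate_hash: u64,
    /// True when the metadata was produced without codegen (e.g. by `cargo check`).
    check_only: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum LoadError {
    /// Analogous to the existing "found possibly newer version of crate" failure.
    HashMismatch { expected: u64, found: u64 },
    /// Explicit failure: the metadata is not suitable for a consumer that runs codegen.
    CheckOnlyMetadata,
}

/// Validates a dependency's metadata header before it is used.
fn validate(header: MetadataHeader, expected_hash: u64, consumer_codegens: bool) -> Result<(), LoadError> {
    if header.crate_hash != expected_hash {
        return Err(LoadError::HashMismatch { expected: expected_hash, found: header.crate_hash });
    }
    if header.check_only && consumer_codegens {
        return Err(LoadError::CheckOnlyMetadata);
    }
    Ok(())
}

fn main() {
    let from_check = MetadataHeader { crate_hash: 42, check_only: true };
    // Fine for another check-style consumer...
    assert_eq!(validate(from_check, 42, false), Ok(()));
    // ...but rejected with a specific reason when the consumer runs codegen.
    assert_eq!(validate(from_check, 42, true), Err(LoadError::CheckOnlyMetadata));
}
```

The point of a separate error variant is that a misconfigured build system would get a specific diagnostic instead of a generic hash mismatch.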
I haven't looked into how Cargo specifically manages check vs. build `.rmeta`s, but I think it avoids the problem by always regenerating `.rmeta`s as part of a build. In my own experiments with Buck, I make sure to put them in separate directories (which is the motivation for this PR, since currently doing that changes the crate hash independently of anything else).
Either way, any failure is a bug in the build system, not something that an end-user would expect to see with a correctly implemented build system.
> I also wonder what the practical effect of this will be. I presume the risk is that some build system - like cargo will intermix invocations of --emit metadata (cargo check) and --emit link,metadata (cargo build) and get failures if check metadata gets used for builds.
That is exactly what you want to do, right? Using the `--emit metadata` output of dependencies to compile `--emit link` crates.
> But what's the actual outcome? In the worst case, it could appear to succeed but generate subtly broken output. But I don't think that's the case here, is it? It will fail to find " list of reachable non-generics, the list of exported symbols and dependency_format::calculate_type", and fail as a result. Vs if the hashes are different, it will fail for that reason ("found possibly newer version of crate ...").
A wrong result of dependency_format::calculate_type will cause linking to silently fail due to mixing up static and dynamic linking of crates. For example trying to statically link a crate that is already linked into a dynamic library dependency.
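To illustrate why a wrong `dependency_format::calculate_type` result is dangerous, here is a heavily simplified, hypothetical sketch of a linkage decision; it is not rustc's algorithm. The `Linkage` variant names loosely follow rustc's, while `calculate_linkage`, its inputs and the crate names are invented. The rule shown is the one from the comment: a crate already included in a dylib dependency must not also be linked statically.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Loosely modelled on rustc's linkage kinds (simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Linkage {
    /// The dependency is itself a dylib and is linked dynamically.
    Dynamic,
    /// The dependency is linked statically into the final artifact.
    Static,
    /// The dependency is already contained in a dylib dependency; do not link it again.
    IncludedFromDylib,
}

/// Hypothetical simplified linkage decision.
/// `dylibs` maps each dylib dependency to the crates it statically contains.
fn calculate_linkage(
    deps: &[&str],
    dylibs: &BTreeMap<&str, BTreeSet<&str>>,
) -> BTreeMap<String, Linkage> {
    let mut result = BTreeMap::new();
    for &dep in deps {
        let linkage = if dylibs.contains_key(dep) {
            Linkage::Dynamic
        } else if dylibs.values().any(|contents| contents.contains(dep)) {
            Linkage::IncludedFromDylib
        } else {
            Linkage::Static
        };
        result.insert(dep.to_string(), linkage);
    }
    result
}

fn main() {
    // `mylib` is a dylib that already contains `serde`.
    let mut dylibs = BTreeMap::new();
    dylibs.insert("mylib", BTreeSet::from(["serde"]));
    let plan = calculate_linkage(&["log", "serde", "mylib"], &dylibs);
    assert_eq!(plan["serde"], Linkage::IncludedFromDylib);

    // Stale metadata that omits `serde` from `mylib` makes serde be linked
    // statically as well, i.e. a second copy next to the one inside the dylib.
    let mut stale = BTreeMap::new();
    stale.insert("mylib", BTreeSet::new());
    let bad_plan = calculate_linkage(&["log", "serde", "mylib"], &stale);
    assert_eq!(bad_plan["serde"], Linkage::Static);
}
```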
A wrong result for the list of reachable non-generics and exported symbols would likely result in the mono item collector creating mono items that are already included in dependencies (or maybe omitting necessary ones), causing duplicate-symbol linker errors.
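A similarly hypothetical sketch for the exported-symbols case: if the upstream crate's metadata says it exports an item, the downstream crate links against it; otherwise it emits its own copy. `Plan` and `plan_items` are invented names, and the real mono item collector is far more involved than this single lookup.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq, Eq)]
enum Plan {
    /// Link against the symbol the upstream crate claims to export.
    LinkUpstream(String),
    /// Emit a local copy of the item in this crate.
    InstantiateLocally(String),
}

/// Hypothetical, heavily simplified collector decision for each needed item.
fn plan_items(needed: &[&str], upstream_exports: &BTreeSet<&str>) -> Vec<Plan> {
    needed
        .iter()
        .map(|item| {
            if upstream_exports.contains(*item) {
                Plan::LinkUpstream(item.to_string())
            } else {
                Plan::InstantiateLocally(item.to_string())
            }
        })
        .collect()
}

fn main() {
    // What the upstream object code really exports.
    let actual = BTreeSet::from(["foo"]);
    // What stale metadata claims it exports.
    let stale: BTreeSet<&str> = BTreeSet::new();

    assert_eq!(plan_items(&["foo"], &actual), vec![Plan::LinkUpstream("foo".into())]);
    // With the stale list, `foo` is instantiated locally even though the upstream
    // object file also defines it: a duplicate symbol at link time.
    assert_eq!(plan_items(&["foo"], &stale), vec![Plan::InstantiateLocally("foo".into())]);
}
```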
> I haven't looked into how cargo specifically managed check vs build .rmetas, but I think it avoids the problem by always regenerating .rmetas as part of a build.
It currently handles check and regular builds independently, but we want `cargo check` to reuse `cargo build` rmeta files in the future.
Anyway, @bjorn3, would you prefer the previous variant where just the OutputTypes keys are hashed?
And what are your thoughts about making `-Zno-codegen` `TRACKED_NO_CRATE_HASH`?
> My current opinion is that using --emit metadata which always generates the same metadata regardless of other emits which is crate-hash compatible with --emit link would be ideal
I agree, but unfortunately not everyone may agree with the associated perf loss.
> the associated perf loss
Just a question: is it possible to reuse the fat rmeta generated during checking (it seems to contain MIR and type information) when building binaries? If so, we could save 100% of the check time (the time spent populating the rmeta) during the build.
But sadly, with `--emit link` we currently just redo the analysis and regenerate the rmeta. The check and build intermediate files are not reused across each other at all.
Yes, build -> check reusing is already possible, but simply not implemented in cargo. There should be an issue open for that. check -> build reusing won't work even with this PR.
Is check -> build reusing technically possible? I'm not sure if everything needed for codegen is already available in rmeta.
In theory it would be possible to never exclude anything from the rmeta even when no codegen is required. This would allow using the check rmeta for builds, but still require all crates to be compiled with codegen. The benefit of this would be more parallelization as codegen for all crates can immediately start without having to wait for the compilation of dependencies to get metadata. The downside is that the check metadata becomes larger which slows down check runs.
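The reuse directions discussed in this sub-thread can be summarised as a small decision function. This is a hypothetical model for illustration, not code from Cargo or rustc. `Mode`, `can_reuse_rmeta` and the `full_check_metadata` flag (standing for the "never exclude anything from the rmeta" idea) are invented names.

```rust
/// Which kind of rustc invocation produced (or consumes) a crate's metadata.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    /// `--emit metadata`, as used by `cargo check`.
    Check,
    /// `--emit link,metadata`, as used by `cargo build`.
    Build,
}

/// May an .rmeta produced in `producer` mode be used to compile a dependent
/// crate in `consumer` mode? `full_check_metadata` models never excluding
/// anything from the rmeta, even when no codegen is required.
fn can_reuse_rmeta(producer: Mode, consumer: Mode, full_check_metadata: bool) -> bool {
    match (producer, consumer) {
        (Mode::Check, Mode::Check) | (Mode::Build, Mode::Build) => true,
        // Build metadata contains everything a check needs.
        (Mode::Build, Mode::Check) => true,
        // Check metadata may omit codegen-related data unless it is always "full fat".
        // Even then, the producing crate itself still has to be compiled with codegen.
        (Mode::Check, Mode::Build) => full_check_metadata,
    }
}

fn main() {
    assert!(can_reuse_rmeta(Mode::Build, Mode::Check, false));
    assert!(!can_reuse_rmeta(Mode::Check, Mode::Build, false));
    assert!(can_reuse_rmeta(Mode::Check, Mode::Build, true));
}
```

The trade-off from the comment shows up as the flag: turning it on enables check-to-build reuse and more parallel codegen, at the cost of larger metadata on every check run.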
Updated from 2e9eadd to 245ff44
What's the next step for this? It's approved but still tagged "waiting on review". Does @Aaron1011 also need to approve?
Yes, he is assigned to this PR and I would like someone else to look at it anyway, just in case.
Updated from 245ff44 to f5df9be
Rebased, and added a commit from @Aaron1011 which makes …
r=me with the extra test
Updated from 4e27973 to 79e023e
Useful for debugging crate hash and resolution issues.
The PATH has no material effect on the emitted artifact, and setting the path via `-o` or `--out-dir` does not affect the hash. Closes rust-lang#86044
This effectively turns OutputTypes into a hybrid where keys (OutputType) are TRACKED and the values (optional paths) are TRACKED_NO_CRATE_HASH.
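A minimal sketch of the hybrid described in this commit message, assuming a simplified `OutputTypes` that maps each requested kind to an optional path: the keys feed both hashes, while the paths feed only the dep-tracking hash. The `dep_tracking_hash(hasher, for_crate_hash)` shape is loosely modelled on how rustc's dep-tracking hashing distinguishes the crate hash from the incremental hash, but all names and types here are simplified stand-ins.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};
use std::path::PathBuf;

/// Simplified output kinds (the real enum has more variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
enum OutputType {
    Metadata,
    Exe,
}

/// Simplified OutputTypes: requested kind -> optional explicit path (`--emit KIND=PATH`).
struct OutputTypes(BTreeMap<OutputType, Option<PathBuf>>);

impl OutputTypes {
    fn new(entries: &[(OutputType, Option<&str>)]) -> Self {
        OutputTypes(entries.iter().map(|(kind, path)| (*kind, path.map(PathBuf::from))).collect())
    }

    /// Keys act as TRACKED, values as TRACKED_NO_CRATE_HASH.
    fn dep_tracking_hash(&self, hasher: &mut DefaultHasher, for_crate_hash: bool) {
        // BTreeMap iterates in sorted key order, so the result is deterministic.
        for (kind, path) in &self.0 {
            kind.hash(hasher);
            if !for_crate_hash {
                path.hash(hasher);
            }
        }
    }
}

fn hash_of(outputs: &OutputTypes, for_crate_hash: bool) -> u64 {
    let mut hasher = DefaultHasher::new();
    outputs.dep_tracking_hash(&mut hasher, for_crate_hash);
    hasher.finish()
}

fn main() {
    let check_dir = OutputTypes::new(&[(OutputType::Metadata, Some("check/libfoo.rmeta"))]);
    let build_dir = OutputTypes::new(&[(OutputType::Metadata, Some("build/libfoo.rmeta"))]);
    let with_exe = OutputTypes::new(&[
        (OutputType::Metadata, Some("check/libfoo.rmeta")),
        (OutputType::Exe, None),
    ]);

    // Only the path differs: same crate hash, different dep-tracking hash.
    assert_eq!(hash_of(&check_dir, true), hash_of(&build_dir, true));
    assert_ne!(hash_of(&check_dir, false), hash_of(&build_dir, false));
    // A different set of requested outputs still changes the crate hash.
    assert_ne!(hash_of(&check_dir, true), hash_of(&with_exe, true));
}
```

Keeping the paths in the dep-tracking hash preserves the incremental-compilation safety raised in review, while the crate hash only reflects which artifacts were requested.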
Updated from 79e023e to 4514697
@bors r+
📌 Commit 4514697 has been approved by …
☀️ Test successful - checks-actions
With `--emit KIND=PATH`, the PATH should not affect hashes used for dependency tracking. It does not with other ways of specifying output paths (`-o` or `--out-dir`).

Also updates `rustc -Zls` to print more info about crates, which is used here to implement a `run-make` test.

It seems there was already a test explicitly checking that the `OutputTypes` hash is affected by the path. I think this behaviour is wrong, so I updated the test.