
Only use the new DepNode hashmap for anonymous nodes. #109050

Open · wants to merge 9 commits into master

Conversation

cjgillot
Contributor

Profiling the compilation of the windows crate with massif showed that a hashmap mapping DepNode -> DepNodeIndex accounted for 6 GB of peak memory usage.

This PR aims to remove this map.

There are 3 hashmaps of DepNode in the query system:

  1. the index from the serialized graph;
  2. the fingerprints map, which is used for debugging;
  3. the new_node_to_index map, which is used to deduplicate nodes and check for their existence.

Hashmap 1 is unavoidable, as this is the only way to make a correspondence between two compilation sessions.

The second commit replaces hashmap 2 with a simple IndexVec<DepNodeIndex, Option<Fingerprint>>.
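The swap from a hashmap to an index-keyed vector can be sketched as follows. This is a minimal stand-in, not the actual rustc code: `Fingerprint` is really a 128-bit hash, `IndexVec` is rustc's typed wrapper around `Vec`, and the `Fingerprints` name here is hypothetical.

```rust
// Stand-ins for rustc-internal types (hypothetical simplifications).
type DepNodeIndex = usize;
type Fingerprint = u64; // rustc's Fingerprint is actually 128 bits

// Fingerprints live in a flat vector indexed directly by DepNodeIndex,
// mirroring IndexVec<DepNodeIndex, Option<Fingerprint>>. Compared with a
// HashMap<DepNode, Fingerprint>, this avoids storing the (large) DepNode
// keys and the hash-table overhead entirely.
struct Fingerprints {
    vec: Vec<Option<Fingerprint>>,
}

impl Fingerprints {
    fn insert(&mut self, index: DepNodeIndex, fp: Fingerprint) {
        if index >= self.vec.len() {
            self.vec.resize(index + 1, None);
        }
        self.vec[index] = Some(fp);
    }

    fn get(&self, index: DepNodeIndex) -> Option<Fingerprint> {
        self.vec.get(index).copied().flatten()
    }
}
```

Lookups become a bounds check plus an array access, and memory scales with the number of node indices rather than with hash-table load factors.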

Hashmap 3 is the more interesting one. Having duplicate DepNodes in the serialized graph is not supported, so the current solution was to either ICE when creating duplicates, or deduplicate them silently using the new_node_to_index map.
The third commit moves the burden of checking for duplicates to dep-graph deserialization, which fails when there are duplicates. Instead of an ICE, we silently clear the incremental session and continue compilation.
The remaining source of duplicates is the creation of anonymous nodes, for which the DepNode is just the hash of the dependencies' indices. The new_node_to_index map is shrunk to only be used for such anonymous nodes.
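The anonymous-node path can be sketched like this. The sketch is simplified and hypothetical: rustc uses its own stable hasher rather than `DefaultHasher`, and the real map lives inside the dep-graph state; `AnonNodes` and `intern` are illustration names only.

```rust
use std::collections::hash_map::{DefaultHasher, Entry};
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

type DepNodeIndex = u32;

// For anonymous nodes, the "DepNode" is just a hash of the dependency edges,
// so two anonymous nodes with identical dependencies collide by construction
// and must be deduplicated.
fn anon_node_hash(deps: &[DepNodeIndex]) -> u64 {
    let mut h = DefaultHasher::new();
    deps.hash(&mut h);
    h.finish()
}

struct AnonNodes {
    // The shrunken new_node_to_index map: keyed only by anonymous-node hashes.
    map: HashMap<u64, DepNodeIndex>,
    next_index: DepNodeIndex,
}

impl AnonNodes {
    // Returns the existing index for a node with these dependencies, or
    // allocates a fresh one.
    fn intern(&mut self, deps: &[DepNodeIndex]) -> DepNodeIndex {
        match self.map.entry(anon_node_hash(deps)) {
            Entry::Occupied(e) => *e.get(),
            Entry::Vacant(e) => {
                let idx = self.next_index;
                self.next_index += 1;
                *e.insert(idx)
            }
        }
    }
}
```

Since only anonymous nodes go through this map, its size is bounded by the number of anonymous nodes rather than by the whole graph.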

These changes reduce peak memory usage on that crate from 21 GB to 16 GB.

Drive-by: the first commit fixes #101518 by marking the affected query as anonymous. As we remove the check for duplicated DepNodes, the ICE would otherwise have been replaced by a silent clearing of the incremental state, which would have been unfortunate.

https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/improving.20rustc.20memory.20usage

Fixes #83085
Fixes #101518
Fixes #106136
Fixes #107991
Fixes #108657

@rustbot
Collaborator

rustbot commented Mar 12, 2023

r? @oli-obk

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 12, 2023
@rustbot
Collaborator

rustbot commented Mar 12, 2023

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

@cjgillot
Contributor Author

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 12, 2023
@bors
Contributor

bors commented Mar 12, 2023

⌛ Trying commit 3a4d8fe9baa63dd9e4ffb6fa799b846fd806c00e with merge b6efc766824a4dfccb5f423fd5d3ed04e6329538...

@bors
Contributor

bors commented Mar 12, 2023

☀️ Try build successful - checks-actions
Build commit: b6efc766824a4dfccb5f423fd5d3ed04e6329538

@Zoxc
Contributor

Zoxc commented Mar 12, 2023

This does remove an important assertion which checks for duplicate key hashes. I think it makes sense to keep the map and assertions for debug_assertions. We also should definitely report which dep node was duplicated when loading the dep graph. It could also make sense to have a -Z flag which verifies that the dep graph can be loaded without requiring a separate rustc instance.

I'm not sure why this PR marks all the issues as fixed, as it just shifts the problem to the next session.

@cjgillot
Contributor Author

This does remove an important assertion which checks for duplicate key hashes. I think it makes sense to keep the map and assertions for debug_assertions. We also should definitely report which dep node was duplicated when loading the dep graph. It could also make sense to have a -Z flag which verifies that the dep graph can be loaded without requiring a separate rustc instance.

Having duplicates among dep-nodes is not really an issue for the current compilation session; it is only unsupported when deserializing the graph. I can definitely add a diagnostic to report which node was duplicated.

I considered keeping that map as a set of existing nodes. That created a lot of complexity (mostly cfgs) for an unsure gain.

I'm not sure why this PR marks all the issues as fixed, as it just shifts the problem to the next session.

The "ICE", which is the problem from the point of view of the user, is removed. Instead, rustc gracefully recovers, which is IMO a better behaviour. I can also refine the recovery to exclude duplicated nodes but keep the rest of the graph.

This gives me another idea: filter anonymous nodes out of the deserialized index, as those cannot have an equivalent in the current session.

@rust-timer
Collaborator

Finished benchmarking commit (b6efc766824a4dfccb5f423fd5d3ed04e6329538): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.2% | [-2.9%, -0.2%] | 28 |
| Improvements ✅ (secondary) | -1.6% | [-4.1%, -0.2%] | 29 |
| All ❌✅ (primary) | -1.2% | [-2.9%, -0.2%] | 28 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 2.8% | [2.8%, 2.8%] | 1 |
| Regressions ❌ (secondary) | 3.2% | [2.8%, 3.7%] | 2 |
| Improvements ✅ (primary) | -5.6% | [-17.5%, -1.9%] | 41 |
| Improvements ✅ (secondary) | -7.3% | [-12.6%, -1.4%] | 19 |
| All ❌✅ (primary) | -5.4% | [-17.5%, 2.8%] | 42 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -3.3% | [-7.5%, -1.0%] | 28 |
| Improvements ✅ (secondary) | -6.2% | [-12.0%, -1.8%] | 16 |
| All ❌✅ (primary) | -3.3% | [-7.5%, -1.0%] | 28 |

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 12, 2023
@cjgillot
Contributor Author

cc @michaelwoerister as you reviewed quite a few PRs on the query system.

@michaelwoerister
Member

Thanks for the PR, @cjgillot! I'll take a closer look within the next two days.

@Zoxc
Contributor

Zoxc commented Mar 13, 2023

Having duplicates among dep-nodes is not really an issue for the current compilation session.

Yes, it should be correct. I'm just concerned about the hash collisions being harder to reproduce and easier to ignore as the errors are "recovered". Having a command-line option to make these hard errors would be useful for testing on CI and rustc-perf. I'm not sure if we have larger incremental crate tests on CI at all? I'm sure we have some UI tests which this PR would invalidate. Maybe we'd need to change the test runner to do another pass to verify that the dep graph also loads.

The "ICE", which is the problem from the point of view of the use, is removed.

Sure, but we should still track and fix these issues.

@oli-obk
Contributor

oli-obk commented Mar 13, 2023

Having a command line option to make these hard errors would be useful for testing on CI and rustc-perf.

We could also keep ICEing on nightly and beta, and just do the convenient thing for users on stable.

Either way, this needs a summary of the behaviour changes and a compiler team FCP

@cjgillot
Contributor Author

Before going to the FCP part, I went further on the idea of recovering from duplicates.
The idea is that having duplicated DepNodes in the graph does not mean that we cannot use the graph. It just means that the translation between a DepNodeIndex and a SerializedDepNodeIndex is ambiguous, so we should not do it. This means that a DepNode which was a duplicate in the previous session will be treated as a new node in the current session.

That's what anonymous nodes do, so we can go even further and implement anonymous nodes by filtering them out of the DepNode -> SerializedDepNodeIndex map.

TBH, I'm not sure that we should merge that last commit, so my "actual" PR goes up to the "Always recover from duplicate" commit 76a6d17662e373f779b4cefd9c6e2c8f858f583d.

@michaelwoerister
Member

So the current solution was to either ICE when creating duplicates, or deduplicate them silently using the new_node_to_index map.

I'm not sure if silently de-duplicating actually ever happened (except for anonymous nodes). Conceptually it should always be an error if two query invocations map to the same dep-node, right? This is what this assertion is there for. If I remember correctly, this assertion has caught multiple invalid HashStable implementations in the past, so I am pretty skeptical about removing it altogether.

I'm definitely in favor of reducing the dep-graph's memory footprint, however 🙂

Is there a way to keep the sanity checks but do them in a less expensive way? I think the check during deserialization is pretty clever, but it does make it harder to find out where the problem is.

@michaelwoerister
Member

The idea is that having a duplicated DepNodes in the graph does not mean that we cannot use the graph.

I don't think I follow. As I understand it, there should never be duplicate dep-nodes to begin with. The only exception are anonymous nodes where we introduce them explicitly, but otherwise we expect to have a 1:1 correspondence between DepNodes and query keys, right?

@cjgillot
Contributor Author

So the current solution was to either ICE when creating duplicates, or deduplicate them silently using the new_node_to_index map.

I'm not sure if silently de-duplicating actually ever happened (except for anonymous nodes). Conceptually it should always be an error if two query invocations map to the same dep-node, right? This is what this assertion is there for. If I remember correctly, this assertion has caught multiple invalid HashStable implementations in the past, so I am pretty skeptical about removing it altogether.

I'm definitely in favor of reducing the dep-graphs memory footprint, however 🙂

Is there a way to keep the sanity checks but do them in a less expensive way? I think the check during deserialization is pretty clever, but it does make it harder to find out where the problem is.

I'm not sure how we can have both. Checking duplicates needs to have everything in memory at the same time somehow.

The idea is that having a duplicated DepNodes in the graph does not mean that we cannot use the graph.

I don't think I follow. As I understand it, there should never be duplicate dep-nodes to begin with. The only exception are anonymous nodes where we introduce them explicitly, but otherwise we expect to have a 1:1 correspondence between DepNodes and query keys, right?

There are 2 ways to see this.

The current understanding is that we have a graph where the DepNodes are the nodes, and we promote the nodes from the previous graph to the current graph.

Another understanding is that we have 2 graphs, where the SerializedDepNodeIndex and the DepNodeIndex are the nodes. Evidence for this interpretation is that the red/green marking logic does not need to work with DepNode, just with SerializedDepNodeIndex. The DepNodes are a Rosetta stone to translate between the two sets of indices. As such, we don't need this Rosetta stone to be complete or unambiguous: in case of ambiguity, we just don't have the SerializedDepNodeIndex <-> DepNodeIndex relationship. In practice it is unambiguous, but in theory we don't need it to be.
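Under this interpretation, dropping ambiguous entries from the "Rosetta stone" could look like the sketch below. The types and the `build_index` function are hypothetical stand-ins, not rustc code: a DepNode that appears more than once in the serialized graph is simply left out of the map, so the corresponding queries are treated as new nodes in the current session.

```rust
use std::collections::HashMap;

type DepNode = u64; // stand-in for rustc's DepNode
type SerializedDepNodeIndex = u32;

// Build the DepNode -> SerializedDepNodeIndex translation map, discarding
// any DepNode that occurs more than once: an ambiguous translation is worse
// than no translation, and a missing entry just means "treat as new node".
fn build_index(nodes: &[DepNode]) -> HashMap<DepNode, SerializedDepNodeIndex> {
    let mut map = HashMap::new();
    let mut ambiguous = Vec::new();
    for (i, &node) in nodes.iter().enumerate() {
        if map.insert(node, i as SerializedDepNodeIndex).is_some() {
            ambiguous.push(node);
        }
    }
    for node in ambiguous {
        map.remove(&node);
    }
    map
}
```

The red/green walk itself only ever manipulates the indices, so losing a few map entries degrades reuse but never correctness.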

@michaelwoerister
Member

I'm not sure how we can have both. Checking duplicates needs to have everything in memory at the same time somehow.

Maybe it would make sense to put the check behind a -Z flag and only populate the map when the check is enabled? Then if we get a report that something is wrong (because of the cheaper check during deserialization) we can find the root cause more easily.

@bors
Contributor

bors commented Mar 20, 2023

☔ The latest upstream changes (presumably #108524) made this pull request unmergeable. Please resolve the merge conflicts.

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request Mar 27, 2023
…=Nilstrieb

Use an IndexVec to debug fingerprints.

Uncontroversial part of rust-lang#109050
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 12, 2023
@Zoxc
Contributor

Zoxc commented May 12, 2023

I think I'd prefer the behavior of having an ICE when we detect duplicates, but also erasing the incremental cache, allowing the next compilation session to succeed. That would be harder to miss than a warning for users.

@michaelwoerister
Member

I think I'd prefer the behavior of having an ICE when we detect duplicates, but also erasing the incremental cache, allowing the next compilation session to succeed. That would be harder to miss than a warning for users.

That sounds reasonable to me. Although I'm wondering if that would get users stuck in an "ICE loop", where compilation sessions alternate between ICEing and succeeding as long as the problematic piece of code exists (because the conditions for DepNode identifier collisions are inherent to the program being compiled).

But maybe the ICE message could contain an explanation of the problem and instructions on how to turn off incremental compilation?

@cjgillot
Contributor Author

cjgillot commented Jun 4, 2023

Another understanding is that we have 2 graphs where the SerializedDepNodeIndex and the DepNodeIndex are the nodes. Evidence of this interpretation is that the red/green marking logic does not need to work with DepNode, just with SerializedDepNodeIndex.

I don't think I understand this. try_mark_green maps from DepNode to SerializedDepNodeIndex and then from SerializedDepNodeIndex to DepNode a number of times. How else would we correlate things between the compilation sessions?

It does not. It only maps from DepNode to SerializedDepNodeIndex once, on entry. The graph walk only uses SerializedDepNodeIndex. It only fetches the DepNode for debugging and to test for eval-always / forceable queries. It does not use those DepNodes to map back to an index or to a color; this is done directly using the SerializedDepNodeIndex.

I think I'd prefer the behavior of having an ICE when we detect duplicates, but also erasing the incremental cache, allowing the next compilation session to succeed. That would be harder to miss than a warning for users.

That sounds reasonable to me. Although I'm wondering if that would get users stuck in an "ICE loop", where compilation sessions alternate between ICEing and succeeding as long as the problematic piece of code exists (because the conditions for DepNode identifier collisions are inherent to the program being compiled).

But maybe the ICE message could contain an explanation of the problem and instructions on how to turn off incremental compilation?

My proposal is to stop ICEing in both cases. I don't understand how that's worse.

@michaelwoerister
Member

It only fetches the DepNode for debugging and to test for eval always / forcable query

Isn't this a place where things can go wrong? try_mark_parent_green takes the (potentially conflicting) DepNode from the previous dep-graph and then calls try_force_from_dep_node. That, in turn, will try to reconstruct the query key from the Fingerprint contained in the DepNode -- but we don't have a guarantee anymore that that Fingerprint uniquely identifies the query key.

What do you think about focusing this PR on getting rid of the big CurrentDepGraph::new_node_to_index map and addressing DepNode conflicts separately?

@cjgillot
Contributor Author

cjgillot commented Jun 9, 2023

Isn't this a place where things can go wrong? try_mark_parent_green takes the (potentially conflicting) DepNode from the previous dep-graph and then calls try_force_from_dep_node. That, in turn, will try to reconstruct the query key from the Fingerprint contained in the DepNode -- but we don't have a guarantee anymore that that Fingerprint uniquely identifies the query key.

For try_force_from_dep_node, we require the key to be reconstructible from the hash. This is opt-in and requires a proper mapping from fingerprint to key in DepNodeParams::recover. So we can't have any conflict there.
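That opt-in contract can be sketched as below. This is modelled loosely on DepNodeParams::recover with names and signatures simplified (the real trait takes a TyCtxt and the DepNode itself); the point is that only keys rebuildable from the DepNode support forcing, so a non-reconstructible key can never be forced from an ambiguous fingerprint.

```rust
type Fingerprint = u64; // stand-in; rustc's Fingerprint is 128 bits

// Hypothetical simplification of rustc's DepNodeParams::recover: queries
// opt in to key reconstruction by returning Some(key) from `recover`.
trait Recover: Sized {
    // Returns None when the key is not reconstructible from the hash,
    // in which case try_force_from_dep_node must bail out.
    fn recover(fp: Fingerprint) -> Option<Self>;
}

// A `()` key (as used by many "global" queries) is trivially reconstructible.
impl Recover for () {
    fn recover(_fp: Fingerprint) -> Option<Self> {
        Some(())
    }
}

// A complex key offers no recovery: forcing from the DepNode is impossible,
// so a fingerprint collision cannot silently force the wrong query.
#[allow(dead_code)]
struct ComplexKey(String);

impl Recover for ComplexKey {
    fn recover(_fp: Fingerprint) -> Option<Self> {
        None
    }
}
```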

What do you think about focusing this PR on getting rid of the big CurrentDepGraph::new_node_to_index map and addressing DepNode conflicts separately?

Split in #112469

@michaelwoerister
Member

Thanks for splitting out #112469!

I think the other changes in this PR could basically be seen as a proposal to change how DepNodes are defined. Right now a (non-anonymous) DepNode represents a single query invocation, that is, there is a 1:1 mapping between DepNodes and query keys. Under that paradigm, the fact that DepNodes internally contain a fingerprint is just an implementation detail. We require these fingerprints to be effectively unique -- or we would have to replace them with something else (like a serialized version of the corresponding query key; which is how they actually were implemented in early versions of the system).

The contested changes in this PR, however, in a way amount to making DepNodes a "best-effort" concept: except for DepNode kinds where we want to reconstruct the query key, the system only assumes that DepNodes are a hint for finding the right SerializedDepNodeIndex when invoking try_mark_green. But otherwise, everything is expected to work even if multiple nodes in the dep-graph have the same DepNode ID/Hash.

I think this is really interesting! It would potentially enable a number of things:

  • Maybe we don't want to hash certain query keys at all, since they are rarely used by the initial try_mark_green() call. Or because the query keys are notoriously hard to hash in a stable way.
  • Maybe we can use a smaller hash, since collisions are not a correctness problem anymore.
  • Maybe we don't want to use a hash/fingerprint for reconstructible dep-nodes because we only plan to support DefId, HirId, (), integers, etc, and those can be serialized verbatim.

So, overall, I think this is really interesting and promising. I just don't think it would be good form to implement a rather fundamental shift like this "under the radar" as a drive-by fix and without updating our documentation, right?

@bors
Contributor

bors commented Sep 7, 2023

☔ The latest upstream changes (presumably #110050) made this pull request unmergeable. Please resolve the merge conflicts.

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 6, 2023
…ingerprints, r=<try>

Experiment: Only track fingerprints for queries with reconstructible dep-nodes.

This is an experiment to collect performance data about alternative ways to adapt rust-lang#109050. The PR makes the following change:

All queries with keys that are not reconstructible from their corresponding DepNode are now treated similarly to anonymous queries. That is, we don't compute a DepNode or result fingerprint for them.

This has some implications:
- We save time because query keys and results don't have to be hashed.
- We can save space storing less data for these nodes in the on-disk dep-graph. (not implemented in this PR as I ran out of time. Maybe this would be a quick fix for `@saethlin` though?)
- We don't have to worry about hash collisions for DepNode in these cases (although we still have to worry about hash collisions for result fingerprints, which might include all the same HashStable impls)
- Same as with anonymous queries, the graph can grow additional nodes and edges in some situations because existing graph parts might be promoted while new parts are allocated for the same query if it is re-executed. I don't know how much this happens in practice.
- We cannot cache query results for queries with complex keys.

Given that that last point affects some heavy queries, I have my doubts that this strategy is a win. But let's run it through perf at least once.

cc `@cjgillot` `@Zoxc`

r? `@ghost`
@apiraino
Contributor

Hi @cjgillot! What's the status of this? I see it's still tagged as needing an FCP, is that the current status? Thanks if you can post an update (or say whether you still have the interest/bandwidth to pursue this goal) :-)

@apiraino
Contributor

Removing a review assignee for now, waiting to clarify the status

@bors
Contributor

bors commented Nov 12, 2024

☔ The latest upstream changes (presumably #132282) made this pull request unmergeable. Please resolve the merge conflicts.

Labels
A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) needs-fcp This change is insta-stable, so needs a completed FCP to proceed. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.