Only use the new DepNode hashmap for anonymous nodes. #109050

cjgillot · 2023-03-12T11:36:29Z

Profiling the compilation of the windows crate with massif showed that a hashmap DepNode -> DepNodeIndex accounted for 6Gb in the peak memory usage for the windows crate.

This PR aims to remove this map.

There are 3 hashmaps of DepNode in the query system:

the index from the serialized graph;
the fingerprints map which is used for debugging.
the new_node_to_index map which is used to deduplicate nodes and check for their existence.

Hashmap 1 is unavoidable, as this is the only way to make a correspondence between two compilation sessions.

The second commit replaces hashmap 2 by a simple IndexVec<DepNodeIndex, Option<Fingerprint>>.

Hashmap 2 is the more interesting. Having duplicate DepNodes in the serialized graph is not supported. So the current solution was to either ICE when creating duplicates, or deduplicate them silently using the new_node_to_index map.
The third commit moves the burden of checking for duplicates to dep-graph deserialization, which fails when there are duplicates. Instead of an ICE, we silently clear the incremental session, and continue compilation.
The remaining source of duplicates if the creation of anonymous nodes, for which the DepNode is just the hash of the dependencies' indices. The new_node_to_index map is shrunk to only be used for such anonymous nodes.

Those changes allow to go from 21 Gb to 16 Gb peak memory usage on that crate.

Drive-by: the first commits fixes #101518 by marking the affected query as anonymous. As we remove the check for duplicated DepNodes, the ICE would have been replaced by a silent clearing of the incremental state, which would have been unfortunate.

https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/improving.20rustc.20memory.20usage

Fixes #83085
Fixes #101518
Fixes #106136
Fixes #107991
Fixes #108657

rustbot · 2023-03-12T11:36:34Z

r? @oli-obk

(rustbot has picked a reviewer for you, use r? to override)

rustbot · 2023-03-12T11:36:37Z

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

cjgillot · 2023-03-12T11:39:23Z

@bors try @rust-timer queue

bors · 2023-03-12T11:39:32Z

⌛ Trying commit 3a4d8fe9baa63dd9e4ffb6fa799b846fd806c00e with merge b6efc766824a4dfccb5f423fd5d3ed04e6329538...

bors · 2023-03-12T14:31:57Z

☀️ Try build successful - checks-actions
Build commit: b6efc766824a4dfccb5f423fd5d3ed04e6329538 (b6efc766824a4dfccb5f423fd5d3ed04e6329538)

Zoxc · 2023-03-12T17:25:32Z

This does remove an important assertion which checks for duplicate key hashes. I think it makes sense to keep the map and assertions for debug_assertions. We also should definitely report which dep node was duplicated when loading the dep graph. It could also make sense to have a -Z flag which verifies that the dep graph can be loaded without requiring a separate rustc instance.

I'm not sure why this PR marks all the issues as fixed, as it just shifts the problem to the next session.

cjgillot · 2023-03-12T17:52:21Z

This does remove an important assertion which checks for duplicate key hashes. I think it makes sense to keep the map and assertions for debug_assertions. We also should definitely report which dep node was duplicated when loading the dep graph. It could also make sense to have a -Z flag which verifies that the dep graph can be loaded without requiring a separate rustc instance.

Having duplicates among dep-nodes is not really an issue for the current compilation session. This is only unsupported to deserialize it. I can definitely add a diagnostic to report which node was duplicated.

I considered keeping that map set of existing nodes. That created a lot of complexity (mostly cfgs), for an unsure gain.

I'm not sure why this PR marks all the issues as fixed, as it just shifts the problem to the next session.

The "ICE", which is the problem from the point of view of the use, is removed. Instead, rustc gracefully recovers, which is IMO a better behaviour. I can also refine the recovery to exclude duplicated nodes, but keep the rest of the graph here.

This gives me another idea: filter anonymous nodes out of the deserialized index, as those cannot have an equivalent in the current session.

rust-timer · 2023-03-12T23:43:54Z

Finished benchmarking commit (b6efc766824a4dfccb5f423fd5d3ed04e6329538): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.2%	[-2.9%, -0.2%]	28
Improvements ✅ (secondary)	-1.6%	[-4.1%, -0.2%]	29
All ❌✅ (primary)	-1.2%	[-2.9%, -0.2%]	28

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.8%	[2.8%, 2.8%]	1
Regressions ❌ (secondary)	3.2%	[2.8%, 3.7%]	2
Improvements ✅ (primary)	-5.6%	[-17.5%, -1.9%]	41
Improvements ✅ (secondary)	-7.3%	[-12.6%, -1.4%]	19
All ❌✅ (primary)	-5.4%	[-17.5%, 2.8%]	42

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-3.3%	[-7.5%, -1.0%]	28
Improvements ✅ (secondary)	-6.2%	[-12.0%, -1.8%]	16
All ❌✅ (primary)	-3.3%	[-7.5%, -1.0%]	28

cjgillot · 2023-03-13T11:42:54Z

cc @michaelwoerister as you reviewed quite a few PRs on the query system.

michaelwoerister · 2023-03-13T14:05:39Z

Thanks for the PR, @cjgillot! I'll take a closer look within the next two days.

Zoxc · 2023-03-13T17:05:42Z

Having duplicates among dep-nodes is not really an issue for the current compilation session.

Yes, it should be correct. I'm just concerned about the hash collisions being harder to reproduce and easier to ignore due as the errors are "recovered". Having a command line option to make these hard errors would be useful for testing on CI and rustc-perf. I'm not sure if we have larger incremental crate tests on CI at all? I'm sure we have some UI tests which this PR would invalidate. Maybe we'd need to change the test runner to do another pass to verify that the dep graph also loads.

The "ICE", which is the problem from the point of view of the use, is removed.

Sure, but we should still track and fix these issues.

oli-obk · 2023-03-13T18:07:54Z

Having a command line option to make these hard errors would be useful for testing on CI and rustc-perf. I

We could also keep ICEing on nightly and beta, and just doing the convenient thing for users on stable.

Either way, this needs a summary of the behaviour changes and a compiler team FCP

cjgillot · 2023-03-13T20:19:34Z

Before going to the FCP part, I went further on the idea of recovering from duplicates.
The idea is that having a duplicated DepNodes in the graph does not mean that we cannot use the graph. It just means that the translation between a DepNodeIndex and a SerializedDepNodeIndex is ambiguous, so we should not do it. This means that a DepNode which was a duplicate in the previous session will be treated as a new node in the current session.

That's what anonymous nodes do, so we can go even further and implement anonymous nodes by filtering them out of the DepNode -> SerializedDepNodeIndex map.

TBH, I'm not sure that we should merge that last commit, so my "actual" PR goes up to the "Always recover from duplicate" commit 76a6d17662e373f779b4cefd9c6e2c8f858f583d.

michaelwoerister · 2023-03-15T14:40:56Z

So the current solution was to either ICE when creating duplicates, or deduplicate them silently using the new_node_to_index map.

I'm not sure if silently de-duplicating actually ever happened (except for anonymous nodes). Conceptually it should always be an error if two query invocations map to the same dep-node, right? This is what this assertion is there for. If I remember correctly, this assertion has caught multiple invalid HashStable implementations in the past, so I am pretty skeptical about removing it altogether.

I'm definitely in favor of reducing the dep-graphs memory footprint, however 🙂

Is there a way to keep the sanity checks but do them in a less expensive way? I think the check during deserialization is pretty clever, but it does make it harder to find out where the problem is.

michaelwoerister · 2023-03-15T14:46:49Z

The idea is that having a duplicated DepNodes in the graph does not mean that we cannot use the graph.

I don't think I follow. As I understand it, there should never be duplicate dep-nodes to begin with. The only exception are anonymous nodes where we introduce them explicitly, but otherwise we expect to have a 1:1 correspondence between DepNodes and query keys, right?

cjgillot · 2023-03-15T17:53:53Z

So the current solution was to either ICE when creating duplicates, or deduplicate them silently using the new_node_to_index map.

I'm not sure if silently de-duplicating actually ever happened (except for anonymous nodes). Conceptually it should always be an error if two query invocations map to the same dep-node, right? This is what this assertion is there for. If I remember correctly, this assertion has caught multiple invalid HashStable implementations in the past, so I am pretty skeptical about removing it altogether.

I'm definitely in favor of reducing the dep-graphs memory footprint, however slightly_smiling_face

Is there a way to keep the sanity checks but do them in a less expensive way? I think the check during deserialization is pretty clever, but it does make it harder to find out where the problem is.

I'm not sure how we can have both. Checking duplicates needs to have everything in memory at the same time somehow.

The idea is that having a duplicated DepNodes in the graph does not mean that we cannot use the graph.

I don't think I follow. As I understand it, there should never be duplicate dep-nodes to begin with. The only exception are anonymous nodes where we introduce them explicitly, but otherwise we expect to have a 1:1 correspondence between DepNodes and query keys, right?

There are 2 ways to see this.

The current understanding is that we have a graph where the DepNodes are the nodes, and we promote the nodes from the previous graph to the current graph.

Another understanding is that we have 2 graphs where the SerializedDepNodeIndex and the DepNodeIndex are the nodes. Evidence of this interpretation is that the red/green marking logic does not need to work with DepNode, just with SerializedDepNodeIndex. The DepNodes are a rosetta stone to translate between the two set of indices. As such, we don't need this rosetta stone to be complete or unambiguous: in case of ambiguity, we just don't have the relationship {Serialized,}DepNodeIndex. In practice, it is unambiguous, but in theory, we don't need it to be.

michaelwoerister · 2023-03-16T14:22:08Z

I'm not sure how we can have both. Checking duplicates needs to have everything in memory at the same time somehow.

Maybe it would make sense to put the check behind a -Z flag and only populate the map when the check is enabled? Then if we get a report that something is wrong (because of the cheaper check during deserialization) we can find the root cause more easily.

bors · 2023-03-20T15:26:55Z

☔ The latest upstream changes (presumably #108524) made this pull request unmergeable. Please resolve the merge conflicts.

…=Nilstrieb Use an IndexVec to debug fingerprints. Uncontroversial part of rust-lang#109050

cjgillot · 2023-06-04T09:33:44Z

Another understanding is that we have 2 graphs where the SerializedDepNodeIndex and the DepNodeIndex are the nodes. Evidence of this interpretation is that the red/green marking logic does not need to work with DepNode, just with SerializedDepNodeIndex.

I don't think I understand this. try_mark_green maps from DepNode to SerializedDepNodeIndex and then from SerializedDepNodeIndex to DepNode a number of times. How else would we correlate things between the compilation sessions?

It does not. It only maps from DepNode to SerializedDepNodeIndex once on entry. The graph walk only uses SerializedDepNodeIndex. It only fetches the DepNode for debugging and to test for eval always / forcable query. It does not use those DepNode to map back to an index, or to a color, this is done directly using the SerializedDepNodeIndex.

I think I'd prefer the behavior of having an ICE when we detect duplicates, but also erasing the incremental cache, allowing the next compilation session to succeed. That would be harder to miss than a warning for users.

That sounds reasonable to me. Although I'm wondering if that would get users stuck in an "ICE loop", where compilation sessions alternate between ICEing and succeeding as long as the problematic piece of code exists (because the conditions for DepNode identifier collisions are inherent to the program being compiled).

But maybe the ICE message could contain an explanation of the problem and instructions on how to turn off incremental compilation?

My proposal is to stop ICEing in both cases. I don't understand how that's worse.

michaelwoerister · 2023-06-06T12:50:39Z

It only fetches the DepNode for debugging and to test for eval always / forcable query

Isn't this a place where things can go wrong? try_mark_parent_green takes the (potentially conflicting) DepNode from the previous dep-graph and then calls try_force_from_dep_node. That, in turn, will try to reconstruct the query key from the Fingerprint contained in the DepNode -- but we don't have a guarantee anymore that that Fingerprint uniquely identifies the query key.

What do you think about focusing this PR on getting rid of the big CurrentDepGraph::new_node_to_index map and addressing DepNode conflicts separately?

cjgillot · 2023-06-09T17:09:46Z

Isn't this a place where things can go wrong? try_mark_parent_green takes the (potentially conflicting) DepNode from the previous dep-graph and then calls try_force_from_dep_node. That, in turn, will try to reconstruct the query key from the Fingerprint contained in the DepNode -- but we don't have a guarantee anymore that that Fingerprint uniquely identifies the query key.

For try_force_from_dep_node, we require the key to be reconstructible from the hash. This is opt-in, and requires to have a proper mapping from fingerprint to key in DepNodeParams::recover. So we can't have any conflict there.

What do you think about focusing this PR on getting rid of the big CurrentDepGraph::new_node_to_index map and addressing DepNode conflicts separately?

Split in #112469

michaelwoerister · 2023-06-12T10:52:50Z

Thanks for splitting out #112469!

I think the other changes in this PR could basically be seen as a proposal to change how DepNodes are defined. Right now a (non-anonymous) DepNode represents a single query invocation, that is, there is a 1:1 mapping between DepNodes and query keys. Under that paradigm, the fact that DepNodes internally contain a fingerprint is just an implementation detail. We require these fingerprints to be effectively unique -- or we would have to replace them with something else (like a serialized version of the corresponding query key; which is how they actually were implemented in early versions of the system).

The contested changes in this PR, however, in a way amount to making DepNodes a "best-effort" concept: except for DepNode kinds where we want to reconstruct the query key, the system only assumes that DepNodes are a hint for finding the right SerializedDepNodeIndex when invoking try_mark_green. But otherwise, everything is expected to work even if multiple nodes in the dep-graph have the same DepNode ID/Hash.

I think this is really interesting! It would potentially enabled a number of things:

Maybe we don't want to hash certain query keys at all, since they are rarely used by the initial try_mark_green() call. Or because the query keys are notoriously hard to hash in a stable way.
Maybe we can use a smaller hash, since collisions are not a correctness problem anymore.
Maybe we don't want to use a hash/fingerprint for reconstructible dep-nodes because we only plan to support DefId, HirId, (), integers, etc, and those can be serialized verbatim.

So, overall, I think this is really interesting and promising. I just don't think it would be good form to implement a rather fundamental shift like this "under that radar" as a drive-by fix and without updating our documentation, right?

bors · 2023-09-07T04:35:39Z

☔ The latest upstream changes (presumably #110050) made this pull request unmergeable. Please resolve the merge conflicts.

…ingerprints, r=<try> Experiment: Only track fingerprints for queries with reconstructible dep-nodes. This is an experiment to collect performance data about alternative ways to adapt rust-lang#109050. The PR makes the following change: All queries with keys that are not reconstructible from their corresponding DepNode are now treated similar to anonymous queries. That is we don't compute a DepNode or result fingerprint for them. This has some implications: - We save time because query keys and results don't have to be hashed. - We can save space storing less data for these nodes in the on-disk dep-graph. (not implemented in this PR as I ran out of time. Maybe this would be a quick fix for `@saethlin` though?) - We don't have to worry about hash collisions for DepNode in these cases (although we still have to worry about hash collisions for result fingerprints, which might include all the same HashStable impls) - Same as with anonymous queries, the graph can grow additional nodes and edges in some situations because existing graph parts might be promoted while new parts are allocated for the same query if it is re-executed. I don't know how much this happens in practice. - We cannot cache query results for queries with complex keys. Given that that last point affects some heavy queries, I have my doubts that this strategy is a win. But let's run it through perf at least once. cc `@cjgillot,` `@Zoxc` r? `@ghost`

apiraino · 2024-10-31T11:16:40Z

Hi @cjgillot ! What's the status of this? I see it's still tagged that it needs an FCP, is it the current status? Thanks if you can post an update (or if you still have interest/bandwidth to pursuit this goal) :-)

apiraino · 2024-10-31T11:24:30Z

Removing a review assignee for now, waiting to clarify the status

bors · 2024-11-12T18:08:13Z

☔ The latest upstream changes (presumably #132282) made this pull request unmergeable. Please resolve the merge conflicts.

Always recover from duplicate DepNode.

As the hash function for the input is not complete, those queries can trigger a re-use of a DepNode. As we now tolerate having duplicated DepNode, we effectively remove this ICE.

bors · 2025-01-28T12:29:41Z

☔ The latest upstream changes (presumably #133929) made this pull request unmergeable. Please resolve the merge conflicts.

cjgillot · 2025-07-03T14:47:55Z

The most interesting changes were merged in #138629

rustbot assigned oli-obk Mar 12, 2023

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 12, 2023

This comment has been minimized.

Sign in to view

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 12, 2023

cjgillot mentioned this pull request Mar 25, 2023

Use an IndexVec to debug fingerprints. #109587

Merged

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request Mar 27, 2023

Rollup merge of rust-lang#109587 - cjgillot:no-hashmap-fingerprint, r…

6371b91

…=Nilstrieb Use an IndexVec to debug fingerprints. Uncontroversial part of rust-lang#109050

cjgillot mentioned this pull request Jun 9, 2023

Only use the new node hashmap for anonymous nodes. #112469

Closed

cjgillot force-pushed the no-hash-graph branch from 5c4f06b to 3eef8ed Compare June 9, 2023 17:00

michaelwoerister mentioned this pull request Dec 6, 2023

Experiment: Only track fingerprints for queries with reconstructible dep-nodes. #118667

Closed

bjorn3 mentioned this pull request Dec 26, 2023

make incremental ICEs more informative #119317

Open

cjgillot force-pushed the no-hash-graph branch from 3eef8ed to 381c7a3 Compare August 10, 2024 12:49

This comment has been minimized.

Sign in to view

cjgillot force-pushed the no-hash-graph branch from 381c7a3 to 99ecfbc Compare August 18, 2024 11:35

apiraino unassigned michaelwoerister Oct 31, 2024

cjgillot added 8 commits November 27, 2024 01:15

Verify dep-graph consistency when loading it.

d51f5cf

Always recover from duplicate DepNode.

Only use the new node hashmap for anonymous nodes.

d3bcb8f

Bless test for constant queries.

db353ca

As the hash function for the input is not complete, those queries can trigger a re-use of a DepNode. As we now tolerate having duplicated DepNode, we effectively remove this ICE.

Add a flag to check depnodes for collisions.

545ec63

Reintroduce assertion.

1851fc4

Reuse -Zincremental_verify_ich.

ddcc71b

Do not fetch the DepNode to mark nodes green.

694b773

Silence lint.

27568ed

cjgillot force-pushed the no-hash-graph branch from 99ecfbc to 27568ed Compare December 2, 2024 23:23

cjgillot closed this Jul 3, 2025

cjgillot deleted the no-hash-graph branch July 3, 2025 14:48

Only use the new DepNode hashmap for anonymous nodes. #109050

Only use the new DepNode hashmap for anonymous nodes. #109050

Uh oh!

Conversation

cjgillot commented Mar 12, 2023

Uh oh!

rustbot commented Mar 12, 2023

Uh oh!

rustbot commented Mar 12, 2023

Uh oh!

cjgillot commented Mar 12, 2023

Uh oh!

This comment has been minimized.

bors commented Mar 12, 2023

Uh oh!

bors commented Mar 12, 2023

Uh oh!

This comment has been minimized.

Zoxc commented Mar 12, 2023

Uh oh!

cjgillot commented Mar 12, 2023

Uh oh!

rust-timer commented Mar 12, 2023

Overall result: ✅ improvements - no action needed

Instruction count

Max RSS (memory usage)

Cycles

Uh oh!

cjgillot commented Mar 13, 2023

Uh oh!

michaelwoerister commented Mar 13, 2023

Uh oh!

Zoxc commented Mar 13, 2023

Uh oh!

oli-obk commented Mar 13, 2023

Uh oh!

cjgillot commented Mar 13, 2023

Uh oh!

michaelwoerister commented Mar 15, 2023

Uh oh!

michaelwoerister commented Mar 15, 2023

Uh oh!

cjgillot commented Mar 15, 2023

Uh oh!

michaelwoerister commented Mar 16, 2023

Uh oh!

bors commented Mar 20, 2023

Uh oh!

cjgillot commented Jun 4, 2023

Uh oh!

michaelwoerister commented Jun 6, 2023

Uh oh!

cjgillot commented Jun 9, 2023

Uh oh!

michaelwoerister commented Jun 12, 2023

Uh oh!

bors commented Sep 7, 2023

Uh oh!

This comment has been minimized.

apiraino commented Oct 31, 2024

Uh oh!

apiraino commented Oct 31, 2024

Uh oh!

bors commented Nov 12, 2024

Uh oh!

bors commented Jan 28, 2025

Uh oh!

cjgillot commented Jul 3, 2025

Uh oh!

Uh oh!