
recursively evaluate the constants in everything that is 'mentioned' #122568

Merged: 10 commits merged on Mar 21, 2024

RalfJung (Member) commented Mar 15, 2024

This is another attempt at fixing #107503. The previous attempt at #112879 seems stuck on figuring out where the perf regression comes from. In #122258 I learned some things, which informed the approach this PR is taking.

Quoting from the new collector docs, which explain the high-level idea:

//! One important role of collection is to evaluate all constants that are used by all the items
//! which are being collected. Codegen can then rely on only encountering constants that evaluate
//! successfully, and if a constant fails to evaluate, the collector has much better context to be
//! able to show where this constant comes up.
//!
//! However, the exact set of "used" items (collected as described above), and therefore the exact
//! set of used constants, can depend on optimizations. Optimizing away dead code may optimize away
//! a function call that uses a failing constant, so an unoptimized build may fail where an
//! optimized build succeeds. This is undesirable.
//!
//! To fix this, the collector has the concept of "mentioned" items. At some point during the MIR
//! pipeline, before any optimization-level-dependent optimizations, we compute a list of all items
//! that syntactically appear in the code. These are considered "mentioned", and even if they are in
//! dead code and get optimized away (which makes them no longer "used"), they are still
//! "mentioned". For every used item, the collector ensures that all mentioned items, recursively,
//! do not use a failing constant. This is reflected via the [`CollectionMode`], which determines
//! whether we are visiting a used item or merely a mentioned item.
//!
//! The collector and "mentioned items" gathering (which lives in `rustc_mir_transform::mentioned_items`)
//! need to stay in sync in the following sense:
//!
//! - For every item that the collector gathers that could eventually lead to build failure (most
//!   likely due to containing a constant that fails to evaluate), a corresponding mentioned item
//!   must be added. This should use the exact same strategy as the collector to make sure they are
//!   in sync. However, while the collector works on monomorphized types, mentioned items are
//!   collected on generic MIR -- so any time the collector checks for a particular type (such as
//!   `ty::FnDef`), we have to just unconditionally add this as a mentioned item.
//! - In `visit_mentioned_item`, we then do with that mentioned item exactly what the collector
//!   would have done during regular MIR visiting. Basically you can think of the collector having
//!   two stages, a pre-monomorphization stage and a post-monomorphization stage (usually quite
//!   literally separated by a call to `self.monomorphize`); the pre-monomorphization stage is
//!   duplicated in mentioned items gathering and the post-monomorphization stage is duplicated in
//!   `visit_mentioned_item`.
//! - Finally, as a performance optimization, the collector should fill `used_mentioned_item` during
//!   its MIR traversal with exactly what mentioned item gathering would have added in the same
//!   situation. This detects mentioned items that have *not* been optimized away and hence don't
//!   need a dedicated traversal.

enum CollectionMode {
    /// Collect items that are used, i.e., actually needed for codegen.
    ///
    /// Which items are used can depend on optimization levels, as MIR optimizations can remove
    /// uses.
    UsedItems,
    /// Collect items that are mentioned. The goal of this mode is that it is independent of
    /// optimizations: the set of "mentioned" items is computed before optimizations are run.
    ///
    /// The exact contents of this set are *not* a stable guarantee. (For instance, it is currently
    /// computed after drop-elaboration. If we ever do some optimizations even in debug builds, we
    /// might decide to run them before computing mentioned items.) The key property of this set is
    /// that it is optimization-independent.
    MentionedItems,
}
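To make the used/mentioned distinction concrete, here is a minimal toy model of the traversal the docs describe. It is a sketch under stated assumptions, not rustc's code: `ToyBody`, `Collector`, `failing_items`, and `program` are hypothetical, a body is reduced to lists of callee names plus a single "do my required constants evaluate?" flag, and only the name `CollectionMode` is borrowed from the real enum. It shows a failing constant in a callee being reported whether or not an optimizing build deleted the call.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Hypothetical, heavily simplified stand-in for a MIR body.
struct ToyBody {
    /// Callees still called after optimizations: the "used" items.
    used: Vec<&'static str>,
    /// Callees that appear before optimizations: the "mentioned" items.
    /// In this model `used` is always a subset of `mentioned`.
    mentioned: Vec<&'static str>,
    /// Whether all required constants of this body evaluate successfully.
    consts_ok: bool,
}

#[derive(Clone, Copy, PartialEq, Eq)]
enum CollectionMode {
    UsedItems,
    MentionedItems,
}

struct Collector<'a> {
    bodies: &'a HashMap<&'static str, ToyBody>,
    visited: HashSet<&'static str>,   // reached as used items
    mentioned: HashSet<&'static str>, // reached only as mentioned items
    errors: BTreeSet<&'static str>,   // items whose constants failed
}

impl<'a> Collector<'a> {
    fn visit(&mut self, item: &'static str, mode: CollectionMode) {
        // Visiting an item as "used" subsumes visiting it as "mentioned".
        let first_time = match mode {
            CollectionMode::UsedItems => self.visited.insert(item),
            CollectionMode::MentionedItems => {
                !self.visited.contains(item) && self.mentioned.insert(item)
            }
        };
        if !first_time {
            return;
        }
        let bodies = self.bodies; // copy the shared reference out of `self`
        let body = &bodies[item];
        // Constants are evaluated in both modes.
        if !body.consts_ok {
            self.errors.insert(item);
        }
        // Only used items lead to more used items (and thus to codegen).
        if mode == CollectionMode::UsedItems {
            for &callee in &body.used {
                self.visit(callee, CollectionMode::UsedItems);
            }
        }
        // Mentioned items are always followed, but only to evaluate constants.
        for &callee in &body.mentioned {
            self.visit(callee, CollectionMode::MentionedItems);
        }
    }
}

fn failing_items(bodies: &HashMap<&'static str, ToyBody>, root: &'static str) -> Vec<&'static str> {
    let mut collector = Collector {
        bodies,
        visited: HashSet::new(),
        mentioned: HashSet::new(),
        errors: BTreeSet::new(),
    };
    collector.visit(root, CollectionMode::UsedItems);
    collector.errors.into_iter().collect()
}

/// `main` contains `if false { not_called() }`, and `not_called` uses a
/// failing constant. An optimizing build removes the dead call.
fn program(optimized: bool) -> HashMap<&'static str, ToyBody> {
    let mut bodies = HashMap::new();
    bodies.insert("main", ToyBody {
        used: if optimized { vec![] } else { vec!["not_called"] },
        mentioned: vec!["not_called"],
        consts_ok: true,
    });
    bodies.insert("not_called", ToyBody { used: vec![], mentioned: vec![], consts_ok: false });
    bodies
}

fn main() {
    for optimized in [false, true] {
        let failing = failing_items(&program(optimized), "main");
        println!("optimized={optimized}: failing={failing:?}");
    }
}
```

Both runs report `["not_called"]`. A traversal that only followed the `used` edges would report nothing in the optimized case, which is exactly the optimization-dependent behavior the mentioned-items mode is meant to remove.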

And the docs for the `mentioned_items` field of the MIR body:

    /// Further items that were mentioned in this function and hence *may* become monomorphized,
    /// depending on optimizations. We use this to avoid optimization-dependent compile errors: the
    /// collector recursively traverses all "mentioned" items and evaluates all their
    /// `required_consts`.
    ///
    /// This is *not* soundness-critical and the contents of this list are *not* a stable guarantee.
    /// All that's relevant is that this set is optimization-level-independent, and that it includes
    /// everything that the collector would consider "used". (For example, we currently compute this
    /// set after drop elaboration, so some drop calls that can never be reached are not considered
    /// "mentioned".) See the documentation of `CollectionMode` in
    /// `compiler/rustc_monomorphize/src/collector.rs` for more context.
    pub mentioned_items: Vec<Spanned<MentionedItem<'tcx>>>,
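The pre/post-monomorphization split described in the collector docs can be illustrated with a second toy model. Everything here is hypothetical and much simpler than rustc's real `MentionedItem` and `Ty`: a generic body records every call's callee type without checking it (mirroring the rule that such items are added unconditionally), and only after substituting generic arguments does the visit step check for a function item, which is the role `visit_mentioned_item` plays.

```rust
/// Hypothetical, simplified type language (not rustc's `Ty`).
#[derive(Clone, Debug, PartialEq)]
enum ToyTy {
    /// A specific function item, loosely like `ty::FnDef`.
    FnDef(&'static str),
    /// A generic parameter, referring to an entry in the substitution list.
    Param(usize),
    /// A function pointer: calling it does not name a specific function.
    FnPtr,
}

/// Hypothetical, simplified statement in a generic (pre-monomorphization) body.
enum ToyStmt {
    Call(ToyTy),
    Assign,
}

/// Pre-monomorphization stage: every callee type is recorded as "mentioned",
/// without checking whether it is a function item, because a `Param` might
/// still turn into one.
fn gather_mentioned(body: &[ToyStmt]) -> Vec<ToyTy> {
    body.iter()
        .filter_map(|stmt| match stmt {
            ToyStmt::Call(ty) => Some(ty.clone()),
            ToyStmt::Assign => None,
        })
        .collect()
}

/// Replace generic parameters with the concrete arguments of one instance.
fn monomorphize(ty: &ToyTy, args: &[ToyTy]) -> ToyTy {
    match ty {
        ToyTy::Param(i) => args[*i].clone(),
        other => other.clone(),
    }
}

/// Post-monomorphization stage: only now is the `FnDef` check made, just as
/// the collector would make it for a call it actually visits.
fn visit_mentioned(mentioned: &[ToyTy], args: &[ToyTy]) -> Vec<&'static str> {
    mentioned
        .iter()
        .filter_map(|ty| match monomorphize(ty, args) {
            ToyTy::FnDef(name) => Some(name),
            ToyTy::Param(_) | ToyTy::FnPtr => None,
        })
        .collect()
}

fn main() {
    let body = [
        ToyStmt::Assign,
        ToyStmt::Call(ToyTy::FnDef("helper")),
        ToyStmt::Call(ToyTy::Param(0)),
    ];
    let mentioned = gather_mentioned(&body);
    println!("mentioned: {:?}", mentioned);
    // Instance where the parameter is a function item: two functions to check.
    println!("{:?}", visit_mentioned(&mentioned, &[ToyTy::FnDef("user_fn")]));
    // Instance where the parameter is a function pointer: only `helper`.
    println!("{:?}", visit_mentioned(&mentioned, &[ToyTy::FnPtr]));
}
```

The gathering step cannot know whether `Param(0)` will become a function item, so it keeps it; the post-monomorphization step makes that decision per instance.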

Fixes #107503

rustbot added S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.) labels Mar 15, 2024
RalfJung changed the title from "Draft: recursively evaluate their constants in everything that is 'mentioned'" to "Draft: recursively evaluate the constants in everything that is 'mentioned'" Mar 15, 2024
RalfJung (Member Author) commented:

Okay, so this should hopefully codegen nothing more than what was already codegen'd before this PR; only the collector and MIR passes are doing more work.

@bors try @rust-timer queue


rustbot added the S-waiting-on-perf (Status: Waiting on a perf run to be completed.) label Mar 15, 2024
bors (Contributor) commented Mar 15, 2024

⌛ Trying commit 80f6186 with merge 55a1404...

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 15, 2024
Draft: recursively evaluate the constants in everything that is 'mentioned'

This is another attempt at fixing rust-lang#107503. The previous attempt at rust-lang#112879 seems stuck on figuring out where the [perf regression](https://perf.rust-lang.org/compare.html?start=c55d1ee8d4e3162187214692229a63c2cc5e0f31&end=ec8de1ebe0d698b109beeaaac83e60f4ef8bb7d1&stat=instructions:u) comes from. In rust-lang#122258 I learned some things, which informed the approach this PR is taking.

r? `@ghost`
bors (Contributor) commented Mar 15, 2024

☀️ Try build successful - checks-actions
Build commit: 55a1404 (55a14041c399a47a8e64ddf335e32efc0ec49564)


rust-timer (Collaborator) commented:

Finished benchmarking commit (55a1404): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.6% | [0.2%, 4.5%] | 67 |
| Regressions ❌ (secondary) | 1.0% | [0.2%, 3.4%] | 18 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.4% | [-6.6%, -0.3%] | 3 |
| All ❌✅ (primary) | 1.6% | [0.2%, 4.5%] | 67 |

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.9% | [0.4%, 4.8%] | 38 |
| Regressions ❌ (secondary) | 2.7% | [1.6%, 3.9%] | 20 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -4.3% | [-4.3%, -4.3%] | 1 |
| All ❌✅ (primary) | 1.9% | [0.4%, 4.8%] | 38 |

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.2% | [1.1%, 4.6%] | 36 |
| Regressions ❌ (secondary) | 2.9% | [2.5%, 3.4%] | 4 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.4% | [-4.3%, -2.5%] | 2 |
| All ❌✅ (primary) | 2.2% | [1.1%, 4.6%] | 36 |

Binary size

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.3% | [0.0%, 6.0%] | 114 |
| Regressions ❌ (secondary) | 1.2% | [0.0%, 3.7%] | 49 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.3% | [0.0%, 6.0%] | 114 |

Bootstrap: 669.355s -> 671.716s (0.35%)
Artifact size: 311.47 MiB -> 311.55 MiB (0.03%)

rustbot added perf-regression (Performance regression.) and removed S-waiting-on-perf (Status: Waiting on a perf run to be completed.) labels Mar 15, 2024
RalfJung (Member Author) commented:

Well, that looks a lot better. :) And almost all regressions are for "incr". That makes sense, as collection is redone when anything changes (right?), so making collection a bit slower would be a much bigger relative change when the build is fast due to being mostly cached.
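A back-of-the-envelope way to see why incremental builds are hit hardest, assuming (as the comment does) that collection is redone in full on every build while the rest of the work may be served from the cache:

$$
\text{relative slowdown} = \frac{\Delta T_{\text{collect}}}{T_{\text{collect}} + T_{\text{rest}}}
$$

For the same extra collector time $\Delta T_{\text{collect}}$, a mostly cached incremental build has a small $T_{\text{rest}}$, so the denominator shrinks and the reported percentage grows even though the absolute extra cost is unchanged.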

RalfJung force-pushed the mentioned-items branch 3 times, most recently from e37145c to dd9cda6, March 17, 2024 08:54
RalfJung (Member Author) commented:

I've added an optimization that should help with the common case of "most 'mentioned items' are also still used". Let's see if that makes a difference.
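A minimal sketch of that optimization, assuming a toy body that lists its pre-optimization mentioned callees and its post-optimization used callees. The names `ToyBody` and `leftover_mentioned` are hypothetical; only the idea of recording "used mentioned items" during the normal traversal follows the collector docs quoted earlier. Items reached while walking used callees are recorded, and only the remainder (the ones optimized away) needs a separate mentioned-item traversal.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified MIR body.
struct ToyBody<'a> {
    /// Callees that appear in the code before optimizations.
    mentioned: Vec<&'a str>,
    /// Callees still present after optimizations (a subset of `mentioned`).
    used: Vec<&'a str>,
}

/// Returns the mentioned callees that still need a dedicated mentioned-item
/// traversal because the used-item traversal never reached them.
fn leftover_mentioned<'a>(body: &ToyBody<'a>) -> Vec<&'a str> {
    let mut used_mentioned_items: HashSet<&'a str> = HashSet::new();
    for &callee in &body.used {
        // The real collector would visit `callee` as a used item here; this
        // sketch only records that mentioned-item gathering would have
        // produced the same entry.
        used_mentioned_items.insert(callee);
    }
    body.mentioned
        .iter()
        .copied()
        .filter(|item| !used_mentioned_items.contains(item))
        .collect()
}

fn main() {
    // Typical case: one call was optimized away, the other three survived.
    let body = ToyBody {
        mentioned: vec!["a", "b", "c", "dead"],
        used: vec!["a", "b", "c"],
    };
    println!("{:?}", leftover_mentioned(&body)); // prints ["dead"]
}
```

When nearly every mentioned item is still used, the leftover list is short, so the extra traversal costs little.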

@bors try @rust-timer queue


rustbot added the S-waiting-on-perf (Status: Waiting on a perf run to be completed.) label Mar 17, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 17, 2024
bors (Contributor) commented Mar 17, 2024

⌛ Trying commit dd9cda6 with merge 02d1d38...

bors (Contributor) commented Mar 17, 2024

☀️ Try build successful - checks-actions
Build commit: 02d1d38 (02d1d38e5c2ce4f8986df3c3be1eea8e29d6199d)


rust-timer (Collaborator) commented:

Finished benchmarking commit (02d1d38): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.2% | [0.2%, 2.9%] | 64 |
| Regressions ❌ (secondary) | 0.8% | [0.2%, 2.8%] | 15 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.2% | [0.2%, 2.9%] | 64 |

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.3% | [0.6%, 2.7%] | 7 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.1% | [-2.1%, -2.0%] | 3 |
| Improvements ✅ (secondary) | -2.7% | [-3.3%, -2.2%] | 5 |
| All ❌✅ (primary) | 0.3% | [-2.1%, 2.7%] | 10 |

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.8% | [0.8%, 3.2%] | 32 |
| Regressions ❌ (secondary) | 3.0% | [2.0%, 3.7%] | 5 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -4.0% | [-8.1%, -2.2%] | 9 |
| All ❌✅ (primary) | 1.8% | [0.8%, 3.2%] | 32 |

Binary size

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.3% | [0.0%, 6.0%] | 114 |
| Regressions ❌ (secondary) | 1.3% | [0.0%, 3.7%] | 47 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.3% | [0.0%, 6.0%] | 114 |

Bootstrap: 666.919s -> 670.307s (0.51%)
Artifact size: 312.80 MiB -> 312.89 MiB (0.03%)

rust-timer (Collaborator) commented:

Finished benchmarking commit (df8ac8f): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.0% | [0.3%, 2.2%] | 67 |
| Regressions ❌ (secondary) | 0.8% | [0.2%, 2.9%] | 24 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.0% | [0.3%, 2.2%] | 67 |

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.8% | [1.0%, 5.5%] | 7 |
| Regressions ❌ (secondary) | 3.5% | [2.0%, 7.4%] | 36 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.8% | [1.0%, 5.5%] | 7 |

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.8% | [0.9%, 2.6%] | 28 |
| Regressions ❌ (secondary) | 2.6% | [1.9%, 3.4%] | 6 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.8% | [0.9%, 2.6%] | 28 |

Binary size

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.9% | [0.0%, 5.0%] | 111 |
| Regressions ❌ (secondary) | 0.8% | [0.0%, 2.2%] | 46 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.9% | [0.0%, 5.0%] | 111 |

Bootstrap: 669.21s -> 672.146s (0.44%)
Artifact size: 312.83 MiB -> 314.90 MiB (0.66%)

RalfJung deleted the mentioned-items branch March 21, 2024 13:23
RalfJung (Member Author) commented Mar 21, 2024

These are expected: we are doing more work because we were previously skipping work that shouldn't have been skipped. Regressions affect almost exclusively incr benchmarks, where collector changes show up disproportionately: there are still tons of items, but all the queries are already cached, so there's little work per item outside the collector. A large fraction of the regression comes just from reading and writing the list of "mentioned items" for each MIR body.

@rustbot label: +perf-regression-triaged

therealprof (Contributor) commented:

> These are expected, we are doing more work -- because we were previously skipping work that shouldn't have been skipped. Regressions affect almost exclusively incr benchmarks, where collector changes show up disproportionally -- there are still tons of items but all the queries are already cached, so there's little work per item outside the collector. A large fraction of the regression is from just reading and writing the list of "mentioned items" for each MIR body.

Okay, the compiler is doing more work, fine. But why the massive regression in binary size?

RalfJung (Member Author) commented:

Eh, no idea. rlib files would get a bit bigger since they now store the "mentioned items" list. What kind of artifact is measured here? Does it include Rust metadata?

We should be sending the exact same stuff to codegen as before. (And we'd see a perf regression if we did more codegen.)

Kobzol (Contributor) commented Mar 22, 2024

This is all metadata (see https://perf.rust-lang.org/compare.html?start=47dd709bedda8127e8daec33327e0a9d0cdae845&end=df8ac8f1d74cffb96a93ae702d16e224f5b9ee8c&stat=size%3Acrate_metadata). If you uncheck libraries in the binary size metric, there are no changes ("binary size" for libraries is the size of the .rlib, including metadata).

That being said, even regressions in metadata size are unfortunate, especially of such magnitude (not sure if they can be avoided here though).

RalfJung (Member Author) commented:

Yeah, I think it's the "mentioned items". This here is a benchmark from early development of this patch, when all I did was add the "mentioned items" list without doing anything with it. That's actually a bigger regression; I later made some changes that shortened the list.

> That being said, even regressions in metadata size are unfortunate, especially of such magnitude (not sure if they can be avoided here though).

We could strip the `Span`, which would save some space, I guess? Or we could 'intern' function items in MIR so that we don't have to repeat them (once in the actual code and once in "mentioned items") and instead reference some index into a function item table or so -- but that would be a pretty drastic change.

RalfJung (Member Author) commented:

How does something like `Ty<'tcx>` even get represented on disk? Does it repeat the entire type? I guess that makes this much less efficient than in memory, where types get deduplicated.

The rlib format could store all types in a table and then just index that table. But again, that's a big change.
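The table-plus-index idea from the last two comments is ordinary interning. Below is a generic sketch, not rustc's actual metadata encoding: `Interner` is hypothetical and strings stand in for types. Each distinct value is stored once, and call sites and "mentioned items" alike would store only a `u32` index into the table.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical interner: each distinct value is stored once in `table`, and
/// everything else refers to it by its index.
struct Interner<T> {
    table: Vec<T>,
    index_of: HashMap<T, u32>,
}

impl<T: Clone + Eq + Hash> Interner<T> {
    fn new() -> Self {
        Interner { table: Vec::new(), index_of: HashMap::new() }
    }

    /// Returns the index of `value`, adding it to the table on first use.
    fn intern(&mut self, value: T) -> u32 {
        if let Some(&idx) = self.index_of.get(&value) {
            return idx;
        }
        let idx = self.table.len() as u32;
        self.table.push(value.clone());
        self.index_of.insert(value, idx);
        idx
    }

    /// Looks up the value behind an index.
    fn get(&self, idx: u32) -> &T {
        &self.table[idx as usize]
    }
}

fn main() {
    // Strings stand in for callee types; a body with 100 calls to two callees.
    let mut types: Interner<String> = Interner::new();
    let calls: Vec<u32> = (0..100)
        .map(|i| {
            let callee = if i % 2 == 0 { "fn foo::<u8>" } else { "fn bar::<u16>" };
            types.intern(callee.to_string())
        })
        .collect();
    // The "mentioned items" list repeats the same indices, not the full types.
    let mentioned: Vec<u32> = calls.clone();
    println!(
        "{} call sites, {} mentioned items, {} table entries",
        calls.len(),
        mentioned.len(),
        types.table.len()
    );
    println!("first callee: {}", types.get(calls[0]));
}
```

This prints `100 call sites, 100 mentioned items, 2 table entries`: each repeated reference costs one index rather than a full copy of the type.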

the8472 (Member) commented Mar 22, 2024

Note that #122785 already recovered some of the lib sizes - I was wondering why sizes were down. perf link

If those mentioned items weren't previously included in the rlibs, does that mean they're also dead code? Maybe we can find a few more places where that trick can be applied.

RalfJung (Member Author) commented:

Even for live code we're now duplicating information. If a MIR body calls 100 functions, then it has 100 basic blocks for these calls, and 100 "mentioned items" with the types of the callees.

RalfJung (Member Author) commented:

I have opened an issue for this: #122936.

If we can make the on-disk representation of "mentioned items" smaller, that would also fix a significant part of the perf regression here. I just don't know if it can be done.

Labels
- merged-by-bors: This PR was explicitly merged by bors.
- perf-regression: Performance regression.
- perf-regression-triaged: The performance regression has been triaged.
- S-waiting-on-bors: Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
- T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
- T-lang: Relevant to the language team, which will review and decide on the PR/issue.
Development

Successfully merging this pull request may close these issues.

Const-eval errors in dead functions are optimization-dependent