Enable instcombine for mutable reborrows #105274

Merged: bors merged 1 commit into rust-lang:master on Feb 18, 2023


saethlin
Member

@saethlin saethlin commented Dec 4, 2022

instcombine used to contain this comment, which is no longer accurate, because it is fine to copy `&mut _` in MIR:

// The dereferenced place must have type `&_`, so that we don't copy `&mut _`.

So let's try replacing that check with something much more permissive...
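The rule being relaxed is easiest to see in a toy model. Everything below is hypothetical: `RefTy`, `Rvalue`, `old_rule`, `relaxed_rule` and `combine_ref_deref` are invented for illustration and are not rustc's types. The transform rewrites a reborrow of a dereferenced local, such as `_2 = &mut (*_1)`, into a plain copy of that local. The old rule only fires when the dereferenced local has type `&_`; the relaxed rule shown here is an assumption (the exact new condition isn't quoted in this thread) that fires whenever the reborrow has exactly the same type as the local it dereferences, so the copy is still well-typed.

```rust
/// Reference types a toy local can have (hypothetical stand-in for rustc's `Ty`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RefTy {
    /// `&T`
    Shared,
    /// `&mut T`
    Mut,
}

/// The two rvalue shapes involved (hypothetical stand-in for rustc's `Rvalue`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Rvalue {
    /// `&(*base)` or `&mut (*base)`, producing a reference of type `result`.
    RefOfDeref { base: usize, result: RefTy },
    /// `copy base`
    CopyOf(usize),
}

/// Old rule: only fire when the dereferenced local has type `&_`,
/// "so that we don't copy `&mut _`".
pub fn old_rule(base_ty: RefTy, _result: RefTy) -> bool {
    base_ty == RefTy::Shared
}

/// Relaxed rule (assumed for illustration): fire whenever the reborrow has
/// exactly the type of the local it dereferences, so `copy base` is well-typed.
pub fn relaxed_rule(base_ty: RefTy, result: RefTy) -> bool {
    base_ty == result
}

/// Rewrite `&[mut] (*base)` into `copy base` if `rule` allows it.
/// Returns whether the rvalue was changed.
pub fn combine_ref_deref(
    rv: &mut Rvalue,
    local_tys: &[RefTy],
    rule: fn(RefTy, RefTy) -> bool,
) -> bool {
    if let Rvalue::RefOfDeref { base, result } = *rv {
        if rule(local_tys[base], result) {
            *rv = Rvalue::CopyOf(base);
            return true;
        }
    }
    false
}
```

With local index `0` standing in for `_1: &mut T`, the old rule leaves `&mut (*_1)` alone while the relaxed rule turns it into `copy _1`; `&(*_1)` produces `&T`, a different type, so neither rule touches it.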

@rustbot
Collaborator

rustbot commented Dec 4, 2022

r? @fee1-dead

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 4, 2022
@saethlin
Member Author

saethlin commented Dec 4, 2022

@bors try @rust-timer queue


@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 4, 2022
@bors
Contributor

bors commented Dec 4, 2022

⌛ Trying commit d966a3b69baf8b6a60918783f95170f552b79db2 with merge ad3c92c23998265b250e84072921bcacbe08907e...

@bors
Contributor

bors commented Dec 5, 2022

☀️ Try build successful - checks-actions
Build commit: ad3c92c23998265b250e84072921bcacbe08907e (ad3c92c23998265b250e84072921bcacbe08907e)


@rust-timer
Collaborator

Finished benchmarking commit (ad3c92c23998265b250e84072921bcacbe08907e): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.7% | [0.7%, 0.8%] | 2 |
| Regressions ❌ (secondary) | 1.3% | [1.3%, 1.3%] | 2 |
| Improvements ✅ (primary) | -0.5% | [-0.7%, -0.3%] | 7 |
| Improvements ✅ (secondary) | -0.5% | [-0.7%, -0.3%] | 6 |
| All ❌✅ (primary) | -0.2% | [-0.7%, 0.8%] | 9 |

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.4% | [2.4%, 4.4%] | 2 |
| Improvements ✅ (primary) | -1.8% | [-4.4%, -0.0%] | 3 |
| Improvements ✅ (secondary) | -1.5% | [-2.2%, -1.2%] | 3 |
| All ❌✅ (primary) | -1.8% | [-4.4%, -0.0%] | 3 |

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.6% | [2.6%, 2.6%] | 1 |
| Improvements ✅ (primary) | -1.4% | [-1.6%, -0.8%] | 12 |
| Improvements ✅ (secondary) | -2.8% | [-3.5%, -1.6%] | 15 |
| All ❌✅ (primary) | -1.4% | [-1.6%, -0.8%] | 12 |

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Dec 5, 2022
@saethlin
Member Author

saethlin commented Dec 5, 2022

The regressions are in externs, which is noise, and in opt builds of ripgrep and cargo. I've confirmed locally that this PR definitely does alter the optimized codegen for ripgrep. But beyond that simple observation, I don't really have a way to quantify if that is good or not. Perhaps we are enabling more optimizations?

@saethlin saethlin marked this pull request as ready for review December 5, 2022 17:59
@rustbot
Collaborator

rustbot commented Dec 5, 2022

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@fee1-dead
Member

> I don't really have a way to quantify if that is good or not. Perhaps we are enabling more optimizations?

Hmm. Can we confirm that this generates better assembly? Otherwise LLVM might be doing unnecessary work.

@fee1-dead
Member

Let's get a second opinion on this. I would approve it if it did not regress, but now I'm not sure it is the optimal thing to do.

r? compiler

@rustbot rustbot assigned wesleywiser and unassigned fee1-dead Dec 8, 2022
@saethlin
Member Author

saethlin commented Dec 8, 2022

Regarding your previous comment, I have looked into this as best I can.

A diff of `nm` output on a release build of ripgrep before and after this PR indicates some inlining changes in the final (post-LLVM) binary. But the changes are primarily in various `drop_in_place` instances, with a few changes elsewhere. The only symbols that looked perf-relevant were a few related to `GlobSet`. I ran the benchmarks for the globset sub-crate before and after, and they are just too noisy to conclude anything 🤷 There could easily be a 1% perf improvement in there; I just wouldn't know.
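A symbol-level comparison like the one described can be sketched as follows. This is a minimal sketch, assuming the two `nm` listings were saved as text beforehand and that names were not demangled, so the symbol name is the last whitespace-separated field on each line; `symbol_names` and `diff_symbols` are hypothetical helpers. It only compares names, so it reports symbols that appeared or disappeared (for example because of changed inlining), not symbols whose code changed.

```rust
use std::collections::BTreeSet;

/// Collect symbol names from `nm` output. Assumes mangled (non-demangled)
/// names, so each name is the last whitespace-separated field of its line.
pub fn symbol_names(nm_output: &str) -> BTreeSet<&str> {
    nm_output
        .lines()
        .filter_map(|line| line.split_whitespace().last())
        .collect()
}

/// Returns (symbols only in `before`, symbols only in `after`), sorted.
pub fn diff_symbols<'a>(before: &'a str, after: &'a str) -> (Vec<&'a str>, Vec<&'a str>) {
    let old = symbol_names(before);
    let new = symbol_names(after);
    let removed: Vec<&'a str> = old.difference(&new).copied().collect();
    let added: Vec<&'a str> = new.difference(&old).copied().collect();
    (removed, added)
}
```

Comparing name sets rather than running a line-based diff ignores symbol addresses, which tend to shift across the whole binary whenever any code size changes.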

I also looked at a few other crates around the ecosystem. In a few cases I found microbenchmarks whose runtimes seem to have changed with this PR, but when I inspect the assembly for the benchmark, I see no changes at all. I wouldn't be surprised if these are wall-time changes caused by code layout shifts in the criterion or libtest runtime, perhaps perturbing the alignment of the benchmark loop.

The rustc-perf runtime benchmarks are exactly the same before and after this PR.

@bjorn3
Member

bjorn3 commented Dec 11, 2022

Is this compatible with stacked borrows? AFAIK reborrows have a semantic meaning.

@saethlin
Member Author

That sounds like a good question for t-opsem 😉 (I am resisting the urge to put together an informal proof that this is valid; there is much else I would like to do.)

The fact that the aliasing model cares about this doesn't necessarily mean we can't remove it in a MIR optimization. These optimizations can and do rely on the input program not executing UB, and are only obligated to not add UB.

I would be surprised if this optimization were legal for shared reborrows but not for mutable reborrows, especially considering the comment this PR removes.
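At the surface-Rust level, the difference between keeping and removing a reborrow looks roughly like the two hypothetical functions below. In MIR, the `&mut *x` in `via_reborrow` becomes something like `_2 = &mut (*_1)`, which instcombine may now rewrite into a plain copy of `_1`; `via_copy` is the closest surface analogue of that result. This only shows that one UB-free program behaves the same either way. Whether dropping the reborrow is always sound under Stacked Borrows is the question raised above, and the sketch does not settle it.

```rust
/// Explicit reborrow: `&mut *x` creates a fresh `&mut` derived from `x`
/// (roughly `_2 = &mut (*_1)` in MIR).
pub fn via_reborrow(x: &mut Vec<u8>) {
    let y = &mut *x;
    y.push(1);
}

/// Uses `x` itself. Note: `let y = x;` moves `x`, whereas writing
/// `let y: &mut Vec<u8> = x;` would make the compiler insert an implicit reborrow.
/// In MIR, where copying `&mut _` is allowed, the analogous form is `_2 = copy _1`.
pub fn via_copy(x: &mut Vec<u8>) {
    let y = x;
    y.push(1);
}
```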

@saethlin
Copy link
Member Author

saethlin commented Jan 7, 2023

This change doesn't delete any MIR statements, so it kind of makes sense for it to be approximately perf-neutral on balance. But in combination with DestinationPropagation, a benefit should be visible. So it would make sense for this PR to wait for #105577.

@rustbot label +S-blocked

@rustbot rustbot added the S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. label Jan 7, 2023
@saethlin
Copy link
Member Author

Rebased away a merge conflict. Changing the reviewer to Oli, because you keep r?'ing yourself on my other MIR opt PRs.

r? @oli-obk

@rustbot
Copy link
Collaborator

rustbot commented Jan 16, 2023

Failed to set assignee to 'ing: invalid assignee

Note: Only org members, users with write permissions, or people who have commented on the PR may be assigned.

@bors
Contributor

bors commented Feb 16, 2023

⌛ Trying commit 1409cb5 with merge c5bb1d862806b46d162d2c26fc57fc5f2ef20fc8...

@bors
Contributor

bors commented Feb 16, 2023

☀️ Try build successful - checks-actions
Build commit: c5bb1d862806b46d162d2c26fc57fc5f2ef20fc8 (c5bb1d862806b46d162d2c26fc57fc5f2ef20fc8)


@rust-timer
Collaborator

Finished benchmarking commit (c5bb1d862806b46d162d2c26fc57fc5f2ef20fc8): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.6% | [0.2%, 3.2%] | 16 |
| Regressions ❌ (secondary) | 0.4% | [0.2%, 0.7%] | 11 |
| Improvements ✅ (primary) | -0.7% | [-2.4%, -0.2%] | 50 |
| Improvements ✅ (secondary) | -0.8% | [-1.7%, -0.3%] | 23 |
| All ❌✅ (primary) | -0.4% | [-2.4%, 3.2%] | 66 |

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.2% | [0.1%, 7.3%] | 4 |
| Regressions ❌ (secondary) | 3.4% | [1.2%, 5.6%] | 2 |
| Improvements ✅ (primary) | -2.6% | [-7.9%, -0.9%] | 8 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.7% | [-7.9%, 7.3%] | 12 |

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.0% | [2.0%, 2.0%] | 1 |
| Regressions ❌ (secondary) | 3.6% | [3.6%, 3.6%] | 2 |
| Improvements ✅ (primary) | -2.1% | [-2.1%, -2.1%] | 1 |
| Improvements ✅ (secondary) | -2.2% | [-2.2%, -2.2%] | 1 |
| All ❌✅ (primary) | -0.1% | [-2.1%, 2.0%] | 2 |

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 16, 2023
@saethlin
Member Author

Hmm, that's a lot more LLVM work in cranelift-codegen and some regressions in check builds. I think the regressions in opt-full builds are due to extra MIR inlining.

Probably the LLVM inlining got bumped as well. I'll study the cachegrind diffs and MIR diffs about 7 hours from now. But I think there may be an argument for merging as-is, it seems unlikely that the big regression is actionable.

@saethlin
Member Author

I cannot find any MIR inlining differences in cranelift-codegen, though several inlining reports now say that functions are being inlined at different scopes, so something has changed about the caller context in a few places. I know that MIR inlining in the standard library has changed, so I suspect the regression is related to that.

Cachegrind diffs for the check regressions point at a smattering of functions. I've looked into `types_may_unify` and `hash_stable` for `Span`. The code in `types_may_unify` looks rearranged, but I have no idea why it is slower. I can't find any difference in `hash_stable` for `Span`. In both cases I can't use any traditional profiling tools, so I propose we just accept those regressions, because on balance this PR is an improvement.

@saethlin
Copy link
Member Author

saethlin commented Feb 16, 2023

@cjgillot With this PR, there are some copies of &mut that look unnecessary to me. I feel like CopyProp is supposed to delete these. Can you explain why it doesn't? For example, I feel like _3 should become _1:

fn str::<impl at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:136:1: 136:9>::make_ascii_lowercase(_1: &mut str) -> () {
    debug self => _1;                    // in scope 0 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:2444:33: 2444:42
    let mut _0: ();                      // return place in scope 0 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:2444:44: 2444:44
    let mut _2: &mut [u8];               // in scope 0 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:2446:27: 2446:46
    let mut _3: &mut str;                // in scope 0 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:2446:27: 2446:46
    scope 1 {
        debug me => _2;                  // in scope 1 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:2446:13: 2446:15
    }
    scope 2 {
        scope 3 (inlined str::<impl str>::as_bytes_mut) { // at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:2446:32: 2446:46
            debug self => _3;            // in scope 3 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:369:32: 369:41
            let mut _4: *mut [u8];       // in scope 3 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:374:24: 374:55
            let mut _5: *mut str;        // in scope 3 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:374:25: 374:41
            scope 4 {
            }
        }
    }

    bb0: {
        StorageLive(_3);                 // scope 2 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:2446:27: 2446:46
        _3 = _1;                         // scope 2 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:2446:27: 2446:46
        StorageLive(_4);                 // scope 4 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:374:24: 374:55
        StorageLive(_5);                 // scope 4 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:374:25: 374:41
        _5 = &raw mut (*_3);             // scope 4 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:374:25: 374:29
        _4 = move _5 as *mut [u8] (PtrToPtr); // scope 4 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:374:24: 374:55
        StorageDead(_5);                 // scope 4 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:374:54: 374:55
        _2 = &mut (*_4);                 // scope 4 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:374:18: 374:55
        StorageDead(_4);                 // scope 3 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:375:5: 375:6
        StorageDead(_3);                 // scope 2 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:2446:45: 2446:46
        _0 = slice::ascii::<impl [u8]>::make_ascii_lowercase(_2) -> bb1; // scope 1 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:2447:9: 2447:34
                                         // mir::Constant
                                         // + span: /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:2447:12: 2447:32
                                         // + literal: Const { ty: for<'a> fn(&'a mut [u8]) {slice::ascii::<impl [u8]>::make_ascii_lowercase}, val: Value(<ZST>) }
    }

    bb1: {
        return;                          // scope 0 at /home/ben/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/src/rust/library/core/src/str/mod.rs:2448:6: 2448:6
    }
}

@cjgillot
Contributor

@saethlin Which function's MIR are you showing? I can't read either the span or its name; both were truncated by the copy-paste.

@saethlin
Member Author

Wow, yeah, I really truncated that, didn't I? I've updated the comment; the function is `str::make_ascii_lowercase`.

@cjgillot
Contributor

It's a shortcoming of the SSA analysis. It is based on a visitor and checks for `PlaceContext::MutatingUse(_)`. But in the default visitor implementation, `&raw mut *x` visits the local `x` as `PlaceContext::MutatingUse(MutatingUseContext::Projection)`, although it does not mutate `x`. d8b8371 should solve this.
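The failure mode can be reproduced in a toy model of the `make_ascii_lowercase` MIR above. Everything here is hypothetical (`Stmt`, `mutations`, `is_ssa`, `copy_prop_once`), much simpler than rustc's CopyProp (straight-line code only, no storage markers, no dominance), and it is not the code in d8b8371. The `deref_borrow_mutates` flag switches between counting `_5 = &raw mut (*_3)` as a mutation of `_3`, as described above, and not counting it. With the flag set, `_3` appears to be assigned twice, is therefore not SSA, and `_3 = copy _1` survives; without it, `_3` is replaced by `_1`.

```rust
/// Toy straight-line MIR statements (hypothetical, not rustc's types).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Stmt {
    /// `dst = copy src`
    CopyAssign { dst: usize, src: usize },
    /// `dst = &raw mut (*src)`: reads the pointer stored in `src`, never writes `src`.
    RawMutOfDeref { dst: usize, src: usize },
    /// `dst = call(arg)`: an opaque read of `arg`.
    Call { dst: usize, arg: usize },
}

/// Number of statements the analysis treats as mutating `local`.
/// With `deref_borrow_mutates`, `&raw mut (*local)` is also counted, mimicking
/// a `MutatingUse(Projection)` context being treated as a real mutation.
fn mutations(local: usize, body: &[Stmt], deref_borrow_mutates: bool) -> usize {
    body.iter()
        .filter(|s| match **s {
            Stmt::CopyAssign { dst, .. } | Stmt::Call { dst, .. } => dst == local,
            Stmt::RawMutOfDeref { dst, src } => {
                dst == local || (deref_borrow_mutates && src == local)
            }
        })
        .count()
}

/// A local counts as SSA here if it is assigned exactly once
/// (function arguments are assigned on entry).
fn is_ssa(local: usize, body: &[Stmt], args: &[usize], deref_borrow_mutates: bool) -> bool {
    let on_entry = usize::from(args.contains(&local));
    on_entry + mutations(local, body, deref_borrow_mutates) == 1
}

/// Propagate one `dst = copy src` where both locals are SSA: delete the copy and
/// make later statements read `src` instead of `dst`. Returns whether anything changed.
pub fn copy_prop_once(body: &mut Vec<Stmt>, args: &[usize], deref_borrow_mutates: bool) -> bool {
    let found = {
        let b: &[Stmt] = &body[..];
        b.iter().enumerate().find_map(|(i, s)| match *s {
            Stmt::CopyAssign { dst, src }
                if is_ssa(dst, b, args, deref_borrow_mutates)
                    && is_ssa(src, b, args, deref_borrow_mutates) =>
            {
                Some((i, dst, src))
            }
            _ => None,
        })
    };
    let Some((i, dst, src)) = found else { return false };
    body.remove(i);
    for s in body.iter_mut() {
        match s {
            Stmt::CopyAssign { src: r, .. }
            | Stmt::RawMutOfDeref { src: r, .. }
            | Stmt::Call { arg: r, .. } => {
                if *r == dst {
                    *r = src;
                }
            }
        }
    }
    true
}
```

In this model, local `1` plays `_1` (the argument), `3` plays `_3`, and so on; the `PtrToPtr` cast and the `&mut (*_4)` reborrow are folded into a single call for brevity.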

@saethlin
Member Author

That makes sense. I've always thought that `PlaceContext` didn't categorize things correctly for MIR opts.

@cjgillot
Contributor

> I cannot find any MIR inlining differences in cranelift-codegen, though several inlining reports now say that functions are being inlined at different scopes, so something has changed about the caller context in a few places. I know that MIR inlining in the standard library has changed, so I suspect the regression is related to that.
>
> Cachegrind diffs for the check regressions point at a smattering of functions. I've looked into `types_may_unify` and `hash_stable` for `Span`. The code in `types_may_unify` looks rearranged, but I have no idea why it is slower. I can't find any difference in `hash_stable` for `Span`. In both cases I can't use any traditional profiling tools, so I propose we just accept those regressions, because on balance this PR is an improvement.

I agree with merging this PR as is. In addition, we gain up to 3% in binary size and metadata size. I'll propose some improvements to CopyProp in a separate PR.

@bors r+

@bors
Contributor

bors commented Feb 17, 2023

📌 Commit 1409cb5 has been approved by cjgillot

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 17, 2023
@cjgillot cjgillot added the perf-regression-triaged The performance regression has been triaged. label Feb 17, 2023
@bors
Contributor

bors commented Feb 17, 2023

⌛ Testing commit 1409cb5 with merge 231bcd1...

@bors
Contributor

bors commented Feb 18, 2023

☀️ Test successful - checks-actions
Approved by: cjgillot
Pushing 231bcd1 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 18, 2023
@bors bors merged commit 231bcd1 into rust-lang:master Feb 18, 2023
@rustbot rustbot added this to the 1.69.0 milestone Feb 18, 2023
@rust-timer
Collaborator

Finished benchmarking commit (231bcd1): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.6% | [0.2%, 3.1%] | 16 |
| Regressions ❌ (secondary) | 0.4% | [0.2%, 0.5%] | 11 |
| Improvements ✅ (primary) | -0.7% | [-2.3%, -0.3%] | 32 |
| Improvements ✅ (secondary) | -1.0% | [-1.7%, -0.3%] | 17 |
| All ❌✅ (primary) | -0.3% | [-2.3%, 3.1%] | 48 |

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.9% | [0.1%, 5.5%] | 6 |
| Regressions ❌ (secondary) | 4.2% | [4.2%, 4.2%] | 1 |
| Improvements ✅ (primary) | -3.1% | [-8.2%, -1.5%] | 6 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.1% | [-8.2%, 5.5%] | 12 |

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.0% | [1.6%, 2.4%] | 2 |
| Regressions ❌ (secondary) | 5.9% | [5.6%, 6.6%] | 4 |
| Improvements ✅ (primary) | -1.6% | [-2.2%, -1.0%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.2% | [-2.2%, 2.4%] | 4 |

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 9, 2023
Do not consider `&mut *x` as mutating `x` in `CopyProp`

This PR removes an unfortunate overly cautious case from the current implementation.

Found by rust-lang#105274 cc `@saethlin`
saethlin pushed a commit to saethlin/miri that referenced this pull request Mar 11, 2023
Do not consider `&mut *x` as mutating `x` in `CopyProp`

This PR removes an unfortunate overly cautious case from the current implementation.

Found by rust-lang/rust#105274 cc `@saethlin`
@saethlin saethlin deleted the instcombine-mut-ref branch March 15, 2023 00:33
Labels
A-mir-opt Area: MIR optimizations merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
9 participants