Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Permit MIR inlining without #[inline] #109247

Merged
merged 3 commits into from
Apr 17, 2023

Conversation

saethlin
Copy link
Member

@saethlin saethlin commented Mar 17, 2023

I noticed that there are at least a handful of portable-simd functions that have no #[inline] but compile to an assign + return.

I locally benchmarked inlining thresholds between 0 and 50 in increments of 5, and 50 seems to be the best. Interesting. That didn't include check builds though, maybe perf will have something to say about that.

Perf has little useful to say about this. We generally regress all the check builds, as best as I can tell, due to a number of small codegen changes in a particular hot function in the compiler. Probably this is because we've nudged the inlining outcomes all over, and uses of #[inline(always)]/#[inline(never)] might need to be adjusted.

@saethlin saethlin added the A-mir-opt Area: MIR optimizations label Mar 17, 2023
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 17, 2023
@saethlin saethlin added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 17, 2023
@rust-log-analyzer

This comment has been minimized.

@saethlin saethlin changed the title Permit MIR inlining of very small functions without #[inline] Permit MIR inlining without #[inline] Mar 18, 2023
@saethlin saethlin force-pushed the inline-without-inline branch from 1412877 to ac6d3cb Compare March 18, 2023 02:57
@rust-log-analyzer

This comment has been minimized.

@saethlin saethlin force-pushed the inline-without-inline branch from ac6d3cb to f5f1cb1 Compare March 18, 2023 03:16
@saethlin
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 18, 2023
@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Mar 18, 2023

⌛ Trying commit f5f1cb189a638607152e449c02d83dfce67f9d83 with merge 2dffcafbed22d039709534ae1a5078d9202122ff...

@bors
Copy link
Contributor

bors commented Mar 18, 2023

☀️ Try build successful - checks-actions
Build commit: 2dffcafbed22d039709534ae1a5078d9202122ff (2dffcafbed22d039709534ae1a5078d9202122ff)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (2dffcafbed22d039709534ae1a5078d9202122ff): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.2% [0.4%, 25.7%] 75
Regressions ❌
(secondary)
1.1% [0.2%, 4.2%] 46
Improvements ✅
(primary)
-2.0% [-6.2%, -0.3%] 50
Improvements ✅
(secondary)
-8.1% [-43.8%, -0.3%] 27
All ❌✅ (primary) -0.1% [-6.2%, 25.7%] 125

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.8% [0.7%, 6.3%] 9
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-4.3% [-10.9%, -0.4%] 16
Improvements ✅
(secondary)
-4.5% [-7.3%, -2.4%] 4
All ❌✅ (primary) -1.8% [-10.9%, 6.3%] 25

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
13.3% [0.7%, 26.0%] 2
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-3.1% [-7.2%, -0.5%] 34
Improvements ✅
(secondary)
-10.4% [-46.7%, -2.4%] 21
All ❌✅ (primary) -2.2% [-7.2%, 26.0%] 36

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Mar 18, 2023
@saethlin
Copy link
Member Author

The perf summary above looks more negative than the actual report page. The overall effect is dragged down a lot by the fact that this regresses a function in rustc_query_system::dep_graph. Going to look into this for ~1 day to see if I can easily prod codegen back to what it was before.

@saethlin
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 18, 2023
@bors
Copy link
Contributor

bors commented Mar 18, 2023

⌛ Trying commit 9373b9301ae612f830885f227fadf2e77ad1b0e8 with merge 9615c8e881c7c418a670d586a2fe748104c298fe...

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Mar 18, 2023

☀️ Try build successful - checks-actions
Build commit: 9615c8e881c7c418a670d586a2fe748104c298fe (9615c8e881c7c418a670d586a2fe748104c298fe)

@rust-timer

This comment has been minimized.

@nnethercote
Copy link
Contributor

This looks ok to me. There are some concern about check builds regressing instruction counts, but for changes like this that affect lots of code, it's equally/more important to look at cycles and wall-times. Those improve for check builds. (Select "Show non-relevant results" for the broad view.)

@oli-obk oli-obk self-assigned this Apr 11, 2023
// CHECK: sub <8 x i32> %[[A]], %[[B]]
// CHECK: sub <8 x i32> %[[B]], %[[A]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uh, what happened here? Is this just a reordering of the loads but not a reordering of the actual subtraction?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me after making sure that this test doesn't get miscompiled

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know exactly what is going on here but the test isn't miscompiled. It's our aggressive abbreviations in this codegen test that make it look suspicious. Using -Zmir-opt-level=3 to simulate the effect of this PR because it lifts the inline attribute requirement: https://godbolt.org/z/EP7aTcz1x

Before:

define void @_ZN7example21short_integer_zip_map17hbd5b1e22189a0746E(ptr noalias nocapture noundef writeonly sret([8 x i32]) dereferenceable(32) %0, ptr noalias nocapture noundef readonly dereferenceable(32) %x, ptr noalias nocapture noundef readonly dereferenceable(32) %y) unnamed_addr #0 personality ptr @rust_eh_personality !dbg !6 {
  %1 = load <8 x i32>, ptr %x, align 4, !dbg !11
  %2 = load <8 x i32>, ptr %y, align 4, !dbg !11
  %3 = sub <8 x i32> %1, %2, !dbg !12
  store <8 x i32> %3, ptr %0, align 4, !dbg !62, !alias.scope !63, !noalias !66
  ret void, !dbg !68
}

After:

define void @_ZN7example21short_integer_zip_map17hbd5b1e22189a0746E(ptr noalias nocapture noundef writeonly sret([8 x i32]) dereferenceable(32) %0, ptr noalias nocapture noundef readonly dereferenceable(32) %x, ptr noalias nocapture noundef readonly dereferenceable(32) %y) unnamed_addr #0 personality ptr @rust_eh_personality !dbg !6 {
  %1 = load <8 x i32>, ptr %y, align 4, !dbg !11
  %2 = load <8 x i32>, ptr %x, align 4, !dbg !18
  %3 = sub <8 x i32> %2, %1, !dbg !19
  store <8 x i32> %3, ptr %0, align 4, !dbg !65, !alias.scope !66, !noalias !69
  ret void, !dbg !71
}

@oli-obk
Copy link
Contributor

oli-obk commented Apr 16, 2023

@bors r+

@bors
Copy link
Contributor

bors commented Apr 16, 2023

📌 Commit 3541775 has been approved by oli-obk

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 16, 2023
@bors
Copy link
Contributor

bors commented Apr 17, 2023

⌛ Testing commit 3541775 with merge 5546cb6...

@bors
Copy link
Contributor

bors commented Apr 17, 2023

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing 5546cb6 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 17, 2023
@bors bors merged commit 5546cb6 into rust-lang:master Apr 17, 2023
@rustbot rustbot added this to the 1.71.0 milestone Apr 17, 2023
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (5546cb6): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.0% [0.2%, 25.6%] 135
Regressions ❌
(secondary)
1.4% [0.2%, 8.8%] 97
Improvements ✅
(primary)
-1.8% [-5.3%, -0.3%] 52
Improvements ✅
(secondary)
-9.6% [-43.2%, -1.2%] 21
All ❌✅ (primary) 0.2% [-5.3%, 25.6%] 187

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.3% [0.7%, 9.8%] 8
Regressions ❌
(secondary)
3.3% [2.2%, 4.1%] 4
Improvements ✅
(primary)
-4.1% [-6.0%, -1.9%] 11
Improvements ✅
(secondary)
-4.1% [-6.0%, -2.2%] 2
All ❌✅ (primary) -1.0% [-6.0%, 9.8%] 19

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
14.0% [1.2%, 26.9%] 2
Regressions ❌
(secondary)
3.0% [2.3%, 4.1%] 9
Improvements ✅
(primary)
-2.8% [-5.4%, -1.0%] 35
Improvements ✅
(secondary)
-10.9% [-45.7%, -1.0%] 19
All ❌✅ (primary) -1.9% [-5.4%, 26.9%] 37

@rylev
Copy link
Member

rylev commented Apr 19, 2023

@saethlin @compiler-errors looks like perf may be worse than in the original perf run before merge. Of particular concern is that the cargo incr-patched: println test case has regressed significantly (42% increase in wall time). Is it perhaps worth taking another look at the performance impact of this?

cc @nnethercote

@saethlin
Copy link
Member Author

saethlin commented Apr 19, 2023

That case was present before merge and has been discussed in this thread. Do you disagree with the statements made above?

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request May 17, 2023
… r=cjgillot

Don't expect normalization to succeed in elaborate_drops

Fixes rust-lang#110682

This was exposed through the changes in rust-lang#109247, which causes more things to be inlined. Inlining can happen before monomorphization, so we can't expect normalization to succeed. In the elaborate_drops analysis we currently have [this call](https://github.com/rust-lang/rust/blob/033aa092ab23ba14cdad27073c5e37ba0eddb428/compiler/rustc_mir_dataflow/src/elaborate_drops.rs#L278) to `normalize_erasing_regions`, which ICEs when normalization fails. The types are used to infer [whether the type needs a drop](https://github.com/rust-lang/rust/blob/033aa092ab23ba14cdad27073c5e37ba0eddb428/compiler/rustc_mir_dataflow/src/elaborate_drops.rs#L374), where `needs_drop` itself [uses `try_normalize_erasing_regions`](https://github.com/rust-lang/rust/blob/033aa092ab23ba14cdad27073c5e37ba0eddb428/compiler/rustc_middle/src/ty/util.rs#L1121).

~[`instance_mir`](https://doc.rust-lang.org/stable/nightly-rustc/rustc_middle/ty/context/struct.TyCtxt.html#method.instance_mir) isn't explicit about whether it expects the instances corresponding to the `InstanceDef`s to be monomorphized (though I think in all other contexts the function is used post-monomorphization), so the use of `instance_mir` in inlining doesn't necessarily seem wrong to me.~
thomcc pushed a commit to tcdi/postgrestd that referenced this pull request Jun 1, 2023
Add #[inline] to as_deref

While working on rust-lang/rust#109247 I found an `as_deref` call in the compiler that should have been inlined. This fixes the missing inlining (but doesn't address the perf issues I was chasing).

r? `@thomcc`
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this pull request Apr 20, 2024
Add #[inline] to as_deref

While working on rust-lang/rust#109247 I found an `as_deref` call in the compiler that should have been inlined. This fixes the missing inlining (but doesn't address the perf issues I was chasing).

r? `@thomcc`
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this pull request Apr 27, 2024
Add #[inline] to as_deref

While working on rust-lang/rust#109247 I found an `as_deref` call in the compiler that should have been inlined. This fixes the missing inlining (but doesn't address the perf issues I was chasing).

r? `@thomcc`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-mir-opt Area: MIR optimizations merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.