rustc_codegen_ssa: Better code generation for niche discriminants. #102872

mikebenfield · 2022-10-10T07:06:48Z

In some cases we can avoid arithmetic before checking whether a niche is a tag.

Also rename some identifiers around niches.

This is relevant to #101872
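To make the description concrete, here is a minimal sketch in plain Rust. It assumes a simplified model of a niche: the `Niche` struct and both functions are hypothetical and are not rustc's `codegen_get_discr`. It contrasts two approaches. The first subtracts `niche_start` before the range check. The second compares the raw tag first and does the arithmetic only after widening. For a niche range that does not wrap, both approaches decode every `u8` tag the same way.

```rust
// Hypothetical, simplified model of a niche-encoded tag. The field names echo
// the thread's vocabulary, but this is NOT rustc's actual data structure.
#[derive(Clone, Copy, Debug)]
struct Niche {
    niche_start: u8,            // tag value that encodes niche_variants.0
    niche_variants: (u32, u32), // inclusive range of variant indices kept in the niche
    untagged_variant: u32,      // variant whose payload occupies the tag's bytes
}

// Strategy 1: subtract first (wrapping, in the tag's own width), then a single
// unsigned comparison decides whether the tag lies in the niche.
fn discr_sub_first(n: Niche, tag: u8) -> u32 {
    let (lo, hi) = n.niche_variants;
    let relative = u32::from(tag.wrapping_sub(n.niche_start));
    if relative <= hi - lo {
        lo + relative
    } else {
        n.untagged_variant
    }
}

// Strategy 2: compare the raw tag against the niche bounds first, then widen
// and do the arithmetic after the cast. Only valid if the niche range does
// not wrap around the end of u8.
fn discr_cmp_first(n: Niche, tag: u8) -> u32 {
    let (lo, hi) = n.niche_variants;
    // Assumes the niche holds at most 256 variants, so the span fits in a u8.
    let span = (hi - lo) as u8;
    let niche_end = n
        .niche_start
        .checked_add(span)
        .expect("sketch assumes a non-wrapping niche range");
    if n.niche_start <= tag && tag <= niche_end {
        // Arithmetic happens in a wider type, after the cast.
        (i64::from(tag) - i64::from(n.niche_start) + i64::from(lo)) as u32
    } else {
        n.untagged_variant
    }
}

fn main() {
    // Illustrative layout: variant 0 is untagged, variants 1..=3 use tags 46..=48.
    let n = Niche { niche_start: 46, niche_variants: (1, 3), untagged_variant: 0 };
    for tag in 0..=u8::MAX {
        assert_eq!(discr_sub_first(n, tag), discr_cmp_first(n, tag));
    }
    println!("both strategies agree on all 256 tag values");
}
```

In the second version, the range check reads the tag directly. That gives LLVM a plain comparison against constants, which it can often merge with the comparisons of a subsequent `match`.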

rust-highfive · 2022-10-10T07:06:51Z

r? @nagisa

(rust-highfive has picked a reviewer for you, use r? to override)

rustbot · 2022-10-10T07:06:52Z

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

mikebenfield · 2022-10-10T07:14:58Z

In my performance testing this improves the runtime of every rustc-perf benchmark I tried, usually by around 1% or 2% and sometimes by as much as 5%. However, I often see discrepancies between the perf runs here and my local runs, so who knows.

I felt some of the names around niches were confusing so I renamed some things. If it's too obnoxious of me to include the renames here I can change them back.

RalfJung · 2022-10-10T08:51:27Z

compiler/rustc_const_eval/src/interpret/operand.rs

@@ -650,7 +650,7 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {
        // declared list of variants -- they can differ with explicitly assigned discriminants.
        // We use "tag" to refer to how the discriminant is encoded in memory, which can be either
        // straight-forward (`TagEncoding::Direct`) or with a niche (`TagEncoding::Niche`).
-        let (tag_scalar_layout, tag_encoding, tag_field) = match op.layout.variants {
+        let (tag_scalar_layout, encoding, field) = match op.layout.variants {


Why did you rename these? These local variable names were carefully chosen. Is 'tag' no longer accurate?

RalfJung · 2022-10-10T08:52:20Z

compiler/rustc_const_eval/src/interpret/operand.rs

-            Variants::Multiple { tag, ref tag_encoding, tag_field, .. } => {
-                (tag, tag_encoding, tag_field)
+            Variants::Multiple { scalar, ref encoding, field, .. } => {
+                (scalar, encoding, field)


'scalar', in the interpreter, usually refers to an interpret::Scalar. Please avoid using that name for things that have another type. This here describes the type/ABI of a scalar, but it is not a scalar value.

Alright. But given that objects of this type shouldn't be referred to as scalars... maybe Scalar is not a good name for the type?

I consider its full name to be abi::Scalar.

My understanding is that the name makes more sense outside of the interpreter, but given that the interpreter has interpret::Scalar we have a somewhat unfortunate conflict.

RalfJung · 2022-10-10T08:53:20Z

compiler/rustc_const_eval/src/interpret/validity.rs

-            Variants::Multiple { tag_field, .. } => {
-                if tag_field == field {
+            Variants::Multiple { field, .. } => {
+                if field == field {


Something went very wrong here.

RalfJung · 2022-10-10T08:55:36Z

compiler/rustc_target/src/abi/mod.rs

-        tag_field: usize,
+        encoding: TagEncoding,
+        scalar: Scalar,
+        field: usize,


I could live with removing the tag_ prefix (but it seems you did a global search-and-replace here and that changed the meaning of some code, see my previous comment), but scalar: Scalar is worse than before IMO. This is the scalar ABI of the tag.

You also removed the term 'tag' from the comment, which is bad IMO. We used to have a lot of confusion between discriminant and tag, until I started systematically putting these terms everywhere. Please don't revert us back to the confusing state.

bjorn3 · 2022-10-10T08:57:30Z

compiler/rustc_codegen_ssa/src/mir/place.rs

+                let hi_cmp = bx.icmp(IntPredicate::IntULE, value, hi);
+                bx.assume(lo_cmp);
+                bx.assume(hi_cmp);
+                value


Did you forget to apply most of this extra logic to cg_clif?

I guess cg_clif refers to the Cranelift backend. Yeah, I made no changes to code generation for Cranelift.

RalfJung · 2022-10-10T08:58:50Z

I can't comment on the codegen change, but the rename overall is a net negative IMO. I am open to suggestions for how to improve our current names, but we need a term for 'the discriminant in its encoded form in memory'. That thing used to be called 'tag', and this PR leaves that term in some places (TagEncoding) but removes it in many others (in particular in many of the comments) without even introducing an alternative term. If you want to replace 'tag' by something else, fine, but we need some way to refer to this, and this PR seems to have the goal of just eliminating the term without a replacement and without eliminating the underlying concept -- leaving us with a concept without a name, which is bad.

mikebenfield · 2022-10-10T15:47:37Z

Alright I'll undo the name changes.

FWIW the main problem for me is that "niche" is used in vague and confusing ways, in ways that seem to overlap or conflict with "tag" and, to some degree, "discriminant." Is a "niche value" whatever value is in the field that has been designated as a niche, or are the niche values only the ones used to indicate discriminants or, almost the opposite, are the niche values the ones that are actually meaningful in the underlying field?

Another minor issue is that just plain tag is used a lot. The field of Scalar type in Variants::Multiple is just called tag; every time I write code with Variants::Multiple I have to remind myself what the heck that tag field is, along with a lot of other names in codegen_get_discr and related functions.

But it doesn't really matter. I'll push a new commit in the near future to undo the name stuff.

RalfJung · 2022-10-10T15:56:01Z

As I said I am open to renaming 'tag' if you have some good ideas. But I don't like just removing the name without replacement -- concepts like that need names. I'm not saying our current terms are great, they are not, but even bad terms are better than no terms.

The way I think about it: "niche" values of a type are bit patterns that are not valid representations for that type, and can be used to store the tag which can then be mapped to the discriminant. (In that sense they are not even values...)
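RalfJung describes niches as invalid bit patterns, and you can observe this from ordinary Rust with `std::mem::size_of`. In the minimal sketch below, the first two assertions rely on documented guarantees for `Option<NonZeroU8>` and `Option<&T>`. The `Option<bool>` assertion reflects current rustc behavior, not a language guarantee.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

fn main() {
    // NonZeroU8 has exactly one invalid bit pattern (0). Option stores None in
    // that niche, so no separate tag byte is needed (guaranteed by std).
    assert_eq!(size_of::<Option<NonZeroU8>>(), size_of::<u8>());

    // References are never null, so null serves as the niche (also guaranteed).
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());

    // bool only uses 0 and 1, leaving 254 invalid byte values. Current rustc
    // uses one of them for None, but this is a layout detail, not a guarantee.
    assert_eq!(size_of::<Option<bool>>(), 1);

    println!("niche-optimized sizes as expected");
}
```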

krdln · 2022-10-10T21:14:38Z

@mikebenfield Looks like we've both noticed the "needless sub before cmp" and implemented basically the same idea, but you pushed your PR first 😆. I actually abandoned my branch, as the perf wins I was seeing were very minuscule (despite the asm looking quite nice in some artificial tests (see this gist)). I've run your branch through this gist (see the -mike file) and, interestingly, it regresses in some cases (the bar / baz etc. functions, which are basically matches! on a single variant). The "full-match" (or calling std::mem::discriminant) is a very nice win though (I have a couple of needless movzxs there)! Anyway, I decided to dust off my code and push it as a draft.

So, on my branch there are some nice assembly improvements, but almost negligible perf wins, while you have some assembly regressions (and couple improvements) but significant perf wins. This is interesting! Perhaps the case in which I have slight regression (if that's even a regression), and you have an improvement – a "full-match on u8-based enum" – is dominating? I hope we can analyze it further and come up with "best-of-both-worlds" approach.

From quick look at your code, the differences in implementation I've seen are:

different placement of cast,
I didn't bother handling the niched-before-untagged case, as I haven't seen it in the wild,
I added an additional assume to tell LLVM about the hole in niche_variants.
I see you have some additional assumes.

mikebenfield · 2022-10-10T21:47:59Z

Oh that's funny we both had the same idea. I didn't check on cases like your bar and baz; I'm surprised my code regresses in that case. I'll investigate what goes wrong. Are you sure your assume is always true? niche_variants doesn't always have a hole.

krdln · 2022-10-10T22:00:09Z

@mikebenfield

Are you sure your assume is always true? niche_variants doesn't always have a hole.

You're right. I took a bit of a mental shortcut: what I meant is that niche_variants won't ever contain untagged_variant. (If untagged_variant is outside niche_variants, this assumption, while less useful, is still valid. I even tried gating the assume to "true holes only", but it's not worth it.)
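One way to read krdln's assumption is this: no tag value that a program can actually store in the niche ever decodes to untagged_variant, even when niche_variants has a hole. The sketch below checks that reading on a hypothetical model. The `Layout` struct, its fields, and the numbers are all illustrative and not taken from rustc. In the "hole" case, the untagged variant sits in the middle of niche_variants, so its niche value is reserved but never written.

```rust
// Hypothetical model of a niche layout, only for illustrating the assumption.
// Assumes every variant other than the untagged one lies in niche_variants.
struct Layout {
    niche_start: u8,
    niche_variants: (u32, u32), // inclusive range of variant indices
    untagged_variant: u32,
    num_variants: u32,
    payload_max: u8, // the untagged payload only uses bytes 0..=payload_max
}

impl Layout {
    // Every tag byte that some value of the enum can actually store.
    fn reachable_tags(&self) -> Vec<u8> {
        let mut tags: Vec<u8> = (0..=self.payload_max).collect();
        let lo = self.niche_variants.0;
        for v in 0..self.num_variants {
            if v != self.untagged_variant {
                // Each other variant is encoded by one fixed niche value.
                tags.push(self.niche_start + (v - lo) as u8);
            }
        }
        tags
    }

    // The variant a tag decodes to if it lies in the niche, else None.
    fn niche_discr(&self, tag: u8) -> Option<u32> {
        let (lo, hi) = self.niche_variants;
        let rel = u32::from(tag.wrapping_sub(self.niche_start));
        if rel <= hi - lo { Some(lo + rel) } else { None }
    }

    // No reachable tag inside the niche decodes to the untagged variant.
    fn assumption_holds(&self) -> bool {
        self.reachable_tags()
            .iter()
            .all(|&t| self.niche_discr(t) != Some(self.untagged_variant))
    }
}

fn main() {
    // Untagged variant outside niche_variants: the assumption holds trivially.
    let outside = Layout { niche_start: 2, niche_variants: (1, 2), untagged_variant: 0, num_variants: 3, payload_max: 1 };
    // Untagged variant inside niche_variants: tag 3 is a "hole" that would
    // decode to the untagged variant, but no value ever stores it.
    let hole = Layout { niche_start: 2, niche_variants: (0, 2), untagged_variant: 1, num_variants: 3, payload_max: 1 };
    assert!(outside.assumption_holds());
    assert!(hole.assumption_holds());
    assert_eq!(hole.niche_discr(3), Some(1));
    assert!(!hole.reachable_tags().contains(&3));
}
```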

mikebenfield · 2022-10-10T23:13:17Z

OK, here appears to be the reason my generated code regresses in that case.

Given this LLVM IR:

define i1 @myfunction(i8 %0) {
start:
  %1 = icmp ule i8 %0, 49
  %2 = zext i8 %0 to i64
  %3 = add i64 %2, -46
  %_2 = select i1 %1, i64 %3, i64 0
  %4 = icmp eq i64 %_2, 1
  ret i1 %4
}

I run the InstCombine pass on it (opt --instcombine -S nope.ll):

define i1 @myfunction(i8 %0) {
start:
  %1 = icmp eq i8 %0, 47
  ret i1 %1
}
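Because an i8 has only 256 values, this fold can be checked exhaustively. The following sketch transcribes the IR into Rust by hand; the function names `before` and `after` are hypothetical. It treats i8 as u8 because `ule` and `zext` are unsigned operations.

```rust
// Hand transcription of the i8 IR before InstCombine, one line per instruction.
fn before(x: u8) -> bool {
    let in_range = x <= 49; // %1 = icmp ule i8 %0, 49
    let widened = i64::from(x) - 46; // %2 = zext, %3 = add i64 %2, -46
    let discr = if in_range { widened } else { 0 }; // %_2 = select
    discr == 1 // %4 = icmp eq i64 %_2, 1
}

// InstCombine's result: %1 = icmp eq i8 %0, 47
fn after(x: u8) -> bool {
    x == 47
}

fn main() {
    // Exhaustively confirm the fold over every possible i8 input.
    for x in 0..=u8::MAX {
        assert_eq!(before(x), after(x), "mismatch at {}", x);
    }
    println!("fold verified for all 256 inputs");
}
```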

Great! That works as expected. But if I instead change the function to take an i16 and truncate it:

define i1 @myfunction(i16 %0) {
start:
  %z = trunc i16 %0 to i8
  %1 = icmp ule i8 %z, 49
  %2 = zext i8 %z to i64
  %3 = add i64 %2, -46
  %_2 = select i1 %1, i64 %3, i64 0
  %4 = icmp eq i64 %_2, 1
  ret i1 %4
}

and the InstCombinePass produces this:

define i1 @myfunction(i16 %0) {
start:
  %z = trunc i16 %0 to i8
  %1 = icmp ult i8 %z, 50
  %z.mask = and i16 %0, 255
  %2 = icmp eq i16 %z.mask, 47
  %3 = and i1 %1, %2
  ret i1 %3
}

Which is... not so great. Maybe I'll file a bug with LLVM.
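The `icmp ult` in this output is redundant. If the low byte of %0 is 47, then %z is 47, which is already below 50. The sketch below checks this over all 16-bit inputs, again by hand-transcribing the IR into Rust. The function names are hypothetical.

```rust
// Semantics of the i16 version before InstCombine.
fn original(x: u16) -> bool {
    let z = x as u8; // %z = trunc i16 %0 to i8
    let discr = if z <= 49 { i64::from(z) - 46 } else { 0 };
    discr == 1
}

// Transcription of what InstCombine produced.
fn llvm_output(x: u16) -> bool {
    let z = x as u8; // %z = trunc i16 %0 to i8
    let in_range = z < 50; // %1 = icmp ult i8 %z, 50
    let low_is_47 = (x & 0xff) == 47; // %z.mask, %2
    in_range && low_is_47 // %3 = and i1 %1, %2
}

// The fold one might hope for: a low byte of 47 already implies z < 50.
fn ideal(x: u16) -> bool {
    (x & 0xff) == 47
}

fn main() {
    for x in 0..=u16::MAX {
        assert_eq!(original(x), llvm_output(x));
        assert_eq!(llvm_output(x), ideal(x));
    }
    println!("the ult 50 check is redundant for all 65536 inputs");
}
```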

mikebenfield · 2022-10-10T23:27:39Z

I filed this issue.

mikebenfield · 2022-10-11T00:01:20Z

a "full-match on u8-based enum" – is dominating?

Yeah most of my testing was on matches that are full or nearly full, generally with enough cases so that a jump table is generated.

mikebenfield · 2022-10-12T04:12:15Z

Cleaned up codegen_get_discr a little, wrote better comments, and made it more careful about when it can move all the arithmetic to after the cast.

Added test src/test/ui/enum-discriminant/get_discr.rs to make sure some of those cases work correctly.
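The added test itself is not reproduced in this thread. Below is a hedged sketch of the kind of check such a test might contain. It uses a hypothetical enum `E` whose `bool` payload likely gives rustc a niche for the other variants. The sketch exercises both a full `match` and single-variant `matches!` checks. The enum and `classify` are illustrative and are not the contents of get_discr.rs. `std::hint::black_box` requires Rust 1.66 or later.

```rust
use std::hint::black_box;
use std::mem::discriminant;

// Hypothetical enum with a niche: B's bool payload leaves byte values other
// than 0 and 1 free, so rustc can encode A, C and D there instead of adding a
// separate tag byte (a layout detail, not a guarantee).
#[derive(Clone, Copy, Debug, PartialEq)]
enum E {
    A,
    B(bool),
    C,
    D,
}

// A full match, where discriminant codegen matters most.
fn classify(e: E) -> u8 {
    match e {
        E::A => 0,
        E::B(false) => 1,
        E::B(true) => 2,
        E::C => 3,
        E::D => 4,
    }
}

fn main() {
    let values = [E::A, E::B(false), E::B(true), E::C, E::D];
    for (i, &v) in values.iter().enumerate() {
        // black_box keeps the optimizer from folding the match at compile time.
        assert_eq!(classify(black_box(v)) as usize, i);
        // Single-variant checks, like matches!(v, E::C).
        assert_eq!(matches!(black_box(v), E::C), i == 3);
    }
    // Values of the same variant share a discriminant regardless of payload.
    assert_eq!(discriminant(&E::B(false)), discriminant(&E::B(true)));
    assert_ne!(discriminant(&E::A), discriminant(&E::D));
}
```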

nagisa · 2022-10-18T11:46:07Z

Before I take a look at this: quantifying the wins is probably a good way to get an objective answer to the performance discussion above.

@bors try @rust-timer queue

rust-timer · 2022-10-18T11:46:09Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-11-09T02:02:22Z

💔 Test failed - checks-actions

mikebenfield · 2022-11-10T18:38:07Z

@nagisa I assume that failure is some network/infrastructure hiccup? Otherwise I'm not sure what's going on.

lqd · 2022-11-10T18:47:11Z

yes

@bors retry network failure in aarch64-gnu builder: "curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to ci-mirrors.rust-lang.org:443"

bors · 2022-11-10T19:29:50Z

⌛ Testing commit 2adb8178ce9b0d6a8a384d5e5ee3f2ca66c6deeb with merge 661023d5027c336e0aabe15e136c0f38d50146f4...

bors · 2022-11-10T19:49:24Z

💔 Test failed - checks-actions

In some cases we can avoid arithmetic before checking whether a niche represents an untagged variant. This is relevant to rust-lang#101872

mikebenfield · 2022-11-11T05:56:46Z

Alright, that failed because the LLVM IR I test against is not what's generated on 32-bit systems. Duh. I added // only-x86_64 to the test; someone let me know if that is not an appropriate fix.

nagisa · 2022-11-11T11:48:35Z

It is better to have a #![no_core] test that specifies --target=x86_64-unknown-linux-gnu or somesuch so that the test can run locally on developers’ non-x86-64 machines, but this is okay too.

@bors r+

bors · 2022-11-11T11:48:36Z

📌 Commit 51918dc has been approved by nagisa

It is now in the queue for this repository.

bors · 2022-11-11T13:50:35Z

⌛ Testing commit 51918dc with merge 742d3f0...

bors · 2022-11-11T16:59:56Z

☀️ Test successful - checks-actions
Approved by: nagisa
Pushing 742d3f0 to master...

rust-timer · 2022-11-11T18:20:21Z

Finished benchmarking commit (742d3f0): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.5%	[0.5%, 0.6%]	2
Regressions ❌ (secondary)	2.1%	[2.1%, 2.1%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-1.3%	[-2.2%, -0.3%]	4
All ❌✅ (primary)	0.5%	[0.5%, 0.6%]	2

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.9%	[1.9%, 1.9%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-1.8%	[-1.8%, -1.7%]	2
All ❌✅ (primary)	-	-	0

nnethercote · 2022-11-13T22:58:06Z

This must have been right on the edge of the regression/no-regression categorization. Not much to worry about here.

@rustbot label: +perf-regression-triaged

rust-highfive assigned nagisa Oct 10, 2022

rustbot added T-compiler and T-rustdoc labels Oct 10, 2022

rust-highfive added the S-waiting-on-review label Oct 10, 2022

mikebenfield mentioned this pull request Oct 10, 2022

Performance regression with niche optimization #101872

Open

RalfJung reviewed Oct 10, 2022

bjorn3 reviewed Oct 10, 2022

mikebenfield force-pushed the better-get-discr branch from 85e4222 to 9a7f3f9 on October 10, 2022 17:33

krdln mentioned this pull request Oct 10, 2022

Simplify codegen for niche-encoded enums in simple cases #102901

Closed

mikebenfield force-pushed the better-get-discr branch from 9a7f3f9 to 6e405a9 on October 12, 2022 04:09

mikebenfield force-pushed the better-get-discr branch from 6e405a9 to fa51934 on October 13, 2022 22:21

bors added S-waiting-on-review and removed S-waiting-on-bors labels Nov 9, 2022

bors added S-waiting-on-bors and removed S-waiting-on-review labels Nov 10, 2022

bors added S-waiting-on-review and removed S-waiting-on-bors labels Nov 10, 2022

rustc_codegen_ssa: Better code generation for niche discriminants.

51918dc

In some cases we can avoid arithmetic before checking whether a niche represents an untagged variant. This is relevant to rust-lang#101872

mikebenfield force-pushed the better-get-discr branch from 2adb817 to 51918dc on November 11, 2022 05:55

bors added S-waiting-on-bors and removed S-waiting-on-review labels Nov 11, 2022

bors added the merged-by-bors label Nov 11, 2022

bors merged commit 742d3f0 into rust-lang:master Nov 11, 2022

rustbot added this to the 1.67.0 milestone Nov 11, 2022

anp mentioned this pull request Nov 17, 2022

Possible codegen regression when matching against nested enums #104519

Closed

nikic mentioned this pull request Apr 11, 2023

Do not attempt to commute comparison and cast to codegen discriminants #110197

Merged
