Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc_codegen_ssa: Better code generation for niche discriminants. #102872

Merged
merged 1 commit into from
Nov 11, 2022

Conversation

mikebenfield
Copy link
Contributor

In some cases we can avoid arithmetic before checking whether a niche is a tag.

Also rename some identifiers around niches.

This is relevant to #101872

@rust-highfive
Copy link
Collaborator

r? @nagisa

(rust-highfive has picked a reviewer for you, use r? to override)

@rustbot rustbot added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Oct 10, 2022
@rustbot
Copy link
Collaborator

rustbot commented Oct 10, 2022

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 10, 2022
@mikebenfield
Copy link
Contributor Author

In my performance testing this improves runtime of every benchmark I tried from rustc-perf. Usually around 1% or 2%, sometimes as much as 5%. However I often see discrepancies between the perf runs here and my local runs, so who knows.

I felt some of the names around niches were confusing so I renamed some things. If it's too obnoxious of me to include the renames here I can change them back.

@rust-log-analyzer

This comment has been minimized.

@@ -650,7 +650,7 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {
// declared list of variants -- they can differ with explicitly assigned discriminants.
// We use "tag" to refer to how the discriminant is encoded in memory, which can be either
// straight-forward (`TagEncoding::Direct`) or with a niche (`TagEncoding::Niche`).
let (tag_scalar_layout, tag_encoding, tag_field) = match op.layout.variants {
let (tag_scalar_layout, encoding, field) = match op.layout.variants {
Copy link
Member

@RalfJung RalfJung Oct 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you rename these? These local variable names were carefully chosen. Is 'tag' no longer accurate?

Variants::Multiple { tag, ref tag_encoding, tag_field, .. } => {
(tag, tag_encoding, tag_field)
Variants::Multiple { scalar, ref encoding, field, .. } => {
(scalar, encoding, field)
Copy link
Member

@RalfJung RalfJung Oct 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'scalar', in the interpreter, usually refers to an interpret::Scalar. Please avoid using that name for things that have another type. This here describes the type/ABI of a scalar, but it is not a scalar value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. But given that objects of this type shouldn't be referred to as scalars... maybe Scalar is not a good name for the type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider its full name to be abi::Scalar.

My understanding is that the name makes more sense outside of the interpreter, but given that the interpreter has interpret::Scalar we have a somewhat unfortunate conflict.

Variants::Multiple { tag_field, .. } => {
if tag_field == field {
Variants::Multiple { field, .. } => {
if field == field {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something went very wrong here.

tag_field: usize,
encoding: TagEncoding,
scalar: Scalar,
field: usize,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could live with removing the tag_ prefix (but it seems you did a global search-and-replace here and that changed the meaning of some code, see my previous comment), but scalar: Scalar is worse than before IMO. This is the scalar ABI of the tag.

You also removed the term 'tag' from the comment, which is bad IMO. We used to have a lot of confusion between discriminant and tag, until I started systematically putting these terms everywhere. Please don't revert us back to the confusing state.

let hi_cmp = bx.icmp(IntPredicate::IntULE, value, hi);
bx.assume(lo_cmp);
bx.assume(hi_cmp);
value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you forget to apply most of this extra logic to cg_clif?

Copy link
Contributor Author

@mikebenfield mikebenfield Oct 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess cg_clif refers to cranelift. Yeah I made no changes to code generation for cranelift.

@RalfJung
Copy link
Member

RalfJung commented Oct 10, 2022

I can't comment on the codegen change, but the rename overall is a net negative IMO. I am open to suggestions for how to improve our current names, but we need a term for 'the discriminant in its encoded form in memory'. That thing used to be called 'tag', and this PR leaves that term in some places (TagEncoding) but removes it in many others (in particular in many of the comments) without even introducing an alternative term. If you want to replace 'tag' by something else, fine, but we need some way to refer to this, and this PR seems to have the goal of just eliminating the term without a replacement and without eliminating the underlying concept -- leaving us with a concept without a name, which is bad.

@mikebenfield
Copy link
Contributor Author

Alright I'll undo the name changes.

FWIW the main problem for me is that "niche" is used in vague and confusing ways, in ways that seem to overlap or conflict with "tag" and, to some degree, "discriminant." Is a "niche value" whatever value is in the field that has been designated as a niche, or are the niche values only the ones used to indicate discriminants or, almost the opposite, are the niche values the ones that are actually meaningful in the underlying field?

Another minor issue is that just plain tag is used a lot. The field of Scalar type in Variants::Multiple is just called tag; every time I write code with Variants::Multiple I have to remind myself what the heck that tag field is, along with a lot of other names in codegen_get_discr and related functions.

But it doesn't really matter. I'll push a new commit in the near future to undo the name stuff.

@RalfJung
Copy link
Member

As I said I am open to renaming 'tag' if you have some good ideas. But I don't like just removing the name without replacement -- concepts like that need names. I'm not saying our current terms are great, they are not, but even bad terms are better than no terms.

The way I think about it: "niche" values of a type are bit patterns that are not valid representations for that type, and can be used to store the tag which can then be mapped to the discriminant. (In that sense they are not even values...)

@krdln
Copy link
Contributor

krdln commented Oct 10, 2022

@mikebenfield Looks like we've both noticed the "needless sub before cmp" and implemented basically the same idea, but you pushed your PR first 😆. I actually abandoned my branch, as perf wins I was seeing were very minuscule (despite asm looking quite nice in some artificial tests (see this gist)). I've ran your branch through this gist (see -mike file) and interestingly, it regresses in some cases (the bar / baz etc. functions, which are basically matches! on a single variant). The "full-match" (or calling std::mem::get_discriminant) is a very nice win though (I have a couple of needles movzxs there)! Anyway, decided to undust and push my code as a draft.

So, on my branch there are some nice assembly improvements, but almost negligible perf wins, while you have some assembly regressions (and couple improvements) but significant perf wins. This is interesting! Perhaps the case in which I have slight regression (if that's even a regression), and you have an improvement – a "full-match on u8-based enum" – is dominating? I hope we can analyze it further and come up with "best-of-both-worlds" approach.

From quick look at your code, the differences in implementation I've seen are:

  • different placement of cast,
  • I didn't bother handling niched before untagged case, as I haven't seen such in the wild,
  • I added an additional assume to tell LLVM about the hole in niche_variants.
  • I see you have some additional assumes.

@mikebenfield
Copy link
Contributor Author

mikebenfield commented Oct 10, 2022

Oh that's funny we both had the same idea. I didn't check on cases like your bar and baz; I'm surprised my code regresses in that case. I'll investigate what goes wrong. Are you sure your assume is always true? niche_variants doesn't always have a hole.

@krdln
Copy link
Contributor

krdln commented Oct 10, 2022

@mikebenfield

Are you sure your assume is always true? niche_variants doesn't always have a hole.

You're right. I did a bit of mental shortcut – what I've meant is that niche_variants won't ever contain untagged_variant (if untagged_variant is outside niche_variants, this assumption while less useful, will still be valid. I even tried to gate the assume to "true holes only", but it's not worth it).

@mikebenfield
Copy link
Contributor Author

mikebenfield commented Oct 10, 2022

OK, here appears to be the reason my generated code regresses in that case.

Given this LLVM IR:

define i1 @myfunction(i8 %0) {
start:
  %1 = icmp ule i8 %0, 49
  %2 = zext i8 %0 to i64
  %3 = add i64 %2, -46
  %_2 = select i1 %1, i64 %3, i64 0
  %4 = icmp eq i64 %_2, 1
  ret i1 %4
}

I run an InstCombinePass on it: (opt --instcombine -S nope.ll)

define i1 @myfunction(i8 %0) {
start:
  %1 = icmp eq i8 %0, 47
  ret i1 %1
}

Great! That works as expected. But instead change the function to take an i16, and truncate it:

define i1 @myfunction(i16 %0) {
start:
  %z = trunc i16 %0 to i8
  %1 = icmp ule i8 %z, 49
  %2 = zext i8 %z to i64
  %3 = add i64 %2, -46
  %_2 = select i1 %1, i64 %3, i64 0
  %4 = icmp eq i64 %_2, 1
  ret i1 %4
}

and the InstCombinePass produces this:

define i1 @myfunction(i16 %0) {
start:
  %z = trunc i16 %0 to i8
  %1 = icmp ult i8 %z, 50
  %z.mask = and i16 %0, 255
  %2 = icmp eq i16 %z.mask, 47
  %3 = and i1 %1, %2
  ret i1 %3
}

Which is... not so great. Maybe I'll file a bug with LLVM.

@mikebenfield
Copy link
Contributor Author

I filed this issue.

@mikebenfield
Copy link
Contributor Author

a "full-match on u8-based enum" – is dominating?

Yeah most of my testing was on matches that are full or nearly full, generally with enough cases so that a jump table is generated.

@mikebenfield
Copy link
Contributor Author

Cleaned up codegen_get_discr a little, wrote better comments, and made it more careful about when it can move all the arithmetic to after the cast.

Add test src/test/ui/enum-discriminant/get_discr.rs to make sure some of those cases work correctly.

@nagisa
Copy link
Member

nagisa commented Oct 18, 2022

Before I take a look at this, quantifying the wins is probably a good way to get an objective answer to performance discussions above.

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@bors
Copy link
Contributor

bors commented Nov 9, 2022

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Nov 9, 2022
@rust-log-analyzer

This comment has been minimized.

@mikebenfield
Copy link
Contributor Author

@nagisa I assume that failure is some network/infrastructure hiccup? Otherwise I'm not sure what's going on.

@lqd
Copy link
Member

lqd commented Nov 10, 2022

yes

@bors retry network failure in aarch64-gnu builder: "curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to ci-mirrors.rust-lang.org:443"

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 10, 2022
@bors
Copy link
Contributor

bors commented Nov 10, 2022

⌛ Testing commit 2adb8178ce9b0d6a8a384d5e5ee3f2ca66c6deeb with merge 661023d5027c336e0aabe15e136c0f38d50146f4...

@bors
Copy link
Contributor

bors commented Nov 10, 2022

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Nov 10, 2022
@rust-log-analyzer

This comment has been minimized.

In some cases we can avoid arithmetic before checking whether a niche
represents an untagged variant.

This is relevant to rust-lang#101872
@mikebenfield
Copy link
Contributor Author

Alright, that failed because the LLVM IR I test against is not what's generated on 32 bit systems. Duh. I added // only-x86_64 to the test; someone let me know if that is not an appropriate fix.

@nagisa
Copy link
Member

nagisa commented Nov 11, 2022

It is better to have a #[no_core] test that specifies --target=x86_64-unknown-linux-gnu or somesuch so that the test can run locally on developers’ non-x86-64 machines, but this is okay too.

@bors r+

@bors
Copy link
Contributor

bors commented Nov 11, 2022

📌 Commit 51918dc has been approved by nagisa

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 11, 2022
@bors
Copy link
Contributor

bors commented Nov 11, 2022

⌛ Testing commit 51918dc with merge 742d3f0...

@bors
Copy link
Contributor

bors commented Nov 11, 2022

☀️ Test successful - checks-actions
Approved by: nagisa
Pushing 742d3f0 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 11, 2022
@bors bors merged commit 742d3f0 into rust-lang:master Nov 11, 2022
@rustbot rustbot added this to the 1.67.0 milestone Nov 11, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (742d3f0): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.5%, 0.6%] 2
Regressions ❌
(secondary)
2.1% [2.1%, 2.1%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.3% [-2.2%, -0.3%] 4
All ❌✅ (primary) 0.5% [0.5%, 0.6%] 2

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.9% [1.9%, 1.9%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.8% [-1.8%, -1.7%] 2
All ❌✅ (primary) - - 0

@nnethercote
Copy link
Contributor

This must have been right on the edge of the regression/no-regression categorization. Not much to worry about here.

@rustbot label: +perf-regression-triaged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.