rustc_codegen_ssa: Better code generation for niche discriminants. #102872
r? @nagisa (rust-highfive has picked a reviewer for you, use r? to override)
Some changes occurred to the CTFE / Miri engine
cc @rust-lang/miri
Some changes occurred in compiler/rustc_codegen_cranelift
cc @bjorn3
In my performance testing this improves runtime of every benchmark I tried from
I felt some of the names around niches were confusing so I renamed some things. If it's too obnoxious of me to include the renames here I can change them back.
@@ -650,7 +650,7 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {
 // declared list of variants -- they can differ with explicitly assigned discriminants.
 // We use "tag" to refer to how the discriminant is encoded in memory, which can be either
 // straight-forward (`TagEncoding::Direct`) or with a niche (`TagEncoding::Niche`).
-let (tag_scalar_layout, tag_encoding, tag_field) = match op.layout.variants {
+let (tag_scalar_layout, encoding, field) = match op.layout.variants {
Why did you rename these? These local variable names were carefully chosen. Is 'tag' no longer accurate?
-Variants::Multiple { tag, ref tag_encoding, tag_field, .. } => {
-    (tag, tag_encoding, tag_field)
+Variants::Multiple { scalar, ref encoding, field, .. } => {
+    (scalar, encoding, field)
'scalar', in the interpreter, usually refers to an interpret::Scalar. Please avoid using that name for things that have another type. This here describes the type/ABI of a scalar, but it is not a scalar value.
Alright. But given that objects of this type shouldn't be referred to as scalars... maybe Scalar is not a good name for the type?
I consider its full name to be abi::Scalar.
My understanding is that the name makes more sense outside of the interpreter, but given that the interpreter has interpret::Scalar we have a somewhat unfortunate conflict.
-Variants::Multiple { tag_field, .. } => {
-    if tag_field == field {
+Variants::Multiple { field, .. } => {
+    if field == field {
Something went very wrong here.
compiler/rustc_target/src/abi/mod.rs
Outdated
-tag_field: usize,
+encoding: TagEncoding,
+scalar: Scalar,
+field: usize,
I could live with removing the tag_ prefix (but it seems you did a global search-and-replace here and that changed the meaning of some code, see my previous comment), but scalar: Scalar is worse than before IMO. This is the scalar ABI of the tag.
You also removed the term 'tag' from the comment, which is bad IMO. We used to have a lot of confusion between discriminant and tag, until I started systematically putting these terms everywhere. Please don't revert us back to the confusing state.
let hi_cmp = bx.icmp(IntPredicate::IntULE, value, hi);
bx.assume(lo_cmp);
bx.assume(hi_cmp);
value
Did you forget to apply most of this extra logic to cg_clif?
I guess cg_clif refers to cranelift. Yeah I made no changes to code generation for cranelift.
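For readers unfamiliar with the `bx.assume` calls in the snippet under review: they appear to tell the backend that the loaded value lies between `lo` and `hi`, so that later comparisons can be simplified. A rough source-level analogue in Rust is sketched below. The helper names `assume_in_range` and `bucket` are hypothetical and are not part of rustc; this only illustrates the idea of a range assumption, not what the PR emits.

```rust
use std::hint::unreachable_unchecked;

/// Hypothetical helper: promises the optimizer that `lo <= value <= hi`.
///
/// # Safety
/// The caller must guarantee that `value` really is within `lo..=hi`.
unsafe fn assume_in_range(value: u8, lo: u8, hi: u8) -> u8 {
    if value < lo || value > hi {
        // Reaching this point would be undefined behavior, which is what
        // allows the optimizer to treat the range as a known fact.
        unsafe { unreachable_unchecked() }
    }
    value
}

/// Hypothetical caller: maps a tag in 2..=5 to an index in 0..=3.
fn bucket(tag: u8) -> u8 {
    // Establish the invariant safely first, then state it as an assumption.
    let clamped = tag.clamp(2, 5);
    let tag = unsafe { assume_in_range(clamped, 2, 5) };
    tag - 2
}
```

With the range known, the subtraction in `bucket` cannot underflow, which is the kind of fact a backend can exploit after an assume.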
I can't comment on the codegen change, but the rename overall is a net negative IMO. I am open to suggestions for how to improve our current names, but we need a term for 'the discriminant in its encoded form in memory'. That thing used to be called 'tag', and this PR leaves that term in some places ( |
Alright I'll undo the name changes. FWIW the main problem for me is that "niche" is used in vague and confusing ways, in ways that seem to overlap or conflict with "tag" and, to some degree, "discriminant." Is a "niche value" whatever value is in the field that has been designated as a niche, or are the niche values only the ones used to indicate discriminants or, almost the opposite, are the niche values the ones that are actually meaningful in the underlying field? Another minor issue is that just plain But it doesn't really matter. I'll push a new commit in the near future to undo the name stuff. |
As I said I am open to renaming 'tag' if you have some good ideas. But I don't like just removing the name without replacement -- concepts like that need names. I'm not saying our current terms are great, they are not, but even bad terms are better than no terms. The way I think about it: "niche" values of a type are bit patterns that are not valid representations for that type, and can be used to store the tag which can then be mapped to the discriminant. (In that sense they are not even values...) |
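To make the niche / tag / discriminant distinction above concrete, here is a small sketch using `Option<NonZeroU8>`, whose layout the standard library documents as compatible with `u8`. The all-zero bit pattern is invalid for `NonZeroU8`, so it is a niche, and it is used as the tag for `None`. The function names are illustrative only, and the 0/1 discriminant numbering is an assumption made for the example.

```rust
use std::mem::{size_of, transmute};
use std::num::NonZeroU8;

/// The niche makes the `Option` wrapper free: no extra byte for a tag.
fn option_is_free() -> bool {
    size_of::<Option<NonZeroU8>>() == size_of::<u8>()
}

/// Reads the in-memory byte (the "tag") of an `Option<NonZeroU8>`.
fn tag_byte(value: Option<NonZeroU8>) -> u8 {
    // Sound because `Option<NonZeroU8>` is documented to be layout-compatible
    // with `u8`, and every bit pattern is a valid `u8`.
    unsafe { transmute::<Option<NonZeroU8>, u8>(value) }
}

/// Maps the tag back to a discriminant (assumed here: 0 = `None`, 1 = `Some`).
fn discriminant_from_tag(tag: u8) -> u8 {
    if tag == 0 { 0 } else { 1 }
}
```

Here 0 is the only niche bit pattern; every other byte is both a valid payload and an indication that the variant is `Some`.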
85e4222 to 9a7f3f9
@mikebenfield Looks like we both noticed the "needless sub before cmp" and implemented basically the same idea, but you pushed your PR first 😆. I actually abandoned my branch, as the perf wins I was seeing were minuscule (despite the asm looking quite nice in some artificial tests (see this gist)). I've run your branch through this gist (see
So, on my branch there are some nice assembly improvements but almost negligible perf wins, while you have some assembly regressions (and a couple of improvements) but significant perf wins. This is interesting! Perhaps the case in which I have a slight regression (if that's even a regression) and you have an improvement, a "full match on a u8-based enum", is dominating? I hope we can analyze it further and come up with a "best-of-both-worlds" approach.
From a quick look at your code, the differences in implementation I've seen are:
Oh that's funny we both had the same idea. I didn't check on cases like your |
You're right. I did a bit of mental shortcut – what I've meant is that |
OK, here appears to be the reason my generated code regresses in that case. Given this LLVM IR:
define i1 @myfunction(i8 %0) {
start:
%1 = icmp ule i8 %0, 49
%2 = zext i8 %0 to i64
%3 = add i64 %2, -46
%_2 = select i1 %1, i64 %3, i64 0
%4 = icmp eq i64 %_2, 1
ret i1 %4
}
I run an
define i1 @myfunction(i8 %0) {
start:
%1 = icmp eq i8 %0, 47
ret i1 %1
}
Great! That works as expected. But instead change the function to take an
define i1 @myfunction(i16 %0) {
start:
%z = trunc i16 %0 to i8
%1 = icmp ule i8 %z, 49
%2 = zext i8 %z to i64
%3 = add i64 %2, -46
%_2 = select i1 %1, i64 %3, i64 0
%4 = icmp eq i64 %_2, 1
ret i1 %4
}
and the
define i1 @myfunction(i16 %0) {
start:
%z = trunc i16 %0 to i8
%1 = icmp ult i8 %z, 50
%z.mask = and i16 %0, 255
%2 = icmp eq i16 %z.mask, 47
%3 = and i1 %1, %2
ret i1 %3
}
Which is... not so great. Maybe I'll file a bug with LLVM.
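The claim in this comment can be checked by brute force. The sketch below transcribes each IR function from the comment into Rust (the function names are made up for the example) and compares them over every input. All forms agree, so the `i16` output is correct but carries a redundant range check: `(x & 255) == 47` already implies that the truncated value is below 50.

```rust
/// Transcription of the unoptimized `i8` IR.
fn unoptimized(tag: u8) -> bool {
    let in_range = tag <= 49; // icmp ule i8 %0, 49
    let widened = tag as i64; // zext i8 %0 to i64
    let shifted = widened + (-46); // add i64 %2, -46
    let discr = if in_range { shifted } else { 0 }; // select
    discr == 1 // icmp eq i64 %_2, 1
}

/// Transcription of the folded `i8` result.
fn folded(tag: u8) -> bool {
    tag == 47 // icmp eq i8 %0, 47
}

/// The `i16` input variant: truncate, then the same computation.
fn unoptimized_i16(x: u16) -> bool {
    unoptimized(x as u8) // trunc i16 %0 to i8
}

/// Transcription of the `i16` output quoted in the comment.
fn optimized_i16(x: u16) -> bool {
    let z = x as u8; // trunc i16 %0 to i8
    let in_range = z < 50; // icmp ult i8 %z, 50
    let masked_eq = (x & 255) == 47; // and + icmp eq
    in_range & masked_eq // and i1 %1, %2
}

/// Exhaustively compares all four forms.
fn all_agree() -> bool {
    (0..=u8::MAX).all(|t| unoptimized(t) == folded(t))
        && (0..=u16::MAX).all(|x| {
            unoptimized_i16(x) == optimized_i16(x) && optimized_i16(x) == folded(x as u8)
        })
}
```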
I filed this issue. |
Yeah most of my testing was on matches that are full or nearly full, generally with enough cases so that a jump table is generated. |
9a7f3f9 to 6e405a9
Cleaned up Add test |
6e405a9 to fa51934
Before I take a look at this, quantifying the wins is probably a good way to get an objective answer to performance discussions above. @bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
💔 Test failed - checks-actions |
@nagisa I assume that failure is some network/infrastructure hiccup? Otherwise I'm not sure what's going on. |
yes @bors retry network failure in aarch64-gnu builder: "curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to ci-mirrors.rust-lang.org:443" |
⌛ Testing commit 2adb8178ce9b0d6a8a384d5e5ee3f2ca66c6deeb with merge 661023d5027c336e0aabe15e136c0f38d50146f4... |
💔 Test failed - checks-actions |
In some cases we can avoid arithmetic before checking whether a niche represents an untagged variant. This is relevant to rust-lang#101872
2adb817 to 51918dc
Alright, that failed because the LLVM IR I test against is not what's generated on 32 bit systems. Duh. I added |
It is better to have a @bors r+ |
☀️ Test successful - checks-actions |
Finished benchmarking commit (742d3f0): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression
Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)
This benchmark run did not return any relevant results for this metric.
Cycles
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
This must have been right on the edge of the regression/no-regression categorization. Not much to worry about here. @rustbot label: +perf-regression-triaged |
In some cases we can avoid arithmetic before checking whether a niche is a tag.
Also rename some identifiers around niches.
This is relevant to #101872
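As a rough illustration of the idea in this description, the sketch below models a niche-encoded enum with a one-byte tag and shows two ways to ask whether a tag denotes the untagged variant: by subtracting first, or by comparing the tag directly. The `NicheLayout` struct, its fields, and both methods are hypothetical simplifications, not rustc's actual data structures or the exact code this PR generates.

```rust
/// Hypothetical, simplified description of a niche-encoded enum.
struct NicheLayout {
    niche_start: u8,          // tag value encoding the first niche variant
    niche_count: u8,          // number of niche-encoded variants (at least 1)
    first_niche_variant: u32, // variant index stored at `niche_start`
    untagged_variant: u32,    // variant used for every other tag value
}

impl NicheLayout {
    /// Decode by subtracting first, then doing one unsigned comparison.
    fn decode_with_sub(&self, tag: u8) -> u32 {
        let relative = tag.wrapping_sub(self.niche_start);
        if relative < self.niche_count {
            self.first_niche_variant + relative as u32
        } else {
            self.untagged_variant
        }
    }

    /// Ask only "is this the untagged variant?" without any subtraction.
    /// Returns `None` when the niche range wraps around, in which case this
    /// simple form does not apply.
    fn is_untagged_direct(&self, tag: u8) -> Option<bool> {
        let span = self.niche_count.checked_sub(1)?;
        let last = self.niche_start.checked_add(span)?;
        Some(tag < self.niche_start || tag > last)
    }
}
```

For a layout with `niche_start: 2` and `niche_count: 3`, tags 2, 3 and 4 select the niche variants and every other tag selects the untagged variant; both methods agree on that for all 256 tag values.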