Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enum should prefer discriminant zero for niche #87794

Merged
merged 1 commit into from
Sep 14, 2021

Conversation

bonega
Copy link
Contributor

@bonega bonega commented Aug 5, 2021

Given an enum with unassigned zero-discriminant, rust should prefer it for niche selection.
Zero as discriminant for Option<Enum> makes it possible for LLVM to optimize resulting asm.

  • Eliminate branch when expected value coincides.
  • Use smaller instruction test eax, eax instead of cmp eax, ?
  • Possible interaction with zeroed memory?

Example:

pub enum Size {
    One = 1,
    Two = 2,
    Three = 3,
}

pub fn handle(x: Option<Size>) -> u8 {
    match x {
        None => {0}
        Some(size) => {size as u8}
    }
}

In this case discriminant zero is available as a niche.

Above example on nightly:

 mov     eax, edi
 cmp     al, 4
 jne     .LBB0_2
 xor     eax, eax
.LBB0_2:
 ret

PR:

 mov     eax, edi
 ret

I created this PR because I had a performance regression when I tried to use an enum to represent legal grapheme byte-length for utf8.

Using an enum instead of NonZeroU8 here
resulted in a performance regression of about 5%.
I consider this to be a somewhat realistic benchmark.

Thanks to @ogoffart for pointing me in the right direction!

Edit: Updated description

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @jackh726 (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 5, 2021
@bonega bonega changed the title Enum prefer zero-niche Enum should prefer zero-discriminant for niche Aug 5, 2021
@bonega bonega changed the title Enum should prefer zero-discriminant for niche Enum should prefer discriminant zero for niche Aug 5, 2021
@rust-log-analyzer

This comment has been minimized.

@joshtriplett
Copy link
Member

CI failure looks unrelated.

Once this passes CI, we should do a perf run on it.

@bonega
Copy link
Contributor Author

bonega commented Aug 6, 2021

That will be interesting.

There is still some things that I want input on before merging.

Both problems can be seen in output from this code:

#[repr(usize)]
pub enum Size {
    One = 1,
    Two = 2,
    Three = 3,
}

pub fn handle(x: Option<Size>) -> usize {
    match x {
        None => {0}
        Some(size) => {size as usize}
    }
}
  1. Redundant instruction:
 mov     rax, rdi
 test    rdi, rdi // this instruction is redundant - should not be part of output
 ret
  1. Removing repr(usize) attribute from enum makes the output worse by missing a chance for branch elimination.
 test    dil, dil
 je      .LBB0_1
 movzx   eax, dil
 ret
.LBB0_1:
 xor     eax, eax
 ret

instead of:

 movzx   eax, dil
 ret

My guess is that neither of them are strongly connected to choice of niche zero.
I will try and look at intermediate code to confirm this.

@bonega
Copy link
Contributor Author

bonega commented Aug 6, 2021

I think that MIR seems agnostic in regard to niche, output is identical between nightly and my PR.
That leaves LLVM-IR or LLVM itself.

1: In LLVM-IR we have:

  %1 = icmp eq i64 %0, 0
  br i1 %1, label %bb4, label %bb1

bb1:
  %2 = icmp ult i64 %0, 4

So there is a comparison for equal to zero(None) and then another comparison for 0 <= x < 4.
Apparently the branch is optimized away by LLVM, but the comparison against zero remains.

2: Difference between repr(isize/usize) isn't that big.
But we have:

%.0 = phi i64 [ %3, %bb1 ], [ 0, %start ]
  ret i64 %.0

I think this selects the value conditionally and therefore is our culprit.

For both 1 and 2 I think that all conditional checks should be optimized away at some stage.
Since the bit pattern for the enum in question is guaranteed to be correct?
(don't know if this should be the area of rustc or LLVM).

imho this is probably not a blocker for this PR, but would be nice to fix.

@nbdd0121
Copy link
Contributor

nbdd0121 commented Aug 6, 2021

I am a bit worried that this change would reserve a much larger range than it current do. For example, if I have a 253..=253 range, and just need one more discriminant, currently it'll become 253..=254, but after this change it'll be 0..=253 (current version of this PR gives 254..=253 with the bug I mentioned above, which is the whole range!) This can potentially have an big impact for nested enums.

Another idea would be just try to creep towards zero instead of forcing zero to be used. We can extend the valid ranges just by count in both forward and backward directions, and see which one would take us closer (or include) to zero, and use the valid range extended toward that direction. So 254..=254 would become 254..=255 (and then another nesting takes it to 253..=0) while 2..=4 become 1..=4 (and another nesting takes it to 0..=4).

Or maybe we can have some heuristics about not to leave too many gaps in the representation while prefer 0 if possible.

@bonega
Copy link
Contributor Author

bonega commented Aug 7, 2021

You are totally right, I made some stupid mistake.
Also got confused by some of the terminology.

I consider it very important to allocate zero for first nesting(much more likely to have an impact).

I think of two alternatives.

  1. Heuristics for selecting a bigger range when zero is unallocated(first nesting):
    + Changes contained in reserve and available function
    - loses some nesting capacity. For anything besides u8 I think it is very unlikely to be a problem. Also many u8 enums will
    have discriminant range with a small offset from zero.
    - doesn't work for sparse enums, where zero is unallocated but inside discriminant range
    - Can be hard to anticipate when zero will be allocated as niche due to heuristics.

Unanswered question: What is the cutoff point for aggressively allocating zero vs losing nesting capacity?
Is it possible to get any data from a crater(or something) run for niche allocation?
A statistical distribution for this would be great.

  1. Track zero allocation. Perhaps Add zero_reserved: bool to Scalar:
    + We lose no nesting capacity at all, might even gain one.
    + Will work for sparse enums, where zero is unallocated but inside discriminant range
    + Very simple behavior, we know that zero will be allocated if free. There is no heuristics and no consideration for the max value of enum repr
    - Changes needed elsewhere to track and then allocate zero outside of range(unclear of the scope or impact of this)
    - Scalar becomes one byte bigger(did a quick look, think this is ~3%) 40 -> 41 bytes

Because of my limited expertise with the compiler I prefer route 1 at this time.
It would serve as proof(or not) for performance impact.
Then I can consider doing route 2, which also would involve a lot of guidance.

I will think about it some more and do a little prototyping tomorrow

@rust-log-analyzer

This comment has been minimized.

@bonega
Copy link
Contributor Author

bonega commented Aug 8, 2021

@nbdd0121 Did the simple prototype without any heuristics. Just grows towards zero.

I am a bit unsure what good heuristics is because of lack of data.
Will look more at the internals of layout tomorrow.
Also trying to do a perf run on PR

@rust-log-analyzer

This comment has been minimized.

@bonega
Copy link
Contributor Author

bonega commented Aug 12, 2021

I don't understand this failure, from what I can see it shouldn't even use niches.
Also the local file on my system build/x86_64-unknown-linux-gnu/test/codegen/issue-73338-effecient-cmp/issue-73338-effecient-cmp.ll doesn't contain br

@nbdd0121
Copy link
Contributor

I don't understand this failure, from what I can see it shouldn't even use niches.
Also the local file on my system build/x86_64-unknown-linux-gnu/test/codegen/issue-73338-effecient-cmp/issue-73338-effecient-cmp.ll doesn't contain br

There is Option<Ordering>. In this case your current code uses -2 for None while originally 2 is used.

@bonega
Copy link
Contributor Author

bonega commented Aug 12, 2021

Oh right, derive...
Still I can't reproduce the problem, I agree that the niche has changed from 2 to -2 but I don't understand why that would cause branching.
./x.py test runs without errors on my system.
I have also checked cargo +stage2 --release -- --emit llvm-ir, no problem there.
Thanks for your help

@nbdd0121
Copy link
Contributor

Could be with the LLVM version. Rust builds and uses LLVM 12 by default but LLVM 10 and 11 are still supported. The CI is using LLVM 10 according to its name.

@rust-log-analyzer

This comment has been minimized.

@bonega
Copy link
Contributor Author

bonega commented Aug 13, 2021

Thanks for the help.
Request for review and perf run

@joshtriplett
Copy link
Member

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 13, 2021
@bors
Copy link
Contributor

bors commented Aug 13, 2021

⌛ Trying commit b8333b42b90c0c0963b90265e98004e7a7057ecd with merge e57f7bdd064435d08c84f9c60751acf77d396c1d...

@bors
Copy link
Contributor

bors commented Aug 13, 2021

☀️ Try build successful - checks-actions
Build commit: e57f7bdd064435d08c84f9c60751acf77d396c1d (e57f7bdd064435d08c84f9c60751acf77d396c1d)

@rust-timer
Copy link
Collaborator

Queued e57f7bdd064435d08c84f9c60751acf77d396c1d with parent 04c9901, future comparison URL.

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Sep 13, 2021
@bonega
Copy link
Contributor Author

bonega commented Sep 13, 2021

So I think this failed during a non-optimizing build.
I don't know how to specify that the asm-test should only run for an optimized build

@nagisa
Copy link
Member

nagisa commented Sep 13, 2021

I don't know how to specify that the asm-test should only run for an optimized build

Perhaps // compile-flags: -Copt-level=3 or // compile-flags: -O? Additionally, I suspect that just // only-x86 is sufficient and // only-x86_64-unknown-linux-gnu is likely way too specific.

@nagisa
Copy link
Member

nagisa commented Sep 13, 2021

@bors r+

@bors
Copy link
Contributor

bors commented Sep 13, 2021

📌 Commit 4d66fbc has been approved by nagisa

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 13, 2021
@bors
Copy link
Contributor

bors commented Sep 13, 2021

⌛ Testing commit 4d66fbc with merge 9a4d180a01eadf0aa4ff80dbbc8410a229db10e9...

@bors
Copy link
Contributor

bors commented Sep 13, 2021

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Sep 13, 2021
@rust-log-analyzer
Copy link
Collaborator

The job x86_64-msvc-tools failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
[TIMING] Rustc { target: TargetSelection { triple: "x86_64-pc-windows-msvc", file: None }, compiler: Compiler { stage: 1, host: TargetSelection { triple: "x86_64-pc-windows-msvc", file: None } } } -- 0.863
Assembling stage2 compiler (x86_64-pc-windows-msvc)
[TIMING] Sysroot { compiler: Compiler { stage: 2, host: TargetSelection { triple: "x86_64-pc-windows-msvc", file: None } } } -- 0.000
[TIMING] Libdir { compiler: Compiler { stage: 2, host: TargetSelection { triple: "x86_64-pc-windows-msvc", file: None } }, target: TargetSelection { triple: "x86_64-pc-windows-msvc", file: None } } -- 0.000
thread 'main' panicked at 'failed to copy `D:\a\rust\rust\build\x86_64-pc-windows-msvc\stage1-rustc\x86_64-pc-windows-msvc\release\rustc-main.exe` to `D:\a\rust\rust\build\x86_64-pc-windows-msvc\stage2\bin\rustc.exe`: The process cannot access the file because it is being used by another process. (os error 32)', src\bootstrap\lib.rs:1335:17
Build completed unsuccessfully in 0:00:04

@nagisa
Copy link
Member

nagisa commented Sep 13, 2021

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 13, 2021
@bors
Copy link
Contributor

bors commented Sep 13, 2021

⌛ Testing commit 4d66fbc with merge 9f85cd6...

@bors
Copy link
Contributor

bors commented Sep 14, 2021

☀️ Test successful - checks-actions
Approved by: nagisa
Pushing 9f85cd6 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 14, 2021
@bors bors merged commit 9f85cd6 into rust-lang:master Sep 14, 2021
@rustbot rustbot added this to the 1.57.0 milestone Sep 14, 2021
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (9f85cd6): comparison url.

Summary: This benchmark run did not return any relevant changes.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

@bonega
Copy link
Contributor Author

bonega commented Sep 14, 2021

Thanks a lot to @nagisa, @eddyb and @nbdd0121 and others for your help

@tmandry
Copy link
Member

tmandry commented Oct 19, 2021

This seems to have caused #90038

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.