Enum should prefer discriminant zero for niche #87794

bonega · 2021-08-05T14:18:08Z

Given an enum with unassigned zero-discriminant, rust should prefer it for niche selection.
Zero as discriminant for Option<Enum> makes it possible for LLVM to optimize resulting asm.

Eliminate branch when expected value coincides.
Use smaller instruction test eax, eax instead of cmp eax, ?
Possible interaction with zeroed memory?

Example:

pub enum Size {
    One = 1,
    Two = 2,
    Three = 3,
}

pub fn handle(x: Option<Size>) -> u8 {
    match x {
        None => {0}
        Some(size) => {size as u8}
    }
}

In this case discriminant zero is available as a niche.

Above example on nightly:

 mov     eax, edi
 cmp     al, 4
 jne     .LBB0_2
 xor     eax, eax
.LBB0_2:
 ret

PR:

 mov     eax, edi
 ret

I created this PR because I had a performance regression when I tried to use an enum to represent legal grapheme byte-length for utf8.

Using an enum instead of NonZeroU8 here
resulted in a performance regression of about 5%.
I consider this to be a somewhat realistic benchmark.

Thanks to @ogoffart for pointing me in the right direction!

Edit: Updated description

rust-highfive · 2021-08-05T14:18:11Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @jackh726 (or someone else) soon.

Please see the contribution instructions for more information.

joshtriplett · 2021-08-06T10:21:23Z

CI failure looks unrelated.

Once this passes CI, we should do a perf run on it.

bonega · 2021-08-06T17:41:24Z

That will be interesting.

There is still some things that I want input on before merging.

Both problems can be seen in output from this code:

#[repr(usize)]
pub enum Size {
    One = 1,
    Two = 2,
    Three = 3,
}

pub fn handle(x: Option<Size>) -> usize {
    match x {
        None => {0}
        Some(size) => {size as usize}
    }
}

Redundant instruction:

 mov     rax, rdi
 test    rdi, rdi // this instruction is redundant - should not be part of output
 ret

Removing repr(usize) attribute from enum makes the output worse by missing a chance for branch elimination.

 test    dil, dil
 je      .LBB0_1
 movzx   eax, dil
 ret
.LBB0_1:
 xor     eax, eax
 ret

instead of:

 movzx   eax, dil
 ret

My guess is that neither of them are strongly connected to choice of niche zero.
I will try and look at intermediate code to confirm this.

bonega · 2021-08-06T20:56:33Z

I think that MIR seems agnostic in regard to niche, output is identical between nightly and my PR.
That leaves LLVM-IR or LLVM itself.

1: In LLVM-IR we have:

  %1 = icmp eq i64 %0, 0
  br i1 %1, label %bb4, label %bb1

bb1:
  %2 = icmp ult i64 %0, 4

So there is a comparison for equal to zero(None) and then another comparison for 0 <= x < 4.
Apparently the branch is optimized away by LLVM, but the comparison against zero remains.

2: Difference between repr(isize/usize) isn't that big.
But we have:

%.0 = phi i64 [ %3, %bb1 ], [ 0, %start ]
  ret i64 %.0

I think this selects the value conditionally and therefore is our culprit.

For both 1 and 2 I think that all conditional checks should be optimized away at some stage.
Since the bit pattern for the enum in question is guaranteed to be correct?
(don't know if this should be the area of rustc or LLVM).

imho this is probably not a blocker for this PR, but would be nice to fix.

compiler/rustc_target/src/abi/mod.rs

nbdd0121 · 2021-08-06T21:20:30Z

I am a bit worried that this change would reserve a much larger range than it current do. For example, if I have a 253..=253 range, and just need one more discriminant, currently it'll become 253..=254, but after this change it'll be 0..=253 (current version of this PR gives 254..=253 with the bug I mentioned above, which is the whole range!) This can potentially have an big impact for nested enums.

Another idea would be just try to creep towards zero instead of forcing zero to be used. We can extend the valid ranges just by count in both forward and backward directions, and see which one would take us closer (or include) to zero, and use the valid range extended toward that direction. So 254..=254 would become 254..=255 (and then another nesting takes it to 253..=0) while 2..=4 become 1..=4 (and another nesting takes it to 0..=4).

Or maybe we can have some heuristics about not to leave too many gaps in the representation while prefer 0 if possible.

bonega · 2021-08-07T09:28:58Z

You are totally right, I made some stupid mistake.
Also got confused by some of the terminology.

I consider it very important to allocate zero for first nesting(much more likely to have an impact).

I think of two alternatives.

Heuristics for selecting a bigger range when zero is unallocated(first nesting):
+ Changes contained in reserve and available function
- loses some nesting capacity. For anything besides u8 I think it is very unlikely to be a problem. Also many u8 enums will
have discriminant range with a small offset from zero.
- doesn't work for sparse enums, where zero is unallocated but inside discriminant range
- Can be hard to anticipate when zero will be allocated as niche due to heuristics.

Unanswered question: What is the cutoff point for aggressively allocating zero vs losing nesting capacity?
Is it possible to get any data from a crater(or something) run for niche allocation?
A statistical distribution for this would be great.

Track zero allocation. Perhaps Add zero_reserved: bool to Scalar:
+ We lose no nesting capacity at all, might even gain one.
+ Will work for sparse enums, where zero is unallocated but inside discriminant range
+ Very simple behavior, we know that zero will be allocated if free. There is no heuristics and no consideration for the max value of enum repr
- Changes needed elsewhere to track and then allocate zero outside of range(unclear of the scope or impact of this)
- Scalar becomes one byte bigger(did a quick look, think this is ~3%) 40 -> 41 bytes

Because of my limited expertise with the compiler I prefer route 1 at this time.
It would serve as proof(or not) for performance impact.
Then I can consider doing route 2, which also would involve a lot of guidance.

I will think about it some more and do a little prototyping tomorrow

bonega · 2021-08-08T20:59:14Z

@nbdd0121 Did the simple prototype without any heuristics. Just grows towards zero.

I am a bit unsure what good heuristics is because of lack of data.
Will look more at the internals of layout tomorrow.
Also trying to do a perf run on PR

bonega · 2021-08-12T18:56:40Z

I don't understand this failure, from what I can see it shouldn't even use niches.
Also the local file on my system build/x86_64-unknown-linux-gnu/test/codegen/issue-73338-effecient-cmp/issue-73338-effecient-cmp.ll doesn't contain br

nbdd0121 · 2021-08-12T19:11:13Z

I don't understand this failure, from what I can see it shouldn't even use niches.
Also the local file on my system build/x86_64-unknown-linux-gnu/test/codegen/issue-73338-effecient-cmp/issue-73338-effecient-cmp.ll doesn't contain br

There is Option<Ordering>. In this case your current code uses -2 for None while originally 2 is used.

bonega · 2021-08-12T20:47:40Z

Oh right, derive...
Still I can't reproduce the problem, I agree that the niche has changed from 2 to -2 but I don't understand why that would cause branching.
./x.py test runs without errors on my system.
I have also checked cargo +stage2 --release -- --emit llvm-ir, no problem there.
Thanks for your help

nbdd0121 · 2021-08-12T21:11:32Z

Could be with the LLVM version. Rust builds and uses LLVM 12 by default but LLVM 10 and 11 are still supported. The CI is using LLVM 10 according to its name.

bonega · 2021-08-13T07:57:01Z

Thanks for the help.
Request for review and perf run

joshtriplett · 2021-08-13T10:24:58Z

@bors try @rust-timer queue

rust-timer · 2021-08-13T10:24:59Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-08-13T10:25:06Z

⌛ Trying commit b8333b42b90c0c0963b90265e98004e7a7057ecd with merge e57f7bdd064435d08c84f9c60751acf77d396c1d...

bors · 2021-08-13T11:13:29Z

☀️ Try build successful - checks-actions
Build commit: e57f7bdd064435d08c84f9c60751acf77d396c1d (e57f7bdd064435d08c84f9c60751acf77d396c1d)

rust-timer · 2021-08-13T11:13:30Z

Queued e57f7bdd064435d08c84f9c60751acf77d396c1d with parent 04c9901, future comparison URL.

bonega · 2021-09-13T18:47:59Z

So I think this failed during a non-optimizing build.
I don't know how to specify that the asm-test should only run for an optimized build

nagisa · 2021-09-13T19:37:59Z

I don't know how to specify that the asm-test should only run for an optimized build

Perhaps // compile-flags: -Copt-level=3 or // compile-flags: -O? Additionally, I suspect that just // only-x86 is sufficient and // only-x86_64-unknown-linux-gnu is likely way too specific.

nagisa · 2021-09-13T19:57:31Z

@bors r+

bors · 2021-09-13T19:57:32Z

📌 Commit 4d66fbc has been approved by nagisa

bors · 2021-09-13T20:37:14Z

⌛ Testing commit 4d66fbc with merge 9a4d180a01eadf0aa4ff80dbbc8410a229db10e9...

bors · 2021-09-13T21:38:19Z

💔 Test failed - checks-actions

rust-log-analyzer · 2021-09-13T21:39:09Z

The job x86_64-msvc-tools failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

[TIMING] Rustc { target: TargetSelection { triple: "x86_64-pc-windows-msvc", file: None }, compiler: Compiler { stage: 1, host: TargetSelection { triple: "x86_64-pc-windows-msvc", file: None } } } -- 0.863
Assembling stage2 compiler (x86_64-pc-windows-msvc)
[TIMING] Sysroot { compiler: Compiler { stage: 2, host: TargetSelection { triple: "x86_64-pc-windows-msvc", file: None } } } -- 0.000
[TIMING] Libdir { compiler: Compiler { stage: 2, host: TargetSelection { triple: "x86_64-pc-windows-msvc", file: None } }, target: TargetSelection { triple: "x86_64-pc-windows-msvc", file: None } } -- 0.000
thread 'main' panicked at 'failed to copy `D:\a\rust\rust\build\x86_64-pc-windows-msvc\stage1-rustc\x86_64-pc-windows-msvc\release\rustc-main.exe` to `D:\a\rust\rust\build\x86_64-pc-windows-msvc\stage2\bin\rustc.exe`: The process cannot access the file because it is being used by another process. (os error 32)', src\bootstrap\lib.rs:1335:17
Build completed unsuccessfully in 0:00:04

nagisa · 2021-09-13T21:40:10Z

@bors retry

bors · 2021-09-13T22:15:02Z

⌛ Testing commit 4d66fbc with merge 9f85cd6...

bors · 2021-09-14T01:07:02Z

☀️ Test successful - checks-actions
Approved by: nagisa
Pushing 9f85cd6 to master...

rust-timer · 2021-09-14T04:58:11Z

Finished benchmarking commit (9f85cd6): comparison url.

Summary: This benchmark run did not return any relevant changes.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

bonega · 2021-09-14T09:16:26Z

Thanks a lot to @nagisa, @eddyb and @nbdd0121 and others for your help

tmandry · 2021-10-19T04:13:00Z

This seems to have caused #90038

rust-highfive assigned jackh726 Aug 5, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 5, 2021

bonega changed the title ~~Enum prefer zero-niche~~ Enum should prefer zero-discriminant for niche Aug 5, 2021

bonega changed the title ~~Enum should prefer zero-discriminant for niche~~ Enum should prefer discriminant zero for niche Aug 5, 2021

This comment has been minimized.

Sign in to view

nbdd0121 reviewed Aug 6, 2021

View reviewed changes

compiler/rustc_target/src/abi/mod.rs Outdated Show resolved Hide resolved

bonega force-pushed the enum_niche_prefer_zero branch from a64325f to dc51008 Compare August 8, 2021 11:41

This comment has been minimized.

Sign in to view

bonega force-pushed the enum_niche_prefer_zero branch from ea9c6f5 to d67d32f Compare August 12, 2021 16:49

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 13, 2021

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Sep 13, 2021

enum niche allocation grows toward zero if possible

4d66fbc

bonega force-pushed the enum_niche_prefer_zero branch from 307a1bb to 4d66fbc Compare September 13, 2021 19:55

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 13, 2021

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Sep 13, 2021

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 13, 2021

bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 14, 2021

bors merged commit 9f85cd6 into rust-lang:master Sep 14, 2021

rustbot added this to the 1.57.0 milestone Sep 14, 2021

ehuss mentioned this pull request Sep 14, 2021

Windows CI runners are sporadically failing with os error 32 #88924

Closed

tmandry mentioned this pull request Oct 19, 2021

Miscompilation when awaiting generator containing an enum with niche at least [0, 1] #90038

Closed

eddyb mentioned this pull request Oct 19, 2021

Fix wrong niche calculation when 2+ niches are placed at the start #90040

Merged

Enum should prefer discriminant zero for niche #87794

Enum should prefer discriminant zero for niche #87794

Uh oh!

Conversation

bonega commented Aug 5, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rust-highfive commented Aug 5, 2021

Uh oh!

This comment has been minimized.

joshtriplett commented Aug 6, 2021

Uh oh!

bonega commented Aug 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bonega commented Aug 6, 2021

Uh oh!

Uh oh!

nbdd0121 commented Aug 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bonega commented Aug 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment has been minimized.

bonega commented Aug 8, 2021

Uh oh!

This comment has been minimized.

bonega commented Aug 12, 2021

Uh oh!

nbdd0121 commented Aug 12, 2021

Uh oh!

bonega commented Aug 12, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nbdd0121 commented Aug 12, 2021

Uh oh!

This comment has been minimized.

bonega commented Aug 13, 2021

Uh oh!

joshtriplett commented Aug 13, 2021

Uh oh!

rust-timer commented Aug 13, 2021

Uh oh!

bors commented Aug 13, 2021

Uh oh!

bors commented Aug 13, 2021

Uh oh!

rust-timer commented Aug 13, 2021

Uh oh!

bonega commented Sep 13, 2021

Uh oh!

nagisa commented Sep 13, 2021

Uh oh!

nagisa commented Sep 13, 2021

Uh oh!

bors commented Sep 13, 2021

Uh oh!

bors commented Sep 13, 2021

Uh oh!

bors commented Sep 13, 2021

Uh oh!

rust-log-analyzer commented Sep 13, 2021

Uh oh!

nagisa commented Sep 13, 2021

Uh oh!

bors commented Sep 13, 2021

Uh oh!

bors commented Sep 14, 2021

Uh oh!

rust-timer commented Sep 14, 2021

Uh oh!

bonega commented Sep 14, 2021

Uh oh!

tmandry commented Oct 19, 2021

Uh oh!

Uh oh!

bonega commented Aug 5, 2021 •

edited

Loading

bonega commented Aug 6, 2021 •

edited

Loading

nbdd0121 commented Aug 6, 2021 •

edited

Loading

bonega commented Aug 7, 2021 •

edited

Loading

bonega commented Aug 12, 2021 •

edited

Loading