Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make checked ops emit *unchecked* LLVM operations where feasible #124114

Merged
merged 1 commit into from
Apr 20, 2024

Conversation

scottmcm
Copy link
Member

@scottmcm scottmcm commented Apr 18, 2024

For things with easily pre-checked overflow conditions -- shifts and unsigned subtraction -- write the checked methods in such a way that we stop emitting wrapping versions of them.

For example, today https://rust.godbolt.org/z/qM9YK8Txb neither

a.checked_sub(b).unwrap()

nor

a.checked_sub(b).unwrap_unchecked()

actually optimizes to sub nuw. After this PR they do.

cc #103299

@rustbot
Copy link
Collaborator

rustbot commented Apr 18, 2024

r? @workingjubilee

rustbot has assigned @workingjubilee.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 18, 2024
Comment on lines +25 to +26
// Note that `shl` and `shr` in LLVM are already unchecked. So rather than
// looking for no-wrap flags, we just need there to not be any masking.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see the masking in optimized LLVM IR and optimized MIR today: https://rust.godbolt.org/z/x5Podh9z5

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the same assembly emitted, tho'.

Copy link
Member Author

@scottmcm scottmcm Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, on x86 the shift instructions are wrapping. To see a difference you'd need to make an example where LLVM can optimize based on knowing that the shift is in-range.

(Just like how sub and sub nuw have the same assembly too.)

Copy link
Member

@workingjubilee workingjubilee Apr 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point! I tried an architecture that actually has a slightly more annoying shift instruction in this regard, PowerPC! (specifically --target=powerpc64le-unknown-linux-gnu)

old_checked_sub:
        clrldi  5, 4, 32
        clrlwi  4, 4, 27
        addi 5, 5, -32
        srw 4, 3, 4
        rldicl 5, 5, 1, 63
        mr      3, 5
        blr
        .long   0
        .quad   0

new_checked_sub:
        clrldi  5, 4, 32
        slw 4, 3, 4
        addi 5, 5, -32
        rldicl 5, 5, 1, 63
        mr      3, 5
        blr
        .long   0
        .quad   0

One less instruction!

For things with easily pre-checked overflow conditions -- shifts and unsigned subtraction -- write then checked methods in such a way that we stop emitting wrapping versions of them.

For example, today <https://rust.godbolt.org/z/qM9YK8Txb> neither
```rust
a.checked_sub(b).unwrap()
```
nor
```rust
a.checked_sub(b).unwrap_unchecked()
```
actually optimizes to `sub nuw`.  After this PR they do.
Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at first I thought "surely this doesn't need perf, how much could the compiler possibly stress over that?" but then I saw all the const eval machinery for implementing shl/shr does in fact flow through checked_{shl,shr}, thus gets used to implement it, thus gets squeezed, aiui, every time we compile a const in the shape of

const NAME: i32 = 1 << N;

right?

library/core/src/num/int_macros.rs Show resolved Hide resolved
@@ -1,5 +1,5 @@
// skip-filecheck
//@ compile-flags: -O -Zmir-opt-level=2 -Cdebuginfo=2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, first I was going to say for consistency with the other pre-codegen tests, but it looks like they're not as consistent as I thought. The mem-replace and range-iter tests don't ask for debuginfo, but then the slice-filter and chained-comparison ones do :/

I guess it's because I think it's more common for -O to be without debuginfo, and in general I don't think that the pre-codegen tests are about looking at debuginfo. Certainly in a normal --debug build the MIR inliner doesn't run so it'd look nothing like this.

I could put it back as it was if you'd like. It just adds 4 more debug lines; nothing interesting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind it, either way, tbh, I just wanted to know the reasoning so that I know it wasn't Perfectly Random.

Comment on lines +25 to +26
// Note that `shl` and `shr` in LLVM are already unchecked. So rather than
// looking for no-wrap flags, we just need there to not be any masking.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the same assembly emitted, tho'.

@workingjubilee
Copy link
Member

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 19, 2024
@workingjubilee
Copy link
Member

would mostly expect the main difference to be from reduced LLVMIR if anything.

bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 19, 2024
Make `checked` ops emit *unchecked* LLVM operations where feasible

For things with easily pre-checked overflow conditions -- shifts and unsigned subtraction -- write then checked methods in such a way that we stop emitting wrapping versions of them.

For example, today <https://rust.godbolt.org/z/qM9YK8Txb> neither
```rust
a.checked_sub(b).unwrap()
```
nor
```rust
a.checked_sub(b).unwrap_unchecked()
```
actually optimizes to `sub nuw`.  After this PR they do.

cc rust-lang#103299
@bors
Copy link
Contributor

bors commented Apr 19, 2024

⌛ Trying commit 986d9f1 with merge 928c684...

@DianQK
Copy link
Member

DianQK commented Apr 19, 2024

Could this solve rust-lang/hashbrown#509? Looks fine, I'll verify it later.

Ah, this PR doesn't change checked_add.

@bors
Copy link
Contributor

bors commented Apr 19, 2024

☀️ Try build successful - checks-actions
Build commit: 928c684 (928c6843828d26201cb676a7cc0ab4cf9b6d518f)

@rust-timer

This comment has been minimized.

@scottmcm
Copy link
Member Author

@DianQK Yeah, this is only shifts (which don't have a with.overflow intrinsic in LLVM) and checked unsigned subtraction (where nikic said in #103299 that we shouldn't use usub.with.overflow even though it exists).

Addition, Signed Subtraction, and Multiplication I didn't touch here because those overflow conditions are non-trivial and thus using the intrinsic seems like still the correct choice.

@nikic
Copy link
Contributor

nikic commented Apr 19, 2024

@scottmcm FWIW, for unsigned add overflow the recommended pattern is x + y < x, rather than uadd.with.overflow. We should probably give replacing that a try as well.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (928c684): comparison URL.

Overall result: ❌ regressions - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.6% [1.6%, 1.6%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
6.6% [2.0%, 11.1%] 2
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.7% [-3.3%, -0.0%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.4% [-3.3%, 11.1%] 4

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-3.0% [-4.0%, -2.4%] 7
Improvements ✅
(secondary)
-2.8% [-3.6%, -2.2%] 12
All ❌✅ (primary) -3.0% [-4.0%, -2.4%] 7

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.1%, 0.4%] 4
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.0% [-0.1%, -0.0%] 13
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.0% [-0.1%, 0.4%] 17

Bootstrap: 673.499s -> 673.605s (0.02%)
Artifact size: 315.24 MiB -> 315.24 MiB (0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 19, 2024
@scottmcm
Copy link
Member Author

Hmm, I got all excited by those results -- even -20% cycles on a runtime benchmark! -- but I think they're just noise from recovering from a huge spike in the previous runs.

Nothing concerning from it, though, so I think this is good to go if people are happy with the code changes.

@workingjubilee
Copy link
Member

Yeah, seems good to me!

@bors r+

@bors
Copy link
Contributor

bors commented Apr 20, 2024

📌 Commit 986d9f1 has been approved by workingjubilee

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 20, 2024
@bors
Copy link
Contributor

bors commented Apr 20, 2024

⌛ Testing commit 986d9f1 with merge 9c7b1f4...

@bors
Copy link
Contributor

bors commented Apr 20, 2024

☀️ Test successful - checks-actions
Approved by: workingjubilee
Pushing 9c7b1f4 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 20, 2024
@bors bors merged commit 9c7b1f4 into rust-lang:master Apr 20, 2024
13 checks passed
@rustbot rustbot added this to the 1.79.0 milestone Apr 20, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (9c7b1f4): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
10.1% [10.1%, 10.1%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-3.0% [-3.0%, -3.0%] 1
Improvements ✅
(secondary)
-2.1% [-2.1%, -2.1%] 1
All ❌✅ (primary) 3.5% [-3.0%, 10.1%] 2

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.5% [2.5%, 2.5%] 1
Improvements ✅
(primary)
-1.4% [-1.4%, -1.4%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.4% [-1.4%, -1.4%] 1

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.1%, 0.4%] 4
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.1% [-0.1%, -0.0%] 4
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.1% [-0.1%, 0.4%] 8

Bootstrap: 670.134s -> 670.156s (0.00%)
Artifact size: 315.30 MiB -> 315.30 MiB (0.00%)

@scottmcm scottmcm deleted the better-checked branch April 20, 2024 14:19
@scottmcm
Copy link
Member Author

Ah, yup, those perf results make more sense.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request May 4, 2024
…strieb

Docs: suggest `uN::checked_sub` instead of check-then-unchecked

As of rust-lang#124114 it's exactly the same in codegen, so might as well not use `unsafe`.

Note that this is only for *unsigned*, since the overflow conditions for `iN::checked_sub` are more complicated.
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request May 4, 2024
Rollup merge of rust-lang#124701 - scottmcm:unchecked_sub_docs, r=Nilstrieb

Docs: suggest `uN::checked_sub` instead of check-then-unchecked

As of rust-lang#124114 it's exactly the same in codegen, so might as well not use `unsafe`.

Note that this is only for *unsigned*, since the overflow conditions for `iN::checked_sub` are more complicated.
bors added a commit to rust-lang-ci/rust that referenced this pull request May 12, 2024
Invert comparison in `uN::checked_sub`

After rust-lang#124114, LLVM no longer combines the comparison and subtraction in `uN::checked_sub` when either operand is a constant (demo: https://rust.godbolt.org/z/MaeoYbsP1). The difference is more pronounced when the expression is slightly more complex (https://rust.godbolt.org/z/4rPavsYdc).

This is due to the use of `>=` here:

https://github.com/rust-lang/rust/blob/ee97564e3a9f9ac8c65103abb37c6aa48d95bfa2/library/core/src/num/uint_macros.rs#L581-L593

For constant `C`, LLVM eagerly converts `a >= C` into `a > C - 1`, but the backend can only combine `a < C` with `a - C`, not `C - 1 < a` and `a - C`: https://github.com/llvm/llvm-project/blob/e586556e375fc5c4f7e76b5c299cb981f2016108/llvm/lib/CodeGen/CodeGenPrepare.cpp#L1697-L1742

This PR[^1] simply inverts the `>=` into `<` to restore the LLVM magic, and somewhat align this with the implementation of `uN::overflowing_sub` from rust-lang#103299.

When the result is stored as an `Option` (rather than being branched/cmoved on), the discriminant is `self >= rhs`. This PR doesn't affect the codegen (and relevant tests) of that since LLVM will negate `self < rhs` to `self >= rhs` when necessary.

[^1]: Note to `self`: My very first contribution to publicly-used code. Hopefully like what I should learn to always be, tiny and humble.
fmease added a commit to fmease/rust that referenced this pull request May 15, 2024
Invert comparison in `uN::checked_sub`

After rust-lang#124114, LLVM no longer combines the comparison and subtraction in `uN::checked_sub` when either operand is a constant (demo: https://rust.godbolt.org/z/MaeoYbsP1). The difference is more pronounced when the expression is slightly more complex (https://rust.godbolt.org/z/4rPavsYdc).

This is due to the use of `>=` here:

https://github.com/rust-lang/rust/blob/ee97564e3a9f9ac8c65103abb37c6aa48d95bfa2/library/core/src/num/uint_macros.rs#L581-L593

For constant `C`, LLVM eagerly converts `a >= C` into `a > C - 1`, but the backend can only combine `a < C` with `a - C`, not `C - 1 < a` and `a - C`: https://github.com/llvm/llvm-project/blob/e586556e375fc5c4f7e76b5c299cb981f2016108/llvm/lib/CodeGen/CodeGenPrepare.cpp#L1697-L1742

This PR[^1] simply inverts the `>=` into `<` to restore the LLVM magic, and somewhat align this with the implementation of `uN::overflowing_sub` from rust-lang#103299.

When the result is stored as an `Option` (rather than being branched/cmoved on), the discriminant is `self >= rhs`. This PR doesn't affect the codegen (and relevant tests) of that since LLVM will negate `self < rhs` to `self >= rhs` when necessary.

[^1]: Note to `self`: My very first contribution to publicly-used code. Hopefully like what I should learn to always be, tiny and humble.
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request May 15, 2024
Rollup merge of rust-lang#125038 - ivan-shrimp:checked_sub, r=joboet

Invert comparison in `uN::checked_sub`

After rust-lang#124114, LLVM no longer combines the comparison and subtraction in `uN::checked_sub` when either operand is a constant (demo: https://rust.godbolt.org/z/MaeoYbsP1). The difference is more pronounced when the expression is slightly more complex (https://rust.godbolt.org/z/4rPavsYdc).

This is due to the use of `>=` here:

https://github.com/rust-lang/rust/blob/ee97564e3a9f9ac8c65103abb37c6aa48d95bfa2/library/core/src/num/uint_macros.rs#L581-L593

For constant `C`, LLVM eagerly converts `a >= C` into `a > C - 1`, but the backend can only combine `a < C` with `a - C`, not `C - 1 < a` and `a - C`: https://github.com/llvm/llvm-project/blob/e586556e375fc5c4f7e76b5c299cb981f2016108/llvm/lib/CodeGen/CodeGenPrepare.cpp#L1697-L1742

This PR[^1] simply inverts the `>=` into `<` to restore the LLVM magic, and somewhat align this with the implementation of `uN::overflowing_sub` from rust-lang#103299.

When the result is stored as an `Option` (rather than being branched/cmoved on), the discriminant is `self >= rhs`. This PR doesn't affect the codegen (and relevant tests) of that since LLVM will negate `self < rhs` to `self >= rhs` when necessary.

[^1]: Note to `self`: My very first contribution to publicly-used code. Hopefully like what I should learn to always be, tiny and humble.
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 23, 2024
…=<try>

Also get `add nuw` from `uN::checked_add`

When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang#124114 (comment) that it'd be worth trying for `checked_add` too.

It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead.

cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before

cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 23, 2024
…=<try>

Also get `add nuw` from `uN::checked_add`

When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang#124114 (comment) that it'd be worth trying for `checked_add` too.

It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead.

cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before

cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 24, 2024
…=Amanieu

Also get `add nuw` from `uN::checked_add`

When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang#124114 (comment) that it'd be worth trying for `checked_add` too.

It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead.

cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before

cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 25, 2024
…=Amanieu

Also get `add nuw` from `uN::checked_add`

When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang#124114 (comment) that it'd be worth trying for `checked_add` too.

It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead.

cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before

cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
tiif pushed a commit to tiif/miri that referenced this pull request Jun 25, 2024
Also get `add nuw` from `uN::checked_add`

When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang/rust#124114 (comment) that it'd be worth trying for `checked_add` too.

It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead.

cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before

cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
flip1995 pushed a commit to flip1995/rust-clippy that referenced this pull request Jun 28, 2024
Also get `add nuw` from `uN::checked_add`

When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang/rust#124114 (comment) that it'd be worth trying for `checked_add` too.

It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead.

cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before

cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
lnicola pushed a commit to lnicola/rust-analyzer that referenced this pull request Jul 11, 2024
Also get `add nuw` from `uN::checked_add`

When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang/rust#124114 (comment) that it'd be worth trying for `checked_add` too.

It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead.

cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before

cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants