
Optimize integer pow by removing the exit branch #122884

Merged: bors merged 4 commits into rust-lang:master from mzabaluev:pow-remove-exit-branch on Aug 14, 2024

Conversation

mzabaluev
Contributor

@mzabaluev mzabaluev commented Mar 22, 2024

The branch at the end of the pow implementations is redundant with multiplication code already present in the loop. By rotating the exit check, this branch can be largely removed, improving code size and reducing instruction cache misses.

Testing on my machine (x86_64, 11th Gen Intel Core i5-1135G7 @ 2.40GHz), the num::int_pow benchmarks improve by some 40% for the unchecked operations and show some slight improvement for the checked operations as well.
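
As a rough illustration of what "rotating the exit check" means, here is a minimal sketch of the two loop shapes for square-and-multiply exponentiation. The function names `pow_exit_branch` and `pow_rotated` are hypothetical, and `u64` with `wrapping_mul` is an assumption made to keep the example short. The real library methods are generic over the integer types and have their own overflow behaviour.

```rust
/// Loop shape before the change (sketch): the final multiplication
/// lives in a separate exit path after the loop.
pub fn pow_exit_branch(mut base: u64, mut exp: u32) -> u64 {
    if exp == 0 {
        return 1;
    }
    let mut acc: u64 = 1;
    while exp > 1 {
        if exp & 1 == 1 {
            acc = acc.wrapping_mul(base);
        }
        exp /= 2;
        base = base.wrapping_mul(base);
    }
    // exp is 1 here: handle the last bit outside the loop.
    acc.wrapping_mul(base)
}

/// Rotated loop shape (sketch): the exit check moves inside the
/// "bit is set" arm, so the loop's own multiplication is reused
/// for the last bit and no separate exit multiplication is needed.
pub fn pow_rotated(mut base: u64, mut exp: u32) -> u64 {
    if exp == 0 {
        return 1;
    }
    let mut acc: u64 = 1;
    loop {
        if exp & 1 == 1 {
            acc = acc.wrapping_mul(base);
            // exp != 0, so the last set bit is reached when exp == 1.
            if exp == 1 {
                return acc;
            }
        }
        exp /= 2;
        base = base.wrapping_mul(base);
    }
}
```

Both shapes skip the needless final squaring of `base`. They differ only in where the last multiplication is emitted.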

@rustbot
Collaborator

rustbot commented Mar 22, 2024

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 22, 2024
@Amanieu
Member

Amanieu commented Apr 11, 2024

@bors r+

@bors
Collaborator

bors commented Apr 11, 2024

📌 Commit 76d2530 has been approved by Amanieu

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 11, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 12, 2024
…Amanieu

Optimize integer `pow` by removing the exit branch

The branch at the end of the `pow` implementations is redundant with multiplication code already present in the loop. By rotating the exit check, this branch can be largely removed, improving code size and reducing instruction cache misses.

Testing on my machine (`x86_64`, 11th Gen Intel Core i5-1135G7 @ 2.40GHz), the `num::int_pow` benchmarks improve by some 40% for the unchecked operations and show some slight improvement for the checked operations as well.
@bors
Collaborator

bors commented Apr 12, 2024

⌛ Testing commit 76d2530 with merge 87b8256...

@bors
Collaborator

bors commented Apr 12, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Apr 12, 2024
@mzabaluev
Contributor Author

The job x86_64-gnu-llvm-18 failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)

failures:

---- [codegen] tests/codegen/issues/issue-34947-pow-i32.rs stdout ----

error: verification with 'FileCheck' failed
status: exit status: 1
command: "/usr/lib/llvm-18/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/issues/issue-34947-pow-i32/issue-34947-pow-i32.ll" "/checkout/tests/codegen/issues/issue-34947-pow-i32.rs" "--check-prefix=CHECK" "--check-prefix" "NONMSVC" "--allow-unused-prefixes" "--dump-input-context" "100"
--- stderr -------------------------------
/checkout/tests/codegen/issues/issue-34947-pow-i32.rs:9:17: error: CHECK-NEXT: is not on the line after the previous match
 // CHECK-NEXT: mul

I'm not familiar with this check, so I don't understand what is failing here or what the fix should be.

@Amanieu
Member

Amanieu commented Apr 12, 2024

It seems that your PR has introduced a regression: LLVM is no longer able to optimize pow(5) down to just 3 multiply instructions.
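
For reference, the three-multiplication form of a fifth power that the codegen test expects can be written out by hand. This is only a sketch of the arithmetic: the function name `pow5_three_muls` is hypothetical, and `wrapping_mul` is used as an assumption to keep overflow out of the picture.

```rust
/// x^5 computed with exactly three multiplications:
/// x^2 = x * x, x^4 = x^2 * x^2, x^5 = x^4 * x.
pub fn pow5_three_muls(x: i32) -> i32 {
    let x2 = x.wrapping_mul(x);
    let x4 = x2.wrapping_mul(x2);
    x4.wrapping_mul(x)
}
```

When the exponent is a compile-time constant and the loop can be fully unrolled, the optimizer can reduce the loop to this straight-line sequence.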

@mzabaluev
Contributor Author

> It seems that your PR has introduced a regression: LLVM is no longer able to optimize pow(5) down to just 3 multiply instructions.

Does this mean the modified code performs worse in this specific case?

@Amanieu
Member

Amanieu commented Apr 13, 2024

Yes, it will perform much worse in that specific case since LLVM is unable to optimize the loop away. See https://godbolt.org/z/nMY79Gn8r for the current code that is being generated. It might be possible to re-arrange the code so that you still get the performance benefit of this PR while still letting LLVM optimize the loop, but I'm not sure.

@Amanieu
Member

Amanieu commented Apr 13, 2024

You can see a comparison of the old and new versions here: https://godbolt.org/z/dx3WKxhad

@RalfJung
Member

@bors r-
(bors sync fixup)

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 17, 2024
@oskgo
Contributor

oskgo commented Jul 11, 2024

@mzabaluev Any updates on this? Thanks!

@mzabaluev
Contributor Author

The lack of optimization in case of a small const argument value is unfortunate.

I briefly tried to salvage it by giving the optimizer an easier time without re-introducing redundancy in the dynamic case, but didn't come up with any good ideas.

Maybe an unrolled fast path for argument values in the 0..=6 range? This would feel like an exercise in tricking the optimizer and placating the benchmarks.

The branch at the end of the `pow` implementations is redundant
with multiplication code already present in the loop. By rotating
the exit check, this branch can be largely removed, improving code size
and reducing instruction cache misses.
@oskgo
Contributor

oskgo commented Jul 11, 2024

If I understand correctly you don't know how to fix the regression in a satisfactory manner, and you're not going to make the argument that the regression is tolerable?

If I'm right you should probably close this. You can always reopen if you get some new inspiration or can find guidance.

@Amanieu
Member

Amanieu commented Jul 11, 2024

It might be worth trying something with is_val_statically_known to have 2 different paths depending on whether the input argument is a constant.

The newly optimized loop has introduced a regression in the case
when pow is called with a small constant exponent. LLVM is no longer
able to unroll the loop and the generated code is larger and slower
than what's expected in tests.

Match and handle small exponent values separately by branching out
to an explicit multiplication sequence for that exponent.
Exponents larger than 6 generally need more than three multiplications
(8 being the exception), so these cases are less likely to benefit from
this optimization; such constant exponents are also less likely to be
used in practice.
For uses with a non-constant exponent, this might also provide
a performance benefit if the exponent is small and does not vary
between successive calls, so the same match arm tends to be taken as
a predicted branch.
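
The commit message above describes an intermediate approach; a minimal sketch of it might look like the following. The names `pow_with_small_cases` and `pow_loop_fallback` are hypothetical, and the `u64` type and `wrapping_mul` are assumptions made for brevity. Each explicit arm uses at most three multiplications.

```rust
/// Loop-based fallback (sketch) using the rotated exit check.
pub fn pow_loop_fallback(mut base: u64, mut exp: u32) -> u64 {
    if exp == 0 {
        return 1;
    }
    let mut acc: u64 = 1;
    loop {
        if exp & 1 == 1 {
            acc = acc.wrapping_mul(base);
            if exp == 1 {
                return acc;
            }
        }
        exp /= 2;
        base = base.wrapping_mul(base);
    }
}

/// Small exponents are matched to explicit multiplication sequences;
/// everything else goes through the loop.
pub fn pow_with_small_cases(base: u64, exp: u32) -> u64 {
    match exp {
        0 => 1,
        1 => base,
        2 => base.wrapping_mul(base),
        3 => base.wrapping_mul(base).wrapping_mul(base),
        4 => {
            let sq = base.wrapping_mul(base);
            sq.wrapping_mul(sq)
        }
        5 => {
            let sq = base.wrapping_mul(base);
            sq.wrapping_mul(sq).wrapping_mul(base)
        }
        6 => {
            let sq = base.wrapping_mul(base);
            sq.wrapping_mul(sq).wrapping_mul(sq)
        }
        _ => pow_loop_fallback(base, exp),
    }
}
```

With a constant exponent the `match` collapses to a single arm after inlining. With a dynamic exponent it adds a dispatch step and extra code in front of the loop, which is the code-size concern raised later in the thread.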
@mzabaluev
Contributor Author

> It might be worth trying something with is_val_statically_known to have 2 different paths depending on whether the input argument is a constant.

I will combine this with my suggestion for the statically known case, thanks for the tip!

@oskgo it looks like we've found a way to resolve the regression, don't close this yet.

@mzabaluev
Contributor Author

I get this error when trying to use is_val_statically_known inside pow methods:

error: `is_val_statically_known` is not yet stable as a const fn

@mzabaluev
Contributor Author

Sounds like this would help: https://rustc-dev-guide.rust-lang.org/stability.html#rustc_allow_const_fn_unstable

It's already enabled in the library.

@Amanieu
Member

Amanieu commented Jul 12, 2024

@mzabaluev
Contributor Author

It is what it says on the tin: pow is annotated as const-stable, so it cannot call the const-unstable is_val_statically_known.
Your playground examples don't (can't) use stability attributes.

@Amanieu
Member

Amanieu commented Jul 12, 2024

Right, in that case maybe it's best to go back to the version with the unrolled loop.

@mzabaluev
Contributor Author

Oh, I get it: rustc_allow_const_fn_unstable is an item attribute that is enabled by the feature.

In the dynamic exponent case, it is preferable not to increase code size,
so use only the loop-based implementation there.
This shows a penalty of about 4% in the variable exponent benchmarks
on x86_64.
@mzabaluev mzabaluev force-pushed the pow-remove-exit-branch branch from 010c332 to 2f23534 Compare July 12, 2024 21:15
@oskgo
Contributor

oskgo commented Jul 12, 2024

pinging @rust-lang/wg-const-eval due to new usage of rustc_allow_const_fn_unstable. It should be fine since this PR is purely an optimization and can always be reverted.

@RalfJung
Member

is_val_statically_known is a very harmless intrinsic from a const-eval perspective, so seems fine for me.

@oskgo oskgo added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 19, 2024
// This gives the optimizer a way to efficiently inline call sites
// for the most common use cases with constant exponents.
// Currently, LLVM is unable to unroll the loop below.
match exp {
Member

Rather than this special casing could we instead just have the original loop (which LLVM knows how to unroll) for the is_val_statically_known case and your new loop for the non-constant case?

And do the same for all the other pow functions.

Contributor Author

I have updated pow and wrapped_pow as suggested.
I'm not sure the extra complication is justified for the checked operations, but I guess the optimizer will have better opportunities with the original loop there as well. I will try to make a macro so that uniform code is used everywhere without repetition.

@Amanieu Amanieu added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 23, 2024
Give LLVM the original, optimizable loop in pow and wrapped_pow
functions in the case when the exponent is statically known.
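
The two-path structure described in this commit message can be sketched as follows. The real code asks the compiler through the `is_val_statically_known` intrinsic, which is not usable on stable Rust, so this sketch takes a hypothetical `exp_is_known` flag as an explicit parameter instead. The name `pow_two_paths`, the `u64` type and the use of `wrapping_mul` are likewise assumptions for illustration.

```rust
/// Sketch of a two-path `pow`: `exp_is_known` is a hypothetical stand-in
/// for what the library obtains from `is_val_statically_known(exp)`.
pub fn pow_two_paths(mut base: u64, mut exp: u32, exp_is_known: bool) -> u64 {
    if exp == 0 {
        return 1;
    }
    let mut acc: u64 = 1;
    if exp_is_known {
        // Original loop shape, which the thread reports LLVM can
        // unroll for a constant exponent.
        while exp > 1 {
            if exp & 1 == 1 {
                acc = acc.wrapping_mul(base);
            }
            exp /= 2;
            base = base.wrapping_mul(base);
        }
        acc.wrapping_mul(base)
    } else {
        // Rotated loop without the separate exit multiplication,
        // intended for exponents only known at run time.
        loop {
            if exp & 1 == 1 {
                acc = acc.wrapping_mul(base);
                if exp == 1 {
                    return acc;
                }
            }
            exp /= 2;
            base = base.wrapping_mul(base);
        }
    }
}
```

Both paths compute the same value. The flag only selects which loop shape is handed to the optimizer, which is why the thread treats the change as a pure optimization that can be reverted.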
@mzabaluev mzabaluev force-pushed the pow-remove-exit-branch branch from 10db28f to ac88b33 Compare August 13, 2024 05:33
@Dylan-DPC Dylan-DPC added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 13, 2024
@Amanieu
Member

Amanieu commented Aug 13, 2024

@bors r+

@bors
Collaborator

bors commented Aug 13, 2024

📌 Commit ac88b33 has been approved by Amanieu

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 13, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 13, 2024
…iaskrgr

Rollup of 7 pull requests

Successful merges:

 - rust-lang#122884 (Optimize integer `pow` by removing the exit branch)
 - rust-lang#127857 (Allow to customize `// TODO:` comment for deprecated safe autofix)
 - rust-lang#129034 (Add `#[must_use]` attribute to `Coroutine` trait)
 - rust-lang#129049 (compiletest: Don't panic on unknown JSON-like output lines)
 - rust-lang#129050 (Emit a warning instead of an error if `--generate-link-to-definition` is used with other output formats than HTML)
 - rust-lang#129056 (Fix one usage of target triple in bootstrap)
 - rust-lang#129058 (Add mw back to review rotation)

r? `@ghost`
`@rustbot` modify labels: rollup
bors added two more commits to rust-lang-ci/rust that referenced this pull request Aug 14, 2024, each with the same rollup message as above.
@bors bors merged commit bc9c31d into rust-lang:master Aug 14, 2024
6 checks passed
@rustbot rustbot added this to the 1.82.0 milestone Aug 14, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Aug 14, 2024
Rollup merge of rust-lang#122884 - mzabaluev:pow-remove-exit-branch, r=Amanieu

Optimize integer `pow` by removing the exit branch

The branch at the end of the `pow` implementations is redundant with multiplication code already present in the loop. By rotating the exit check, this branch can be largely removed, improving code size and reducing instruction cache misses.

Testing on my machine (`x86_64`, 11th Gen Intel Core i5-1135G7 @ 2.40GHz), the `num::int_pow` benchmarks improve by some 40% for the unchecked operations and show some slight improvement for the checked operations as well.
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.

8 participants