Fix mod_inv termination for the last iteration #103378
r? @scottmcm (rust-highfive has picked a reviewer for you, use r? to override)
On usize=u64 platforms, the 4th iteration would overflow the `mod_gate` back to 0. Similarly, on usize=u32 platforms the 3rd iteration would overflow in much the same way.

I tested various approaches to resolving this, including `saturating_mul` and `widening_mul` to a double usize. It turns out LLVM likes `mul_with_overflow` the best. In fact, now that LLVM can see the iteration count is limited, it will happily unroll the loop into a nice linear sequence.

You will also notice that the code around the loop got simplified somewhat. Now that LLVM handles the loop nicely, there is no longer any reason to manually unroll the first iteration out of the loop (though looking at the code today I’m not sure all that complexity was necessary in the first place).

Fixes rust-lang#103361
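For readers following along, here is a rough sketch of the termination logic described above, written against stable Rust. It is not the libcore code: the name `mod_inv_sketch` is made up for illustration, and the stable `overflowing_mul` method stands in for the `mul_with_overflow` intrinsic. The lookup table is the `INV_TABLE_MOD_16` referenced in the diff, with its contents assumed here to be the inverses of the odd residues modulo 16.

```rust
// Inverses of 1, 3, 5, ..., 15 modulo 16 (assumed table contents).
const INV_TABLE_MOD_16: [u8; 8] = [1, 11, 13, 7, 9, 3, 5, 15];
const INV_TABLE_MOD: usize = 16;

/// Sketch: inverse of an odd `x` modulo a power-of-two `m`.
fn mod_inv_sketch(x: usize, m: usize) -> usize {
    debug_assert!(m.is_power_of_two() && x % 2 == 1);
    let mut inverse = INV_TABLE_MOD_16[(x & (INV_TABLE_MOD - 1)) >> 1] as usize;
    // `inverse` is valid modulo `mod_gate`; each step squares the gate.
    let mut mod_gate = INV_TABLE_MOD;
    while mod_gate < m {
        // y = y * (2 - x*y); wrapping is fine because the result is masked at the end.
        inverse = inverse.wrapping_mul(2usize.wrapping_sub(x.wrapping_mul(inverse)));
        // Squaring the gate eventually exceeds usize. At that point `inverse`
        // is already valid modulo 2^usize::BITS, so stop instead of wrapping to 0.
        let (new_gate, overflow) = mod_gate.overflowing_mul(mod_gate);
        if overflow {
            break;
        }
        mod_gate = new_gate;
    }
    inverse & (m - 1)
}
```

With a 64-bit `usize` the gate goes 16, 256, 65536, 2^32, and the fourth squaring overflows, which matches the iteration counts mentioned in the description. The explicit `break` on overflow is what bounds the trip count.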
let table_inverse = INV_TABLE_MOD_16[(x & (INV_TABLE_MOD - 1)) >> 1] as usize;
// SAFETY: `m` is required to be a power-of-two, hence non-zero.
let m_minus_one = unsafe { unchecked_sub(m, 1) };
Sadly `unchecked` is useless here today: LLVM turns `sub nuw %m, 1` into `add %m, -1` during normalization :(
(Doesn't need to change here, though. I'm just sad about llvm/llvm-project#53377.)
Thanks! Really nice to hear that LLVM is smart enough to fully unroll the loop once it can see the bound. @bors r+
…earth Rollup of 8 pull requests

Successful merges:
- rust-lang#102977 (remove HRTB from `[T]::is_sorted_by{,_key}`)
- rust-lang#103378 (Fix mod_inv termination for the last iteration)
- rust-lang#103456 (`unchecked_{shl|shr}` should use `u32` as the RHS)
- rust-lang#103701 (Simplify some pointer method implementations)
- rust-lang#104047 (Diagnostics `icu4x` based list formatting.)
- rust-lang#104338 (Enforce that `dyn*` coercions are actually pointer-sized)
- rust-lang#104498 (Edit docs for `rustc_errors::Handler::stash_diagnostic`)
- rust-lang#104556 (rustdoc: use `code-header` class to format enum variants)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
#[cfg(target_pointer_width = "16")]
const SIZE: usize = 1 << 13;
struct HugeSize([u8; SIZE - 1]);
let _ = (SIZE as *const HugeSize).align_offset(SIZE);
We usually prefer the strict provenance APIs in libcore, see #104632.
Not sure if the lint against int2ptr casts ever got implemented? If yes, we should probably enable it here.
Ah, I basically just copy-pasted over the reproducer from the issue…
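As a possible illustration of the review point, here is one way the reproducer could be written without an integer-to-pointer `as` cast. This is a sketch, not the change made in #104632: it builds the address by offsetting a null pointer with `wrapping_add`, which works on stable Rust, rather than using a dedicated strict provenance constructor. The function name `align_offset_repro` is hypothetical, and the 64-bit and 32-bit `SIZE` values are assumptions, since only the 16-bit value appears in the diff above.

```rust
use std::ptr;

// Only the 16-bit value is taken from the diff; the other two are assumed.
#[cfg(target_pointer_width = "64")]
const SIZE: usize = 1 << 47;
#[cfg(target_pointer_width = "32")]
const SIZE: usize = 1 << 30;
#[cfg(target_pointer_width = "16")]
const SIZE: usize = 1 << 13;

// Never instantiated; only its size (an odd stride of SIZE - 1) matters.
#[allow(dead_code)]
struct HugeSize([u8; SIZE - 1]);

/// Sketch of the reproducer with the address formed from a null pointer.
fn align_offset_repro() -> usize {
    // Pointer-to-pointer cast only; the pointer is never dereferenced.
    let p = ptr::null::<u8>().wrapping_add(SIZE).cast::<HugeSize>();
    p.align_offset(SIZE)
}
```

The point of the reproducer is simply that the call returns at all. The address is already a multiple of `SIZE`, so an offset of 0 is the natural answer, though the `align_offset` documentation also permits `usize::MAX`.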