-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support vec zero-alloc optimization for tuples and byte arrays #97581
Support vec zero-alloc optimization for tuples and byte arrays #97581
Conversation
AngelicosPhosphoros
commented
May 31, 2022
•
edited
Loading
edited
- Implement IsZero trait for tuples up to 8 IsZero elements;
- Implement IsZero for u8/i8, leading to implementation of it for arrays of them too;
- Add more codegen tests for this optimization.
- Lower size of array for IsZero trait because it fails to inline checks
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
r? @m-ou-se (rust-highfive has picked a reviewer for you, use r? to override) |
r? @Mark-Simulacrum because you reviewed similar PRs in that area before. |
src/test/codegen/vec-calloc.rs
Outdated
// CHECK-LABEL: @vec_zero_bytes | ||
#[no_mangle] | ||
pub fn vec_zero_bytes(n: usize) -> Vec<u8> { | ||
// CHECK-NOT: call alloc::vec::from_elem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more checks to avoid cases when inlining don't happen.
What to do with the fact that constant aggregates (tuples and arrays) larger than 8 bytes fails to fold if there is more than 1 invokation of P.S. Maybe adding rust/library/alloc/src/vec/mod.rs Line 2422 in dcbd5f5
P.P.S. This doesn't affect it actually. |
757a80d
to
5a78f28
Compare
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 5a78f2889322806560cf87844ffd14042041d15e with merge 5b2f32200628769756745f20d6a0aea4e3bee040... |
☀️ Try build successful - checks-actions |
Queued 5b2f32200628769756745f20d6a0aea4e3bee040 with parent 16a0d03, future comparison URL. |
Finished benchmarking commit (5b2f32200628769756745f20d6a0aea4e3bee040): comparison url. Instruction count
Max RSS (memory usage)Results
CyclesThis benchmark run did not return any relevant results for this metric. If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Footnotes |
Does secondary tests measure performance of compiler itself? I didn't find anything relevant in ctfe-stress-5 benchmark code. |
library/alloc/src/vec/is_zero.rs
Outdated
impl_is_zero!(i16, |x| x == 0); | ||
impl_is_zero!(i32, |x| x == 0); | ||
impl_is_zero!(i64, |x| x == 0); | ||
impl_is_zero!(i128, |x| x == 0); | ||
impl_is_zero!(isize, |x| x == 0); | ||
|
||
impl_is_zero!(u8, |x| x == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add comments -- but u8/i8 are already done by https://github.com/rust-lang/rust/blob/master/library/alloc/src/vec/spec_from_elem.rs#L20-L48, I think.
Those impls seem more general than these, so I'd leave them in place rather than adding these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Is there any reason why we want to specialize for i8/u8 instead of specializing on Copy types and probably checking for size_of::() == 1? Compiler manages replace iteration by memset. https://godbolt.org/z/rzYYYhTKj There is little difference in generated code but it is still almost similar.
- This implementation in this file allow [u8; N] and [i8;N] to be IsZero too and need less repetition than implementing IsZero for byte arrays directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because it's size-1 & Copy doesn't mean it's legal to branch on the value, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scottmcm I mean something like this:
impl<T: Copy>impl SpecFromElem for T {
fn from_elem(elem: T, n:usize)->Vec<T>{
if core::mem::size_of::<T>() == 1 {
let mut v = Vec::with_capacity(n);
unsafe{
let byte_val: u8 = ptr::read(&elem as const T* as const u8*);
ptr::write_bytes(v.as_mut_ptr(), byte_val, n);
v.set_len(n);
}
return v;
}
// Default impl
...
}
}
And probably IsZero variant for u8/i8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not generally valid to read a 1-byte value as u8 (I guess maybe with MaybeUninit<u8>
that could be OK?) But I'm not sure this is worth trying to optimize for ourselves; I'd hope that LLVM can lower our standard init loop. (Or can be convinced to do so). I think slice::fill for example does pretty OK without such shenanigans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid, didn't think about uninit values before.
5a78f28
to
d70a1ce
Compare
@Mark-Simulacrum Any suggestions what to do for me? Maybe just don't run this tests on Apple? |
⌛ Testing commit fd488ccb9ad2b238eda4a318f3ceb81cbdd86eac with merge a4b83d324d0f049ad5b48fe32d13ba3c624a9c38... |
💔 Test failed - checks-actions |
This comment has been minimized.
This comment has been minimized.
Since it looks like this failed twice, |
fd488cc
to
4e16167
Compare
@Mark-Simulacrum @rustbot label: +S-waiting-on-review -S-waiting-on-author |
* Implement IsZero trait for tuples up to 8 IsZero elements; * Implement IsZero for u8/i8, leading to implementation of it for arrays of them too; * Add more codegen tests for this optimization. * Lower size of array for IsZero trait because it fails to inline checks
4e16167
to
86d445e
Compare
I dropped the calloc-2 test entirely -- I'm not sure there's much value in checking that we're eliminating the zero comparison for a constant element. The whole point of bounding its length is that it's relatively cheap. I don't know why macOS has slightly different behavior, though I would suspect CGU differences or something -- probably not worth a deep investigation. I'm not sure the other added tests here are really necessary either, but they seem more or less OK to leave for now. @bors r+ rollup=never |
☀️ Test successful - checks-actions |
Finished benchmarking commit (babff22): comparison url. Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results
CyclesResults
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression Footnotes |