Support vec zero-alloc optimization for tuples and byte arrays #97581

AngelicosPhosphoros
Contributor

@AngelicosPhosphoros AngelicosPhosphoros commented May 31, 2022

  • Implement the IsZero trait for tuples of up to 8 IsZero elements;
  • Implement IsZero for u8/i8, which also implements it for arrays of them;
  • Add more codegen tests for this optimization;
  • Lower the maximum array size covered by IsZero, because the checks fail to inline for larger arrays.
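
As a rough illustration of the first bullet, here is a minimal sketch of a per-field zero check for tuples. The `IsZero` trait below is a hypothetical safe stand-in written for this example (the real trait is private to `alloc`), and only arities 1 to 3 are generated, whereas the PR covers tuples of up to 8 elements.

```rust
/// Hypothetical stand-in for the private `IsZero` trait in `alloc`.
pub trait IsZero {
    /// Returns true when the value is represented by all-zero bytes.
    fn is_zero(&self) -> bool;
}

// Integers are zero exactly when they compare equal to 0.
macro_rules! impl_is_zero_int {
    ($($t:ty),*) => {
        $(impl IsZero for $t {
            fn is_zero(&self) -> bool { *self == 0 }
        })*
    };
}
impl_is_zero_int!(u8, i8, u16, i16, u32, i32, u64, i64, usize, isize);

// A tuple is zero when every field is zero.
macro_rules! impl_is_zero_tuple {
    ($($name:ident),+) => {
        impl<$($name: IsZero),+> IsZero for ($($name,)+) {
            #[allow(non_snake_case)]
            fn is_zero(&self) -> bool {
                // Destructure `&(A, B, ...)` into one reference per field,
                // reusing the type parameter names as variable names.
                let ($($name,)+) = self;
                $($name.is_zero() &&)+ true
            }
        }
    };
}
impl_is_zero_tuple!(A);
impl_is_zero_tuple!(A, B);
impl_is_zero_tuple!(A, B, C);
```

With such impls in place, an all-zero element like `(0u8, 0u32)` can be detected cheaply before choosing between a zeroed allocation and the ordinary fill loop.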

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label May 31, 2022
@rust-highfive
Collaborator

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with r? rust-lang/libs-api @rustbot label +T-libs-api -T-libs to request review from a libs-api team reviewer. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-highfive
Collaborator

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 31, 2022
@AngelicosPhosphoros
Contributor Author

r? @Mark-Simulacrum because you reviewed similar PRs in that area before.

// CHECK-LABEL: @vec_zero_bytes
#[no_mangle]
pub fn vec_zero_bytes(n: usize) -> Vec<u8> {
// CHECK-NOT: call alloc::vec::from_elem
Contributor Author

@AngelicosPhosphoros AngelicosPhosphoros May 31, 2022

Added more checks to catch cases where inlining doesn't happen.
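
For context, a complete codegen test of this shape might look like the sketch below. Only the `CHECK-LABEL` line, the function signature and the first `CHECK-NOT` line come from the excerpt above; the `compile-flags` header, the remaining `CHECK` lines and the `__rust_alloc_zeroed` symbol name are assumptions about how such a test is usually written, not the exact contents of the PR.

```rust
// compile-flags: -O
// (Assumed header: the checks only make sense on optimized LLVM IR.)

// CHECK-LABEL: @vec_zero_bytes
#[no_mangle]
pub fn vec_zero_bytes(n: usize) -> Vec<u8> {
    // The generic fill path must not survive optimization.
    // CHECK-NOT: call alloc::vec::from_elem

    // Assumed extra checks: no plain allocation, and a zeroed allocation instead.
    // CHECK-NOT: call {{.*}}__rust_alloc(
    // CHECK: call {{.*}}__rust_alloc_zeroed(
    vec![0; n]
}
```

The `CHECK-NOT` lines are the "more checks" idea: if inlining fails, a call to the generic fill path stays in the IR and the test fails instead of passing silently.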

@AngelicosPhosphoros
Contributor Author

AngelicosPhosphoros commented May 31, 2022

What should we do about the fact that constant aggregates (tuples and arrays) larger than 8 bytes fail to fold when there is more than one invocation of the vec! macro with them in the compilation unit? Should I lower the threshold or just leave it as is?
proof

P.S. Maybe adding #[inline] here would help:

pub fn from_elem<T: Clone>(elem: T, n: usize) -> Vec<T> {

P.P.S. This doesn't affect it actually.

@AngelicosPhosphoros AngelicosPhosphoros force-pushed the improve_calloc_check_in_vec_macro_for_tuples branch from 757a80d to 5a78f28 Compare May 31, 2022 13:43
@jyn514
Member

jyn514 commented May 31, 2022

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 31, 2022
@bors
Contributor

bors commented May 31, 2022

⌛ Trying commit 5a78f2889322806560cf87844ffd14042041d15e with merge 5b2f32200628769756745f20d6a0aea4e3bee040...

@bors
Contributor

bors commented May 31, 2022

☀️ Try build successful - checks-actions
Build commit: 5b2f32200628769756745f20d6a0aea4e3bee040 (5b2f32200628769756745f20d6a0aea4e3bee040)

@rust-timer
Collaborator

Queued 5b2f32200628769756745f20d6a0aea4e3bee040 with parent 16a0d03, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking commit (5b2f32200628769756745f20d6a0aea4e3bee040): comparison url.

Instruction count

  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 😿 relevant regressions found
| | mean [1] | max | count [2] |
|---|---|---|---|
| Regressions 😿 (primary) | N/A | N/A | 0 |
| Regressions 😿 (secondary) | 1.1% | 1.1% | 3 |
| Improvements 🎉 (primary) | N/A | N/A | 0 |
| Improvements 🎉 (secondary) | N/A | N/A | 0 |
| All 😿🎉 (primary) | N/A | N/A | 0 |

Max RSS (memory usage)

Results
  • Primary benchmarks: 😿 relevant regression found
  • Secondary benchmarks: 😿 relevant regression found
| | mean [1] | max | count [2] |
|---|---|---|---|
| Regressions 😿 (primary) | 1.1% | 1.1% | 1 |
| Regressions 😿 (secondary) | 3.1% | 3.1% | 1 |
| Improvements 🎉 (primary) | N/A | N/A | 0 |
| Improvements 🎉 (secondary) | N/A | N/A | 0 |
| All 😿🎉 (primary) | 1.1% | 1.1% | 1 |

Cycles

This benchmark run did not return any relevant results for this metric.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 31, 2022
@AngelicosPhosphoros
Contributor Author

Do the secondary benchmarks measure the performance of the compiler itself? I didn't find anything relevant in the ctfe-stress-5 benchmark code.

impl_is_zero!(i16, |x| x == 0);
impl_is_zero!(i32, |x| x == 0);
impl_is_zero!(i64, |x| x == 0);
impl_is_zero!(i128, |x| x == 0);
impl_is_zero!(isize, |x| x == 0);

impl_is_zero!(u8, |x| x == 0);
Member

@Mark-Simulacrum Mark-Simulacrum Jun 1, 2022

We can add comments -- but u8/i8 are already done by https://github.com/rust-lang/rust/blob/master/library/alloc/src/vec/spec_from_elem.rs#L20-L48, I think.

Those impls seem more general than these, so I'd leave them in place rather than adding these.

Contributor Author

  1. Is there any reason why we want to specialize for i8/u8 instead of specializing on Copy types and checking for size_of::<T>() == 1? The compiler manages to replace the iteration with memset (https://godbolt.org/z/rzYYYhTKj). There is a small difference in the generated code, but it is almost the same.
  2. The implementation in this file makes [u8; N] and [i8; N] IsZero too, and needs less repetition than implementing IsZero for byte arrays directly.
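
To make the second point concrete, here is a small sketch of how a single generic array impl picks up byte arrays once the element types implement the trait. The `IsZero` trait is again a hypothetical safe stand-in for the private one in `alloc`, and the length limit of 16 is an illustrative assumption rather than a value stated in this thread.

```rust
/// Hypothetical stand-in for the private `IsZero` trait in `alloc`.
pub trait IsZero {
    fn is_zero(&self) -> bool;
}

impl IsZero for u8 {
    fn is_zero(&self) -> bool { *self == 0 }
}

impl IsZero for i8 {
    fn is_zero(&self) -> bool { *self == 0 }
}

// One generic impl covers `[u8; N]`, `[i8; N]` and nested arrays of them.
impl<T: IsZero, const N: usize> IsZero for [T; N] {
    fn is_zero(&self) -> bool {
        // Assumed threshold: long arrays are reported as "not zero" so the
        // runtime check stays cheap and can still be inlined.
        N <= 16 && self.iter().all(IsZero::is_zero)
    }
}
```

Returning `false` for long arrays is conservative: the caller simply falls back to the regular fill path instead of the zeroed allocation.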

Member

Just because it's size-1 & Copy doesn't mean it's legal to branch on the value, though.

Contributor Author

@scottmcm I mean something like this:

impl<T: Copy> SpecFromElem for T {
    fn from_elem(elem: T, n: usize) -> Vec<T> {
        if core::mem::size_of::<T>() == 1 {
            let mut v = Vec::with_capacity(n);
            unsafe {
                // Reinterpret the single byte of `elem` as a `u8`.
                let byte_val: u8 = core::ptr::read(&elem as *const T as *const u8);
                // Fill all `n` one-byte slots with that byte (a memset).
                core::ptr::write_bytes(v.as_mut_ptr(), byte_val, n);
                v.set_len(n);
            }
            return v;
        }
        // Default impl
        ...
    }
}

And probably an IsZero variant for u8/i8.

Member

It's not generally valid to read a 1-byte value as u8 (I guess maybe with MaybeUninit<u8> that could be OK?) But I'm not sure this is worth trying to optimize for ourselves; I'd hope that LLVM can lower our standard init loop. (Or can be convinced to do so). I think slice::fill for example does pretty OK without such shenanigans.
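
A short sketch of the point above, using only safe standard-library calls: `MaybeUninit<u8>` is a one-byte `Copy` type whose byte may be uninitialized, so a blanket "size 1 and Copy" specialization that reads the element as a `u8` would not be sound for it. The helper names `filled`, `fill_in_place` and `uninit_bytes` are made up for this example.

```rust
use std::mem::MaybeUninit;

/// Safe generic fill: never reinterprets the bytes of `elem`.
pub fn filled<T: Copy>(elem: T, n: usize) -> Vec<T> {
    let mut v = Vec::with_capacity(n);
    v.resize(n, elem);
    v
}

/// The same idea for an existing buffer, via `slice::fill`.
pub fn fill_in_place<T: Copy>(buf: &mut [T], elem: T) {
    buf.fill(elem);
}

/// A one-byte `Copy` element that must not be read as a plain `u8`:
/// copying it is fine, inspecting its value is not.
pub fn uninit_bytes(n: usize) -> Vec<MaybeUninit<u8>> {
    filled(MaybeUninit::<u8>::uninit(), n)
}
```

Whether the optimizer lowers these loops to a memset for one-byte types is left to LLVM, which is the hope expressed in the comment above.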

Contributor Author

Valid, didn't think about uninit values before.

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jul 11, 2022
@AngelicosPhosphoros
Contributor Author

@Mark-Simulacrum
I don't have access to any Apple device, so it seems I won't be able to fix that failure.

Any suggestions on what I should do? Maybe just not run these tests on Apple?

@bors
Contributor

bors commented Jul 13, 2022

⌛ Testing commit fd488ccb9ad2b238eda4a318f3ceb81cbdd86eac with merge a4b83d324d0f049ad5b48fe32d13ba3c624a9c38...

@bors
Contributor

bors commented Jul 13, 2022

💔 Test failed - checks-actions

@scottmcm
Member

Since it looks like this failed twice,
@bors r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 13, 2022
@AngelicosPhosphoros AngelicosPhosphoros force-pushed the improve_calloc_check_in_vec_macro_for_tuples branch from fd488cc to 4e16167 Compare July 24, 2022 18:44
@AngelicosPhosphoros
Contributor Author

@Mark-Simulacrum
I just added // ignore-macos to the failing test. Is that OK?

@rustbot label: +S-waiting-on-review -S-waiting-on-author

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 24, 2022
@Mark-Simulacrum Mark-Simulacrum force-pushed the improve_calloc_check_in_vec_macro_for_tuples branch from 4e16167 to 86d445e Compare July 24, 2022 19:59
@Mark-Simulacrum
Member

I dropped the calloc-2 test entirely -- I'm not sure there's much value in checking that we're eliminating the zero comparison for a constant element. The whole point of bounding its length is that it's relatively cheap. I don't know why macOS has slightly different behavior, though I would suspect CGU differences or something -- probably not worth a deep investigation.

I'm not sure the other added tests here are really necessary either, but they seem more or less OK to leave for now.

@bors r+ rollup=never

@bors
Contributor

bors commented Jul 24, 2022

📌 Commit 86d445e has been approved by Mark-Simulacrum

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 24, 2022
@bors
Contributor

bors commented Jul 25, 2022

⌛ Testing commit 86d445e with merge babff22...

@bors
Contributor

bors commented Jul 25, 2022

☀️ Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing babff22 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 25, 2022
@bors bors merged commit babff22 into rust-lang:master Jul 25, 2022
@rustbot rustbot added this to the 1.64.0 milestone Jul 25, 2022
@rust-timer
Collaborator

Finished benchmarking commit (babff22): comparison url.

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 🎉 relevant improvement found
| | mean [1] | max | count [2] |
|---|---|---|---|
| Regressions 😿 (primary) | N/A | N/A | 0 |
| Regressions 😿 (secondary) | N/A | N/A | 0 |
| Improvements 🎉 (primary) | N/A | N/A | 0 |
| Improvements 🎉 (secondary) | -3.2% | -3.2% | 1 |
| All 😿🎉 (primary) | N/A | N/A | 0 |

Cycles

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 🎉 relevant improvements found
| | mean [1] | max | count [2] |
|---|---|---|---|
| Regressions 😿 (primary) | N/A | N/A | 0 |
| Regressions 😿 (secondary) | N/A | N/A | 0 |
| Improvements 🎉 (primary) | N/A | N/A | 0 |
| Improvements 🎉 (secondary) | -5.0% | -5.7% | 4 |
| All 😿🎉 (primary) | N/A | N/A | 0 |

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes
