Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make RawVec::grow mostly non-generic. #72013

Merged

Conversation

nnethercote
Copy link
Contributor

cargo-llvm-lines shows that, in various benchmarks, RawVec::grow is
instantiated 10s or 100s of times and accounts for 1-8% of lines of
generated LLVM IR.

This commit moves most of RawVec::grow into a separate function that
isn't parameterized by T, which means it doesn't need to be
instantiated many times. This reduces compile time significantly.

r? @ghost

@nnethercote
Copy link
Contributor Author

Here is some sample cargo-llvm-lines output for the webrender benchmark, showing just the lines relevant to RawVec::grow:

  Lines        Copies         Function name
  -----        ------         -------------
1012499 (100%)  26143 (100%)  (TOTAL)
  56700 (5.6%)    108 (0.4%)  alloc::raw_vec::RawVec<T,A>::grow
   9864 (1.0%)    137 (0.5%)  core::alloc::layout::Layout::array
   8208 (0.8%)    432 (1.7%)  alloc::raw_vec::RawVec<T,A>::grow::{{closure}}
   7128 (0.7%)    108 (0.4%)  alloc::raw_vec::RawVec<T,A>::current_memory

The current patch gets rid of most of that.

My local perf results are mostly good, with instruction counts reductions of up to 9%, but a few small regressions. I'm not quite sure where the regressions are coming from, I will investigate that more on Monday.

The code is in draft form, and needs cleaning up before being properly reviewed. @Amanieu may be interested.

@nnethercote
Copy link
Contributor Author

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented May 8, 2020

⌛ Trying commit 6d2926b6381d178d7d59b6112949c3c6a2e96568 with merge d4d11c4e38c5b4fe42c2d2c10124bc45ba1fbcc8...

src/liballoc/raw_vec.rs Outdated Show resolved Hide resolved
@Mark-Simulacrum
Copy link
Member

cc @davidtwco @nikomatsakis -- this seems relevant for the polymorphization efforts, shows at least one potential big win

@nikomatsakis
Copy link
Contributor

Yes, it does!

@bors
Copy link
Contributor

bors commented May 8, 2020

💥 Test timed out

@bors bors added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 8, 2020
@nnethercote
Copy link
Contributor Author

@bors try

@bors
Copy link
Contributor

bors commented May 8, 2020

⌛ Trying commit 6d2926b6381d178d7d59b6112949c3c6a2e96568 with merge cee9586dc574582deea517fa2d0eaeeb882167e3...

@bors
Copy link
Contributor

bors commented May 9, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: cee9586dc574582deea517fa2d0eaeeb882167e3 (cee9586dc574582deea517fa2d0eaeeb882167e3)

@rust-timer
Copy link
Collaborator

Queued cee9586dc574582deea517fa2d0eaeeb882167e3 with parent 7b80539, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit cee9586dc574582deea517fa2d0eaeeb882167e3, comparison URL.

@nnethercote
Copy link
Contributor Author

The perf results are all over the place. They're easier to understand if you focus on two subsets.

  • debug-full: This shows the potential. Lots of wins, the best being 6.6%. (I saw a best win of 9.1% locally, but that was without the recent wins from LLVM bitcode removal, which would overlap somewhat with these wins.)
  • check-full: This shows mostly regressions, of up to 1.5%. The current code has presumably slowed RawVec::grow down somewhat.

Hopefully I can fix the slowdowns without too much trouble. I will investigate that tomorrow.

@nnethercote
Copy link
Contributor Author

I think the slowdowns are caused by worse code being generated in some cases due to the Layout computations now being dynamic rather than static. For example, there is a codegen test codegen/vec-iter-collect-len.rs that contains the expression [1, 2, 3].iter().collect::<Vec<_>>().len(). It's supposed to boil down to 3 in the generated LLVM IR, but in at least some of my working versions of this PR I've seen the test fail because the optimization is inhibited and it generates lots of LLVM IR.

I'm taking a slightly different tack now, trying to keep those Layout computations within RawVec::grow so that the computations involving mem::{size,align}_of::<T>() are static.

@nnethercote
Copy link
Contributor Author

BTW, for the attached patches, I've seen reductions in the number of lines of LLVM IR generate as high as 15% (for syn). I was undercounting because some additional instantiations that are outside of alloc are avoided, esp. Result::map_err.

@nnethercote nnethercote force-pushed the make-RawVec-grow-mostly-non-generic branch from 6d2926b to 77aa42c Compare May 11, 2020 10:16
@nnethercote
Copy link
Contributor Author

I've reworked the code significantly, giving wins that are slightly smaller than before, but avoiding the vast majority of the losses.

There is scope for pushing harder on moving stuff out of RawVec, but when I try it the losses start to come back to some extent. I'm not sure why, possibly when RawVec::grow_* get small enough then inlining decisions start changing.

@nnethercote
Copy link
Contributor Author

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented May 11, 2020

⌛ Trying commit 77aa42ca0e3b632fee16bcd3fe23ab33ce5c066b with merge 78ecf2ce2428bc1c359a284c6dc8bc33246879ac...

@bors
Copy link
Contributor

bors commented May 11, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: 78ecf2ce2428bc1c359a284c6dc8bc33246879ac (78ecf2ce2428bc1c359a284c6dc8bc33246879ac)

@rust-timer
Copy link
Collaborator

Queued 78ecf2ce2428bc1c359a284c6dc8bc33246879ac with parent aeb4738, future comparison URL.

@nnethercote
Copy link
Contributor Author

Perf results are looking pretty good. Debug builds have some wins of up to 5.7%. Opt builds have a few wins, up to 1.7%. Check builds mostly are very slightly regressed, typically by 0.2%.

I will fiddle with this some more today, see if I can make it any better.

@bors
Copy link
Contributor

bors commented May 12, 2020

📌 Commit 68b7503 has been approved by Amanieu

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 12, 2020
@nnethercote
Copy link
Contributor Author

@bors rollup=never

Because it affects perf.

@Dylan-DPC-zz
Copy link

@bors p=1

@bors
Copy link
Contributor

bors commented May 13, 2020

⌛ Testing commit 68b7503 with merge 75e1463...

@bors
Copy link
Contributor

bors commented May 13, 2020

☀️ Test successful - checks-actions, checks-azure
Approved by: Amanieu
Pushing 75e1463 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label May 13, 2020
@bors bors merged commit 75e1463 into rust-lang:master May 13, 2020
@nnethercote nnethercote deleted the make-RawVec-grow-mostly-non-generic branch May 13, 2020 22:47
nnethercote added a commit to nnethercote/rust that referenced this pull request May 18, 2020
Currently, if you repeatedly push to an empty vector, the capacity
growth sequence is 0, 1, 2, 4, 8, 16, etc. This commit changes the
relevant code (the "amortized" growth strategy) to skip 1 and 2 in most
cases, instead using 0, 4, 8, 16, etc. (You can still get a capacity of
1 or 2 using the "exact" growth strategy, e.g. via `reserve_exact()`.)

This idea (along with the phrase "tiny Vecs are dumb") comes from the
"doubling" growth strategy that was removed from `RawVec` in rust-lang#72013.
That strategy was barely ever used -- only when a `VecDeque` was grown,
oddly enough -- which is why it was removed in rust-lang#72013.

(Fun fact: until just a few days ago, I thought the "doubling" strategy
was used for repeated push case. In other words, this commit makes
`Vec`s behave the way I always thought they behaved.)

This change reduces the number of allocations done by rustc itself by
10% or more. It speeds up rustc, and will also speed up any other Rust
program that uses `Vec`s a lot.
@nnethercote
Copy link
Contributor Author

The final perf improvements are here.

bors added a commit to rust-lang-ci/rust that referenced this pull request May 19, 2020
…nieu

Tiny Vecs are dumb.

Currently, if you repeatedly push to an empty vector, the capacity
growth sequence is 0, 1, 2, 4, 8, 16, etc. This commit changes the
relevant code (the "amortized" growth strategy) to skip 1 and 2, instead
using 0, 4, 8, 16, etc. (You can still get a capacity of 1 or 2 using
the "exact" growth strategy, e.g. via `reserve_exact()`.)

This idea (along with the phrase "tiny Vecs are dumb") comes from the
"doubling" growth strategy that was removed from `RawVec` in rust-lang#72013.
That strategy was barely ever used -- only when a `VecDeque` was grown,
oddly enough -- which is why it was removed in rust-lang#72013.

(Fun fact: until just a few days ago, I thought the "doubling" strategy
was used for repeated push case. In other words, this commit makes
`Vec`s behave the way I always thought they behaved.)

This change reduces the number of allocations done by rustc itself by
10% or more. It speeds up rustc, and will also speed up any other Rust
program that uses `Vec`s a lot.

In theory, the change could increase memory usage, but in practice it
doesn't. It would be an unusual program where very small `Vec`s having a
capacity of 4 rather than 1 or 2 would make a difference. You'd need a
*lot* of very small `Vec`s, and/or some very small `Vec`s with very
large elements.

r? @Amanieu
#[inline]
fn grow_if_necessary(&mut self) {
#[inline(never)]
fn grow(&mut self) {
if self.is_full() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check is duplicated now, isn't it?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree - a better change might have been to keep the inline function but have it call a never inlined grow_always instead. Should lead to the same code generation in the end.

@SimonSapin
Copy link
Contributor

In servo/servo#26713 (comment) grow_amortized introduced in this PR is still near the top of cargo llvm-lines, for two of Servo’s largest crates. Dividing by the "copies" column shows that each monomophization takes 165 lines of IR.

I really don’t have a good sense of scale, does that sound like a lot? Given “we want it to be as small as possible” in the code comment.

@nnethercote
Copy link
Contributor Author

I have tried shrinking grow_amortized some more, but it's hard to go much further without hurting runtime performance. I will keep trying, though.

@SimonSapin
Copy link
Contributor

I understand it’s not easy, and I’m sure this PR has already improved things. I was wondering how reasonable 165 lines sounds, but maybe the easiest would be to look at those lines and see what they do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants