Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

codegen_llvm_back: improve allocations #55871

Merged
merged 1 commit into from
Dec 4, 2018

Conversation

ljedrz
Copy link
Contributor

@ljedrz ljedrz commented Nov 11, 2018

This commit was split out from #54864. Last time it was causing an LLVM OOM, which was most probably caused by not collecting the globals.

  • preallocate vectors of known length
  • extend instead of append where the argument is consumable
  • turn 2 push loops into extends
  • create a vector from a function producing one instead of using extend_from_slice on it
  • consume modules when no longer needed
  • return an impl Iterator from generate_lto_work
  • don't collect globals, as they are iterated over and consumed right afterwards

While I'm hoping it won't cause an OOM anymore, I would still consider this a "high-risk" PR and not roll it up.

@rust-highfive
Copy link
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 11, 2018
@nagisa
Copy link
Member

nagisa commented Nov 11, 2018

@bors try

Lets do a perf run :)

@bors
Copy link
Contributor

bors commented Nov 11, 2018

⌛ Trying commit dc1b2c7 with merge 684fb37...

bors added a commit that referenced this pull request Nov 11, 2018
codegen_llvm_back: improve allocations

This commit was split out from #54864. Last time it was causing an LLVM OOM, presumably due to aggressive preallocation strategy in `thin_lto`.

This time preallocations are more cautious and there are a few additional memory-related improvements (last 3 points from the list below).

- _gently_ preallocate vectors of known length
- `extend` instead of `append` where the argument is consumable
- turn 2 `push` loops into `extend`s
- create a vector from a function producing one instead of using `extend_from_slice` on it
- consume `modules` when no longer needed
- return an `impl Iterator` from `generate_lto_work`
- don't `collect` `globals`, as they are iterated over and consumed right afterwards

While I'm hoping it won't cause an OOM anymore, I would still consider this a "high-risk" PR and not roll it up.
@bors
Copy link
Contributor

bors commented Nov 11, 2018

☀️ Test successful - status-travis
State: approved= try=True

@nagisa
Copy link
Member

nagisa commented Nov 11, 2018

@rust-timer build 684fb37

@rust-timer
Copy link
Collaborator

Success: Queued 684fb37 with parent b76ee83, comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit 684fb37

@nagisa
Copy link
Member

nagisa commented Nov 11, 2018

It seems that both instruction counts and max-rss have suffered a fair hit, while for most benchmarks the minimum rss has also dropped significantly. Essentially this means we have increased deviation, without a clear win in mean rss.

@ljedrz ljedrz force-pushed the llvm_back_allocations branch from dc1b2c7 to 9e8bafc Compare November 12, 2018 09:28
@ljedrz
Copy link
Contributor Author

ljedrz commented Nov 12, 2018

The reds from instructions were in the high-variance benchmarks, so they could be just noise. The one that is not marked with a ? and had a significant change seems to be style-servo-opt (which is a huge piece of benchmark) that went down by 2.5% in the clean build. As for max-rss, true - the variance has increased; there are some weird results.

Since there do seem to be possible wins with some of these changes (I'd like to get those minimums from max-rss), I have a suggestion - I have removed the parts of the commit that potentially reserve high capacities (those were also the risky ones in terms of LLVM OOM) and we could rerun perf. The queue is empty anyway :).

@nagisa
Copy link
Member

nagisa commented Nov 12, 2018

@bors try

@bors
Copy link
Contributor

bors commented Nov 12, 2018

⌛ Trying commit 9e8bafc24b60008353f5f4e8027379a10bc7bb35 with merge f7360e5b2e5b2ed5e1696af257b365e3cc69981d...

@bors
Copy link
Contributor

bors commented Nov 12, 2018

☀️ Test successful - status-travis
State: approved= try=True

@nagisa
Copy link
Member

nagisa commented Nov 12, 2018

@rust-timer build f7360e5b2e5b2ed5e1696af257b365e3cc69981d

@rust-timer
Copy link
Collaborator

Success: Queued f7360e5b2e5b2ed5e1696af257b365e3cc69981d with parent 0195812, comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit f7360e5b2e5b2ed5e1696af257b365e3cc69981d

@ljedrz
Copy link
Contributor Author

ljedrz commented Nov 12, 2018

Since the changes at this point are all pretty harmless I'd say that the benchmark results are statistical noise.

That being said, these changes don't seem to be beneficial performance-wise, so I'm ok with closing the PR, unless you believe that they are a readability improvement / more idiomatic.

@nagisa
Copy link
Member

nagisa commented Nov 12, 2018

Uhh, no matter how I look at it, the max-rss results still seem like a hit-or-miss.

@nikic
Copy link
Contributor

nikic commented Nov 12, 2018

@nagisa max-rss has very high variance. Here's how it looks for a random recent commit: http://perf.rust-lang.org/compare.html?start=ca79ecd6940e30d4b2466bf378632efcdf5745c7&end=775eab58835f9bc0f1f01ccbb79725dce0c73b51&stat=max-rss The results for this PR seem to be within the "usual" noise.

@nagisa
Copy link
Member

nagisa commented Nov 12, 2018

@bors r+

@bors
Copy link
Contributor

bors commented Nov 12, 2018

📌 Commit 9e8bafc24b60008353f5f4e8027379a10bc7bb35 has been approved by nagisa

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 12, 2018
@pietroalbini
Copy link
Member

@bors r-

Still OOMing on AppVeyor.

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Nov 16, 2018
@nagisa
Copy link
Member

nagisa commented Nov 16, 2018 via email

@ljedrz
Copy link
Contributor Author

ljedrz commented Nov 16, 2018

Maybe, but how? These changes shouldn't negatively impact allocations - IMO they should make them easier to optimize.

@nagisa
Copy link
Member

nagisa commented Nov 16, 2018

IMO they should make them easier to optimize.

The optimizability is irrelevant if it is indeed what I think it is. And is hardly related to the number of allocations, but rather to the size of them. Alas, rust perf does not collect the information of interest to tell for sure.

It would be interesting to do a perf run (and they really should be run that way all the time) with overcommit disabled.

@ljedrz
Copy link
Contributor Author

ljedrz commented Nov 16, 2018

hardly related to the number of allocations

As long as the length of the iterator is known, changing a push loop to an extend reduces the number of allocations to just one.

@bors
Copy link
Contributor

bors commented Nov 17, 2018

☔ The latest upstream changes (presumably #55627) made this pull request unmergeable. Please resolve the merge conflicts.

@nagisa
Copy link
Member

nagisa commented Nov 17, 2018

I’m not sure we’re talking about the same thing. Even though the following snippet would OOM on windows, it would work just fine on UNIXes due to overcommit:

auto x = new uint8_t[2 << 42]; // 4TiB or thereabouts…
x[0] = 42;

This has nothing to do with known length or allocation count.

@rust-lang/infra is it possible to make @rust-timer to collect additional information (e.g. max-virtual-mem, in addition to max-rss…)? What repository should I fill an issue to request this?


While something like this happening is clearly a bug somewhere (and I cannot tell where exactly), it is fairly obvious some change from those in the commit are making LLVM to commit too much memory that likely ends up never being used.

One thing to debug this you could do is to compile stage1 core on your own UNIX machine and see what the maximum virtual memory ends up being. If it ends up being significantly larger than RSS, that would confirm my suspicions.

Another thing you could do is disabling overcommit (if you’re on Linux, that’s sysctl vm.overcommit=2).

Finally, we could also just bisect – there aren’t that many different changes in this PR. We could try landing them one by one (though there’s a danger that all these changes are cumulatively slightly raising the committed memory and none of them would fail CI on their own).

@pietroalbini
Copy link
Member

is it possible to make @rust-timer to collect additional information (e.g. max-virtual-mem, in addition to max-rss…)? What repository should I fill an issue to request this?

https://github.com/rust-lang-nursery/rustc-perf

@rust-timer

This comment has been minimized.

@ljedrz
Copy link
Contributor Author

ljedrz commented Nov 17, 2018

@nagisa ah, ok, thanks for the explanation; I wasn't thinking in terms of a possible memory bug.

I will do some test builds on a Linux machine when I have a bit of free time.

@ljedrz
Copy link
Contributor Author

ljedrz commented Nov 20, 2018

@nagisa I ran ./x.py build src/libcore --stage 1 with and without the changes applied (doing a clean) in between and monitored the VIRT column in htop, but there were no significant differences.

I tried to disable overcommit like you described too, but this command doesn't work in the setup I used (Manjaro Linux).

@nikomatsakis
Copy link
Contributor

What are the current thoughts here? I'm sort of inclined to close this PR as "not worth the trouble", but do you all still want to poke at it? Can I assign the review to someone else (@nagisa?).

})
.collect::<Vec<_>>();
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're iterating over globals here and then adding new globals in the loop below. With the collect that's fine, as you'll only iterate existing globals. Without the collect this is going to be an infinite loop and you OOM.

Copy link
Contributor Author

@ljedrz ljedrz Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird that this doesn't seem to cause issues on architectures different than i686 (at least that's the one where the OOM was being hit on appveyor); I can compile with these changes on x64 without issues.

@ljedrz
Copy link
Contributor Author

ljedrz commented Nov 27, 2018

@nikomatsakis At this point I was interested more in the possibility of uncovering some memory-related bug (as described by @nagisa). I think @nikic is onto something - it might not be a bug but a peculiarity of unsafe code, where an iterator can be extended while being iterated over.

@ljedrz ljedrz force-pushed the llvm_back_allocations branch from 9e8bafc to ce4bce1 Compare December 3, 2018 09:21
@ljedrz ljedrz force-pushed the llvm_back_allocations branch from ce4bce1 to 2043d30 Compare December 3, 2018 09:34
@ljedrz
Copy link
Contributor Author

ljedrz commented Dec 3, 2018

@nikomatsakis I just remembered that the initial version of this PR included a change that had a -2.5% win for style-servo-opt which is considerable, especially since it is a huge benchmark; I rebased, re-included the win for servo and removed the problematic bit that @nikic marked as the one causing the OOM, so hopefully now it's good to go 🤞.

@nagisa
Copy link
Member

nagisa commented Dec 3, 2018 via email

@bors
Copy link
Contributor

bors commented Dec 3, 2018

📌 Commit 2043d30 has been approved by nagisa

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 3, 2018
@bors
Copy link
Contributor

bors commented Dec 4, 2018

⌛ Testing commit 2043d30 with merge 431e0ab...

bors added a commit that referenced this pull request Dec 4, 2018
codegen_llvm_back: improve allocations

This commit was split out from #54864. Last time it was causing an LLVM OOM, which was most probably caused by not collecting the globals.

- preallocate vectors of known length
- `extend` instead of `append` where the argument is consumable
- turn 2 `push` loops into `extend`s
- create a vector from a function producing one instead of using `extend_from_slice` on it
- consume `modules` when no longer needed
- ~~return an `impl Iterator` from `generate_lto_work`~~
- ~~don't `collect` `globals`, as they are iterated over and consumed right afterwards~~

While I'm hoping it won't cause an OOM anymore, I would still consider this a "high-risk" PR and not roll it up.
@bors
Copy link
Contributor

bors commented Dec 4, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: nagisa
Pushing 431e0ab to master...

@bors bors merged commit 2043d30 into rust-lang:master Dec 4, 2018
@ljedrz ljedrz deleted the llvm_back_allocations branch December 4, 2018 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants