codegen_llvm_back: improve allocations #55871
Conversation
(rust_highfive has picked a reviewer for you, use r? to override)

@bors try Let's do a perf run :)

codegen_llvm_back: improve allocations

This commit was split out from #54864. Last time it was causing an LLVM OOM, presumably due to an aggressive preallocation strategy in `thin_lto`. This time the preallocations are more cautious and there are a few additional memory-related improvements (the last 3 points in the list below).

- _gently_ preallocate vectors of known length
- `extend` instead of `append` where the argument is consumable
- turn 2 `push` loops into `extend`s
- create a vector from a function producing one instead of using `extend_from_slice` on it
- consume `modules` when no longer needed
- return an `impl Iterator` from `generate_lto_work`
- don't `collect` `globals`, as they are iterated over and consumed right afterwards

While I'm hoping it won't cause an OOM anymore, I would still consider this a "high-risk" PR and not roll it up.
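A hedged sketch of the allocation patterns listed above (the names `build_modules`, `produced`, and the string data are invented for illustration, not the actual `rustc_codegen_llvm` code):

```rust
// Illustrative only: `produced` stands in for the real codegen data.
fn build_modules() -> Vec<String> {
    let produced = vec!["a".to_string(), "b".to_string()];

    // "gently" preallocate: reserve only what we know we'll need
    let mut modules = Vec::with_capacity(produced.len() + 2);

    // `extend` with a consumable argument moves the elements in directly,
    // where `append(&mut other)` needs a second owned Vec left empty behind
    modules.extend(produced);

    // a `push` loop turned into an `extend`
    modules.extend((0..2).map(|i| format!("extra-{}", i)));

    modules
}

fn main() {
    println!("{:?}", build_modules()); // ["a", "b", "extra-0", "extra-1"]
}
```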
☀️ Test successful - status-travis

@rust-timer build 684fb37

Success: Queued 684fb37 with parent b76ee83, comparison URL.

Finished benchmarking try commit 684fb37

It seems that both instruction counts and max-rss have suffered a fair hit, while for most benchmarks the minimum rss has also dropped significantly. Essentially this means we have increased deviation, without a clear win in mean rss.
force-pushed from dc1b2c7 to 9e8bafc
The reds from … Since there do seem to be possible wins with some of these changes (I'd like to get those minimums from …)
@bors try

⌛ Trying commit 9e8bafc24b60008353f5f4e8027379a10bc7bb35 with merge f7360e5b2e5b2ed5e1696af257b365e3cc69981d...

☀️ Test successful - status-travis

@rust-timer build f7360e5b2e5b2ed5e1696af257b365e3cc69981d

Success: Queued f7360e5b2e5b2ed5e1696af257b365e3cc69981d with parent 0195812, comparison URL.

Finished benchmarking try commit f7360e5b2e5b2ed5e1696af257b365e3cc69981d
Since the changes at this point are all pretty harmless, I'd say that the benchmark results are statistical noise. That being said, these changes don't seem to be beneficial performance-wise, so I'm OK with closing the PR, unless you believe that they are a readability improvement / more idiomatic.

Uhh, no matter how I look at it, the max-rss results still seem hit-or-miss.

@nagisa max-rss has very high variance. Here's how it looks for a random recent commit: http://perf.rust-lang.org/compare.html?start=ca79ecd6940e30d4b2466bf378632efcdf5745c7&end=775eab58835f9bc0f1f01ccbb79725dce0c73b51&stat=max-rss The results for this PR seem to be within the "usual" noise.
@bors r+

📌 Commit 9e8bafc24b60008353f5f4e8027379a10bc7bb35 has been approved by

@bors r- Still OOMing on AppVeyor.
Could it be that with this patch we overcommit virtual memory that ends up never being used? On Windows overcommit is not possible, which would explain the OOM there but no observable regressions on perf runs.
Maybe, but how? These changes shouldn't negatively impact allocations - IMO they should make them easier to optimize.

The optimizability is irrelevant if it is indeed what I think it is. And it is hardly related to the number of allocations, but rather to their size. Alas, rust perf does not collect the information of interest to tell for sure. It would be interesting to do a perf run (and they really should be run that way all the time) with overcommit disabled.
As long as the length of the iterator is known, changing a …
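The point about known iterator lengths, as I understand it: `Vec::extend` consults the iterator's `size_hint`, so an exact-size iterator lets it reserve once up front instead of growing the buffer repeatedly. A minimal sketch:

```rust
fn main() {
    let src = vec![1u32, 2, 3, 4];

    // push loop: the vec may reallocate several times as it grows
    let mut a = Vec::new();
    for x in &src {
        a.push(x * 2);
    }

    // extend from an exact-size iterator: one up-front reservation
    let mut b = Vec::new();
    b.extend(src.iter().map(|x| x * 2));

    assert_eq!(a, b);
    println!("{:?}", b); // [2, 4, 6, 8]
}
```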
☔ The latest upstream changes (presumably #55627) made this pull request unmergeable. Please resolve the merge conflicts.
I'm not sure we're talking about the same thing. Even though the following snippet would OOM on Windows, it would work just fine on UNIXes due to overcommit:
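The snippet itself is not preserved in this transcript; a stand-in for the kind of program presumably meant (the 2 GiB figure is made up — the effect is more dramatic with tens of GiB) might be:

```rust
fn main() {
    // Reserve a large buffer without ever touching it. On Linux and most
    // UNIXes the kernel only promises the address space (overcommit) and
    // backs pages on first write, so this succeeds even with little free
    // RAM. Windows charges the full reservation against the commit limit
    // up front, so a large enough allocation fails there immediately.
    let buf: Vec<u8> = Vec::with_capacity(2 << 30); // 2 GiB, never written
    println!("reserved {} bytes, touched none", buf.capacity());
}
```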
This has nothing to do with known length or allocation count.

@rust-lang/infra is it possible to make @rust-timer collect additional information (e.g. max-virtual-mem, in addition to max-rss…)? What repository should I file an issue in to request this?

While something like this happening is clearly a bug somewhere (and I cannot tell where exactly), it is fairly obvious that some change among those in the commit is making LLVM commit too much memory that likely ends up never being used.

One thing you could do to debug this is to compile stage1 core on your own UNIX machine and see what the maximum virtual memory ends up being. If it ends up being significantly larger than the RSS, that would confirm my suspicions. Another thing you could do is disabling …

Finally, we could also just bisect – there aren't that many different changes in this PR. We could try landing them one by one (though there's a danger that all these changes are cumulatively slightly raising the committed memory and none of them would fail CI on their own).
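The "maximum virtual memory vs RSS" check suggested above can be scripted. A hedged, Linux-only sketch (the `VmPeak`/`VmHWM` field names come from `/proc/<pid>/status`; the 2 GiB reservation is an arbitrary stand-in for a build that commits memory it never touches):

```rust
use std::fs;

// Read a peak-memory field (in KiB) from /proc/self/status (Linux-only).
fn peak_kib(field: &str) -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    let line = status.lines().find(|l| l.starts_with(field))?;
    line.split_whitespace().nth(1)?.parse().ok()
}

fn main() {
    // Stand-in for a process that commits memory it never uses:
    let _reserved: Vec<u8> = Vec::with_capacity(2 << 30); // 2 GiB, untouched

    // VmPeak = peak virtual size; VmHWM = peak resident set size.
    // VmPeak far above VmHWM suggests memory committed but never used.
    let vm_peak = peak_kib("VmPeak:").unwrap_or(0);
    let vm_hwm = peak_kib("VmHWM:").unwrap_or(0);
    println!("VmPeak: {} KiB, VmHWM: {} KiB", vm_peak, vm_hwm);
}
```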
https://github.com/rust-lang-nursery/rustc-perf |
@nagisa ah, ok, thanks for the explanation; I wasn't thinking in terms of a possible memory bug. I will do some test builds on a Linux machine when I have a bit of free time.

@nagisa I ran … I tried to disable …

What are the current thoughts here? I'm sort of inclined to close this PR as "not worth the trouble", but do you all still want to poke at it? Can I assign the review to someone else (@nagisa?)
```rust
})
.collect::<Vec<_>>();
});
```
You're iterating over globals here and then adding new globals in the loop below. With the collect that's fine, as you'll only iterate existing globals. Without the collect this is going to be an infinite loop and you OOM.
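A hypothetical sketch of the failure mode described (not the actual rustc/LLVM code, where the globals are walked through FFI): iterating a collection that the loop body keeps growing never reaches the end, while iterating a collected snapshot terminates.

```rust
// If the loop iterated `globals` directly while appending to it, every
// pass would add a new element and it would never finish, eventually
// exhausting memory. Cloning a snapshot first (the role `collect` played
// in the original code) bounds the iteration to the pre-existing entries.
fn add_renamed(mut globals: Vec<String>) -> Vec<String> {
    let snapshot: Vec<String> = globals.clone();
    for g in &snapshot {
        globals.push(format!("{}.renamed", g));
    }
    globals
}

fn main() {
    let out = add_renamed(vec!["a".into(), "b".into()]);
    println!("{:?}", out); // ["a", "b", "a.renamed", "b.renamed"]
}
```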
Weird that this doesn't seem to cause issues on architectures other than i686 (at least that's the one where the OOM was being hit on AppVeyor); I can compile with these changes on x64 without issues.
@nikomatsakis At this point I was interested more in the possibility of uncovering some memory-related bug (as described by @nagisa). I think @nikic is onto something - it might not be a bug but a peculiarity of …
force-pushed from 9e8bafc to ce4bce1

force-pushed from ce4bce1 to 2043d30
@nikomatsakis I just remembered that the initial version of this PR included a change that had a -2.5% win for style-servo-opt, which is considerable, especially since it is a huge benchmark; I rebased, re-included the win for servo and removed the problematic bit that @nikic marked as the one causing the OOM, so hopefully now it's good to go 🤞.
Let's see if bors can be controlled over mail
@bors r+
📌 Commit 2043d30 has been approved by
codegen_llvm_back: improve allocations

This commit was split out from #54864. Last time it was causing an LLVM OOM, which was most probably caused by not collecting the globals.

- preallocate vectors of known length
- `extend` instead of `append` where the argument is consumable
- turn 2 `push` loops into `extend`s
- create a vector from a function producing one instead of using `extend_from_slice` on it
- consume `modules` when no longer needed
- ~~return an `impl Iterator` from `generate_lto_work`~~
- ~~don't `collect` `globals`, as they are iterated over and consumed right afterwards~~

While I'm hoping it won't cause an OOM anymore, I would still consider this a "high-risk" PR and not roll it up.
☀️ Test successful - status-appveyor, status-travis