html5ever in the rustc-perf repository is memory-intensive #52028
Comments
I did a run with Massif: Massif doesn't get great stack traces due to inlining, but DHAT does a better job by using debuginfo. Here's the important info:
Holy cow! That's a single 12 GiB allocation happening within (Once again, Because
|
There are also some other causes of slowness for First,
It's the bit at the end that's the hottest, executing 485 million times. Because of this function, the following things are also super hot:
Second,
|
I always wondered if that super naive |
Thanks @nnethercote for the data, super helpful. |
I suspect we can remove the |
But it will require some refactoring. I think I would want to build this atop #51987. Let me summarize the setup in that PR:
We used to have a fairly hard requirement that we know the number of region variables before we could create the region inference context. This is because we represented the values as a huge matrix of bits. But now that we use sparse bit sets, that is no longer true; we could grow the number of region variables fairly easily.
This implies to me that -- to start -- we could refactor so that type check generates the initial liveness results (storing in a sparse bit set per variable) directly. This would effectively replace the
The reason we'd have to be able to grow the number of region variables is that I think that computing liveness constraints can sometimes create fresh region variables and possibly constraints between those variables. (This is due to the "dropck" computation.) It'd be sort of nice if that were not true, though, and it may not be true -- if it weren't, we could compute the SCC ahead of time and then never generate the per-region liveness, instead focusing on the liveness per SCC. This may affect diagnostics, which we could recover by doing more refined computations lazily, perhaps (or maybe it wouldn't affect diagnostics; I'm not sure, I'd have to think about it). But it'd probably still be a big improvement to move to the sparse bit set instead of the vector of points.
Another option that I have been wondering about is moving from "set of points" to a more compact representation -- for example, the SEME region abstraction implemented in https://github.com/rust-lang-nursery/rustc-seme-regions may be more compact. However, presently there can be multiple disjoint liveness regions, and SEME regions can't express that, so we'd have to figure out what to do there. |
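To make the "sparse bit set per variable" idea concrete, here is a minimal sketch of a liveness store whose rows are created on demand, so the number of region variables does not need to be known up front. The type `SparseLiveness` and its methods are hypothetical names for illustration; they are not the rustc data structures, and a `BTreeSet` merely stands in for whatever sparse bit set the compiler uses.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-region liveness store (not the rustc type).
/// Each row is a sparse set of point indices for one region variable.
#[derive(Default)]
struct SparseLiveness {
    rows: Vec<BTreeSet<usize>>,
}

impl SparseLiveness {
    /// Marks `point` as live for `region`, growing the row vector if the
    /// region variable has not been seen before.
    /// Returns true if the point was newly added.
    fn add(&mut self, region: usize, point: usize) -> bool {
        if region >= self.rows.len() {
            self.rows.resize_with(region + 1, BTreeSet::new);
        }
        self.rows[region].insert(point)
    }

    /// True if `point` is recorded as live for `region`.
    fn contains(&self, region: usize, point: usize) -> bool {
        self.rows.get(region).map_or(false, |row| row.contains(&point))
    }

    /// Number of region variables currently tracked.
    fn num_regions(&self) -> usize {
        self.rows.len()
    }
}

fn main() {
    let mut live = SparseLiveness::default();
    live.add(0, 3);
    // A fresh region variable created late still gets a row.
    live.add(5, 1_000_000);
    assert!(live.contains(5, 1_000_000));
    assert!(!live.contains(2, 3));
    println!("regions tracked: {}", live.num_regions());
}
```

Memory here scales with the number of live (region, point) pairs rather than with regions times points, which is the property the comment above relies on.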
More about
The resulting output when compiling
It definitely feels like we're traversing a long sequence from A to B, then moving A forward slightly (by 8 steps, to be precise), then traversing from A to B again, over and over. |
Yeah, |
Hmm, as an aside, I wonder if the premise of finding the borrows-in-scope via a dataflow is a bit silly, given that (This is another place where changing to an alternative representation of regions, one that optimizes continuous stretches of things, might be a win.) |
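As a rough illustration of a representation that "optimizes continuous stretches", the sketch below stores a set of points as sorted, disjoint half-open ranges. `IntervalSet` is a hypothetical name, and the sketch assumes points are numbered so that live stretches are mostly contiguous; whether that holds for real MIR is not established in this thread. Unlike a single SEME region, it can hold several disjoint ranges.

```rust
use std::cmp::Ordering;

/// Hypothetical compact point set: sorted, disjoint, non-adjacent
/// half-open ranges [start, end).
#[derive(Default, Debug)]
struct IntervalSet {
    ranges: Vec<(usize, usize)>,
}

impl IntervalSet {
    /// Inserts [start, end), merging with overlapping or adjacent ranges.
    fn insert_range(&mut self, start: usize, end: usize) {
        if start >= end {
            return;
        }
        let mut merged = (start, end);
        let mut out = Vec::with_capacity(self.ranges.len() + 1);
        let mut placed = false;
        for &(s, e) in &self.ranges {
            if e < merged.0 {
                // Entirely before the new range, with a gap.
                out.push((s, e));
            } else if s > merged.1 {
                // Entirely after the new range, with a gap.
                if !placed {
                    out.push(merged);
                    placed = true;
                }
                out.push((s, e));
            } else {
                // Overlapping or adjacent: absorb into the new range.
                merged = (merged.0.min(s), merged.1.max(e));
            }
        }
        if !placed {
            out.push(merged);
        }
        self.ranges = out;
    }

    /// Binary search for the range containing `point`.
    fn contains(&self, point: usize) -> bool {
        self.ranges
            .binary_search_by(|&(s, e)| {
                if point < s {
                    Ordering::Greater
                } else if point >= e {
                    Ordering::Less
                } else {
                    Ordering::Equal
                }
            })
            .is_ok()
    }
}

fn main() {
    let mut set = IntervalSet::default();
    set.insert_range(0, 10);
    set.insert_range(20, 30);
    set.insert_range(10, 20); // bridges the two ranges into one
    println!("{:?}", set.ranges);
    assert!(set.contains(25));
}
```

A long contiguous stretch costs one pair of integers here regardless of its length, at the price of a linear-time insert in this simple version.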
I made several attempts to speed up |
@nnethercote yeah, that code was not naive -- in fact, you had previously optimized it to use |
JFYI, I created this Zulip thread to talk about this issue |
@Mark-Simulacrum: #52250 reduces |
@nnethercote I'll try and enable it -- the perf collector has, I think, 32 GB of RAM (though free reports 31G, that seems odd) so I think that it should be "good enough" in that regard. Local benchmark runners can exclude it if necessary, I think... |
WIP: html5ever in the rustc-perf repository is memory-intensive Part of #52028. Rebased atop of #51987. r? @nikomatsakis
I looked into Specifically, we hit the rust/src/librustc_mir/borrow_check/path_utils.rs Lines 61 to 74 in 31263f3
Here's the
There's something quadratic going on, as shown by the equal number of occurrences for lengths 1..9855. There is also the over-representation of len=9856. I haven't yet worked out exactly where this is coming from. |
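To put a number on the quadratic pattern: if every length from 1 up to some maximum is scanned the same number of times, the total work is the triangular sum n(n+1)/2 times that count. The helper below is purely illustrative; `total_visits` and the count of one scan per length are assumptions, since the actual occurrence count is not given here.

```rust
/// Total elements visited if every length in 1..=max_len is scanned
/// `times_each` times. Both parameters are illustrative assumptions.
fn total_visits(max_len: u64, times_each: u64) -> u64 {
    times_each * max_len * (max_len + 1) / 2
}

fn main() {
    // 9855 is the largest evenly represented length reported above;
    // one scan per length is an assumption for the sake of the example.
    println!("{}", total_visits(9855, 1)); // prints 48565440
}
```

So even a single pass over each length already touches tens of millions of elements, before accounting for the over-represented len=9856 case.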
html5ever in the rustc-perf repository is memory-intensive Part of #52028. Rebased atop of #51987. r? @nikomatsakis
#52190 reduces the max-rss to 2GB -- still too big, but a lot better. To do much better than that, I think we have to start changing from a simple bitset to something that is more compressed, so we can capture patterns across variables. Something like BDDs comes to mind. |
The performance remains ungreat though and @nnethercote's profiles are very useful in that regard. It seems clear (also from the fact that liveness dominates) that in this case we have somewhere a large number of borrows accumulating -- each access is then checked against that large number of borrows. I suspect we could do a lot better via some kind of hashing scheme, where we hash paths to figure out if they might possibly overlap. I would imagine we would walk down the place to find prefixes. |
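One way the hashing idea could look, as a sketch only: hash each borrowed place together with all of its prefixes, then answer "might this access overlap any borrow?" with a few hash lookups instead of a scan over every borrow. `Place` and `BorrowIndex` below are hypothetical simplifications (a local plus a chain of field indices); real places also involve derefs, indexing, and other projections that this ignores.

```rust
use std::collections::HashSet;

/// Hypothetical simplified place: a local plus a chain of field indices.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct Place {
    local: usize,
    fields: Vec<u32>,
}

impl Place {
    /// All prefixes of this place, from the bare local up to the place itself.
    fn prefixes(&self) -> impl Iterator<Item = Place> + '_ {
        (0..=self.fields.len()).map(move |n| Place {
            local: self.local,
            fields: self.fields[..n].to_vec(),
        })
    }
}

#[derive(Default)]
struct BorrowIndex {
    /// Places that are borrowed exactly.
    exact: HashSet<Place>,
    /// Every prefix of every borrowed place (including the place itself).
    prefixes_of_borrows: HashSet<Place>,
}

impl BorrowIndex {
    fn add_borrow(&mut self, place: &Place) {
        self.exact.insert(place.clone());
        self.prefixes_of_borrows.extend(place.prefixes());
    }

    /// True if some borrowed place is a prefix of `access`, or `access`
    /// is a prefix of some borrowed place.
    fn may_overlap(&self, access: &Place) -> bool {
        self.prefixes_of_borrows.contains(access)
            || access.prefixes().any(|p| self.exact.contains(&p))
    }
}

fn main() {
    let mut index = BorrowIndex::default();
    index.add_borrow(&Place { local: 1, fields: vec![0] });

    let sibling = Place { local: 1, fields: vec![1] };
    let child = Place { local: 1, fields: vec![0, 2] };
    println!("sibling overlaps: {}", index.may_overlap(&sibling));
    println!("child overlaps: {}", index.may_overlap(&child));
}
```

The cost of a query is proportional to the depth of the accessed place rather than to the number of outstanding borrows; a precise conflict check would still be needed on the candidates this filter lets through.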
We are deferring further work that targets max-rss specifically to the release candidate, since it doesn't seem pressing for Edition Preview 2 right now. At this point, it seems like the next step is to look at ways to represent liveness regions that can share between region values -- e.g., something like a BDD (correct ordering will be key here) or perhaps SEME regions. |
I re-profiled. The high number of calls to |
For posterity's sake, 95% of the time spent in |
Just linking the couple issues Niko filed, related to @nnethercote's and others' profiles (but not strictly about this current issue of memory usage): |
Update: With #53168, I think we can expect html5ever's memory usage to be reduced to 1.2GB. (At least, the previous PR — which I reverted — had that effect, and this one does the same basic thing, just more soundly.) |
@Mark-Simulacrum : are you happy to close this now? With NLL, html5ever's memory usage and speed are now both reasonable (and #53383 will help both a bit more). |
html5ever still uses 6x more memory (over a gigabyte) for check builds if I'm reading the dashboard right, which I believe is unacceptably high long-term. However, I do agree that we can discontinue prioritizing this for the edition (performance regression is no longer a huge outlier for html5ever specifically). |
Note that #53327 seems to drop memory use to 600MB here (although not the latest rev, which needs to be investigated). |
Pushing this out to RC 2 — it's a perf issue and not a critical one (plus pending PRs ought to help) |
Memory use is currently at a 2.28x ratio, weighing in at 501MB. Still a lot, but not so much as to be an RC2 blocker. Would be good to know what's going on though. |
I added some Perhaps a sparser representation could help. In fact, |
I have a plan to fix this, by making (The old |
#54286 is the next step. Once that lands, I will file a PR for using
|
@lqd just pointed me at a crate ( |
The NLL dashboard has updated. The top 5 now looks like this:
A 20% increase in |
I see OOMs locally with a 16 GB memory computer.
Source code: https://github.com/rust-lang-nursery/rustc-perf/tree/master/collector/benchmarks/html5ever