Use PackedFingerprint in DepNode to reduce memory consumption #78646
r? @eddyb (rust_highfive has picked a reviewer for you, use r? to override)
This is a second attempt. The first was #78516, and was abandoned due to the performance implications of using a This uses
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit e3e158bfff558b96c9ba1719249153b12e999705 with merge cef34ea6b56cd5bc7d0ee0e3c916be77ef292b35...
☀️ Try build successful - checks-actions
Queued cef34ea6b56cd5bc7d0ee0e3c916be77ef292b35 with parent b202532, future comparison URL.
Finished benchmarking try commit (cef34ea6b56cd5bc7d0ee0e3c916be77ef292b35): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
Wow, that's a big difference! Up to 10% memory gains, almost no increased instruction count :D
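For readers following along, here is a minimal sketch of where the memory saving comes from. The types below are hypothetical stand-ins, not the actual rustc definitions; in particular the one-byte `kind` field is an assumption made for illustration.

```rust
use std::mem::{align_of, size_of};

// Hypothetical stand-ins for illustration; not the actual rustc definitions.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Fingerprint(u64, u64); // 16 bytes, aligned like u64

#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(packed)]
struct PackedFingerprint(Fingerprint); // same 16 bytes, 1-byte alignment

#[allow(dead_code)]
struct DepNodeAligned {
    kind: u8, // assumed one-byte kind tag
    hash: Fingerprint, // forces the whole struct to u64 alignment
}

#[allow(dead_code)]
struct DepNodePacked {
    kind: u8,
    hash: PackedFingerprint, // alignment 1, so no padding is needed
}

fn main() {
    println!(
        "aligned: size {}, align {}",
        size_of::<DepNodeAligned>(),
        align_of::<DepNodeAligned>()
    );
    println!(
        "packed:  size {}, align {}",
        size_of::<DepNodePacked>(),
        align_of::<DepNodePacked>()
    );
}
```

With these assumed fields the packed node is 17 bytes with alignment 1, while the aligned node is padded up to a multiple of the `u64` alignment (24 bytes on a typical 64-bit target).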
Yeah! :) And I just added a relatively simple change to the dep graph code that should net a couple more percent. This new change splits up dep node data to pack it more tightly (which is made easier by the fingerprint having lower alignment, The trade-off is memory usage versus spatial locality. My hope is that the locality doesn't matter that much. In normal circumstances, I think we only access a node's data twice in this structure during compilation -- once when inserting, and once when serializing. So it seems unlikely to matter.
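The split described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `NodeData`, `SplitNodeData`, the `edges` range, and the one-byte `kind` are all hypothetical names and fields.

```rust
use std::mem::size_of;

// Hypothetical stand-ins for illustration; not the actual rustc definitions.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Fingerprint(u64, u64);

#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(packed)]
struct PackedFingerprint(Fingerprint);

#[allow(dead_code)]
#[derive(Clone, Copy)]
struct DepNode {
    kind: u8, // assumed one-byte kind tag
    hash: PackedFingerprint,
}

// One combined record per node: the u64-aligned fingerprint forces padding.
#[allow(dead_code)]
struct NodeData {
    node: DepNode,
    fingerprint: Fingerprint,
    edges: (u32, u32), // assumed start/end range into an edge list
}

// Split layout: each vector is tightly packed for its own element type.
#[derive(Default)]
struct SplitNodeData {
    nodes: Vec<DepNode>,
    fingerprints: Vec<Fingerprint>,
    edges: Vec<(u32, u32)>,
}

impl SplitNodeData {
    // Appends one node's data to every vector and returns its index.
    fn push(&mut self, node: DepNode, fingerprint: Fingerprint, edges: (u32, u32)) -> usize {
        let index = self.nodes.len();
        self.nodes.push(node);
        self.fingerprints.push(fingerprint);
        self.edges.push(edges);
        index
    }
}

// Bytes stored per node when the three fields live in separate vectors.
fn bytes_per_node_split() -> usize {
    size_of::<DepNode>() + size_of::<Fingerprint>() + size_of::<(u32, u32)>()
}

fn main() {
    let mut data = SplitNodeData::default();
    let node = DepNode { kind: 0, hash: PackedFingerprint(Fingerprint(1, 2)) };
    let index = data.push(node, Fingerprint(3, 4), (0, 0));
    println!("inserted node at index {}", index);
    println!("combined record: {} bytes per node", size_of::<NodeData>());
    println!("split vectors:   {} bytes per node", bytes_per_node_split());
}
```

The cost is the one mentioned in the comment: a node's fields now live in three separate allocations, so touching all of them for one node gives up some spatial locality.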
Force-pushed from 9188c33 to d97d5de.
If I understand correctly, changing this to packed will result in "unaligned" loads (it's just a byte stream, but the load is not on a word boundary, aiui), which themselves won't bump the instruction count but will incur a latency penalty. If we look at cycles, we can see a 1-3% regression across the board.
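To make the "unaligned load" point concrete, here is a small sketch. The `Packed` struct and both helper functions are hypothetical and exist only for illustration; the second helper uses `std::ptr::addr_of!` and `read_unaligned` to spell out what a by-value read of a packed field amounts to.

```rust
use std::ptr::addr_of;

// Hypothetical example type; `value` sits at byte offset 1, so it is not
// on an 8-byte boundary relative to the start of the struct.
#[repr(packed)]
struct Packed {
    tag: u8,
    value: u64,
}

// Reading a packed field by value is safe: the compiler knows the field may
// be misaligned and emits a load that tolerates that.
fn read_by_value(p: &Packed) -> u64 {
    p.value
}

// The same read spelled out with raw pointers. No reference to the field is
// ever created, only a raw pointer, which is then read without assuming
// alignment.
fn read_via_raw_pointer(p: &Packed) -> u64 {
    unsafe { addr_of!(p.value).read_unaligned() }
}

// Byte offset of `value` inside the struct.
fn value_offset(p: &Packed) -> usize {
    addr_of!(p.value) as usize - p as *const Packed as usize
}

fn main() {
    let p = Packed { tag: 1, value: 0xDEAD_BEEF };
    println!("tag = {}", { p.tag });
    println!("by value:        {:#x}", read_by_value(&p));
    println!("via raw pointer: {:#x}", read_via_raw_pointer(&p));
    println!("offset of value: {}", value_offset(&p));
}
```

Both helpers execute the same number of loads as an aligned read would, which is consistent with the instruction count staying flat; any extra cost shows up in cycles, and how large it is depends on the target.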
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit d97d5de6606256727454669c077febc5b241bf45 with merge e6e801fdf71de4055423874e81a84b833f85e59c...
☀️ Try build successful - checks-actions
Queued e6e801fdf71de4055423874e81a84b833f85e59c with parent 338f939, future comparison URL.
And x86 is probably the best case scenario. Still, if we can mitigate the cycle increase, or if the pros outweigh the cons, it may be worth packing
Finished benchmarking try commit (e6e801fdf71de4055423874e81a84b833f85e59c): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
The additional memory savings from the second change get lost in the variance in the benchmarks. Profiling locally verifies the modified structures to be smaller. I don't think the second change is worth the added complexity, at least while there are bigger improvements to be had elsewhere. The max-rss measurement for keccak has very high variance in my experience, so you might take it with a grain of salt. The cycles look a bit better, but I'm guessing that's also attributable to variance. May have gotten unlucky the first time, or lucky the second time. Nevertheless, I'll experiment with some other approaches that may perform better.
Force-pushed from d97d5de to 5e5137d.
Sorry for the slow response, I am on leave and didn't see this until just now. The perf results look good, but I am worried about the UB mentioned here. Any thoughts on that?
Yes, that warning concerned me as well. I know that the
Ok, r=me once you've finished asking around.
Force-pushed from 601c942 to 08c891a.
I've made the Outside of this PR, I'm working on some changes that could make this one more or less impactful, so I'll revisit whether packing continues to be worthwhile if those changes get integrated. I know packing isn't a thing to be taken lightly, but for now, it looks like a pretty clear win.
Force-pushed from 08c891a to a0eaf27.
Do you still want to land this now?
Yes--it may be a while before I can finish those other changes. @RalfJung, I know you've been involved with work and discussions around references to packed fields and UB. Do you have any concerns about landing this PR?
I recommend if you use packed fields you should set the lint added by #72270 to `#![deny(unaligned_references)]`
Ah, looks like you already knew that.^^ But the lint does not seem to be enabled in rustc. To my knowledge, if the packed fields are private and the crate defining the struct has that lint enabled, that should guarantee soundness.
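A minimal sketch of the pattern described here: keep the packed field private and only ever move the value in or out by copy. The module name and the `From` conversions below are hypothetical choices for illustration, not necessarily what the PR does.

```rust
mod fingerprint {
    // Hypothetical stand-in; not the actual rustc definition.
    #[derive(Clone, Copy, PartialEq, Eq, Debug)]
    pub struct Fingerprint(pub u64, pub u64);

    // The inner field is private, so code outside this module cannot write
    // `&packed.0` and create a misaligned reference.
    #[derive(Clone, Copy)]
    #[repr(packed)]
    pub struct PackedFingerprint(Fingerprint);

    impl From<Fingerprint> for PackedFingerprint {
        fn from(f: Fingerprint) -> Self {
            PackedFingerprint(f)
        }
    }

    impl From<PackedFingerprint> for Fingerprint {
        fn from(p: PackedFingerprint) -> Self {
            // Copies the field out by value; no reference to the packed
            // field is taken. Writing `&p.0` here instead is what the
            // `unaligned_references` lint is meant to catch.
            p.0
        }
    }
}

use fingerprint::{Fingerprint, PackedFingerprint};

fn main() {
    let original = Fingerprint(0x1234, 0x5678);
    let packed: PackedFingerprint = original.into();
    let restored: Fingerprint = packed.into();
    println!("round trip preserved the value: {}", restored == original);
}
```

Inside the defining module the compiler lint is the remaining safeguard, which is why it matters that the lint is enabled for the crate that defines the struct.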
To detect misuse of private packed field in `PackedFingerprint`.
Thanks! Added the
@bors r+
📌 Commit 142932a has been approved by
☀️ Test successful - checks-actions
Those changes have landed in #79589 and #80957, and I've revisited the perf effect of reverting this change in #81230. I think it continues to carry its weight.