Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PackedFingerprint in DepNode to reduce memory consumption #78646

Merged
merged 3 commits into from
Nov 20, 2020

Conversation

tgnottingham
Copy link
Contributor

No description provided.

@rust-highfive
Copy link
Collaborator

r? @eddyb

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 1, 2020
@tgnottingham
Copy link
Contributor Author

This is a second attempt. The first was #78516, and was abandoned due to the performance implications of using a [u8; 16] representation, which leads to aliasing pessimizations, at least.

This uses #[repr(packed)] instead, which I assumed would be fraught with UB and performance perils versus using [u8; 16], but turns out to be much simpler and much more performant, based on local tests.

@jyn514 jyn514 added I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Nov 1, 2020
@jyn514
Copy link
Member

jyn514 commented Nov 1, 2020

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Nov 1, 2020

⌛ Trying commit e3e158bfff558b96c9ba1719249153b12e999705 with merge cef34ea6b56cd5bc7d0ee0e3c916be77ef292b35...

@bors
Copy link
Contributor

bors commented Nov 2, 2020

☀️ Try build successful - checks-actions
Build commit: cef34ea6b56cd5bc7d0ee0e3c916be77ef292b35 (cef34ea6b56cd5bc7d0ee0e3c916be77ef292b35)

@rust-timer
Copy link
Collaborator

Queued cef34ea6b56cd5bc7d0ee0e3c916be77ef292b35 with parent b202532, future comparison URL.

@camelid camelid added I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. and removed I-compiletime Issue: Problems and improvements with respect to compile times. labels Nov 2, 2020
@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (cef34ea6b56cd5bc7d0ee0e3c916be77ef292b35): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot modify labels: +S-waiting-on-review -S-waiting-on-perf

@jyn514
Copy link
Member

jyn514 commented Nov 2, 2020

Wow, that's a big difference! Up to 10% memory gains, almost no increased instruction count :D

@tgnottingham
Copy link
Contributor Author

tgnottingham commented Nov 2, 2020

Wow, that's a big difference! Up to 10% memory gains, almost no increased instruction count :D

Yeah! :) And I just added a relatively simple change to the dep graph code that should net a couple more percent.

This new change splits up dep node data to pack it more tightly (which is made easier by the fingerprint having lower alignment,
which is why I thought the change seemed related enough to justify including in this PR). There was prior work in this area (#57061), but going in the opposite direction (combining dep node data into one struct).

The trade off is memory usage versus spatial locality. My hope is that the locality doesn't matter that much. In normal circumstances, I think we only access a node's data twice in this structure during compilation -- once when inserting, and once when serializing. So it seems unlikely to matter.

@Xanewok
Copy link
Member

Xanewok commented Nov 2, 2020

Wow, that's a big difference! Up to 10% memory gains, almost no increased instruction count :D

If I understand correctly, changing this to packed will result in "unaligned" loads (it's just a byte stream but the load is not on a word boundary aiui), which themselves won't bump the instruction count but rather incur latency penalty. If we look at cycles we can see a 1-3% regression across the board.

@jyn514
Copy link
Member

jyn514 commented Nov 2, 2020

This new change splits up dep node data to pack it more tightly (which is made easier by the fingerprint having lower alignment,
which is why I thought the change seemed related enough to justify including in this PR).

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Nov 2, 2020

⌛ Trying commit d97d5de6606256727454669c077febc5b241bf45 with merge e6e801fdf71de4055423874e81a84b833f85e59c...

@bors
Copy link
Contributor

bors commented Nov 2, 2020

☀️ Try build successful - checks-actions
Build commit: e6e801fdf71de4055423874e81a84b833f85e59c (e6e801fdf71de4055423874e81a84b833f85e59c)

@rust-timer
Copy link
Collaborator

Queued e6e801fdf71de4055423874e81a84b833f85e59c with parent 338f939, future comparison URL.

@tgnottingham
Copy link
Contributor Author

If I understand correctly, changing this to packed will result in "unaligned" loads (it's just a byte stream but the load is not on a word boundary aiui), which themselves won't bump the instruction count but rather incur latency penalty. If we look at cycles we can see a 1-3% regression across the board.

And x86 is probably the best case scenario. Still, if we can mitigate the cycle increase, or if the pros outweigh the cons, it may be worth packing Fingerprints on x86 specifically.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (e6e801fdf71de4055423874e81a84b833f85e59c): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot modify labels: +S-waiting-on-review -S-waiting-on-perf

@tgnottingham
Copy link
Contributor Author

The additional memory savings from the second change get lost in the variance in the benchmarks. Profiling locally verifies the modified structures to be smaller. I don't think the second change is worth the added complexity, at least while there are bigger improvements to be had elsewhere.

The max-rss measurement for keccak has very high variance in my experience, so you might take it with a grain of salt.

The cycles look a bit better, but I'm guessing that's also attributable to variance. May have gotten unlucky the first time, or lucky the second time. Nevertheless, I'll experiment with some other approaches that may perform better.

@nnethercote
Copy link
Contributor

Sorry for the slow response, I am on leave and didn't see this until just now.

The perf results look good, but I am worried about the UB mentioned here. Any thoughts on that?

@tgnottingham
Copy link
Contributor Author

tgnottingham commented Nov 18, 2020

Yes, that warning concerned me as well. I know that the unaligned_references and safe_packed_borrows lints detect when you take references to packed fields (including inside #[derive]'d trait implementations), and I believe that is the only UB concern. Those lints aren't firing, and they were before I made the required changes, so that's some assurance. That said, perhaps they're not complete. I'll ask around for some additional assurance.

@nnethercote
Copy link
Contributor

Ok, r=me once you've finished asking around.

@tgnottingham
Copy link
Contributor Author

I've made the Fingerprint field of the PackedFingerprint private to further assuage fears that a client could take a reference to the packed field without the lints catching it. The PackedFingerprint implementation itself should be safe, as care is taken to avoid taking such references, and as I understand it, the #[derive]'d trait implementations do the same when the type is Copy.

Outside of this PR, I'm working on some changes that could make this one more or less impactful, so I'll revisit whether packing continues to be worthwhile if those changes get integrated. I know packing isn't a thing to be taken lightly, but for now, it looks like a pretty clear win.

@tgnottingham tgnottingham force-pushed the packed_fingerprints branch 2 times, most recently from 08c891a to a0eaf27 Compare November 18, 2020 23:23
@nnethercote
Copy link
Contributor

Outside of this PR, I'm working on some changes that could make this one more or less impactful, so I'll revisit whether packing continues to be worthwhile if those changes get integrated.

Do you still want to land this now?

@tgnottingham
Copy link
Contributor Author

Yes--it may be a while before I can finish those other changes.

@RalfJung, I know you've been involved with work and discussions around references to packed fields and UB. Do you have any concerns about landing this PR?

@RalfJung
Copy link
Member

I recommend if you use packed fields you should set the lint added by #72270 to deny in the relevant crates:

#![deny(unaligned_references)]

@RalfJung
Copy link
Member

Ah, looks like you already knew that.^^ But the lint does not seem to be enabled in rustc.

To my knowledge, if the packed fields are private and the crate defining the struct has that lint enabled, that should guarantee soundness.

To detect misuse of private packed field in `PackedFingerprint`.
@tgnottingham
Copy link
Contributor Author

Thanks! Added the #[deny].

@nnethercote
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Nov 20, 2020

📌 Commit 142932a has been approved by nnethercote

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 20, 2020
@bors
Copy link
Contributor

bors commented Nov 20, 2020

⌛ Testing commit 142932a with merge ae6aa22...

@bors
Copy link
Contributor

bors commented Nov 20, 2020

☀️ Test successful - checks-actions
Approved by: nnethercote
Pushing ae6aa22 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 20, 2020
@bors bors merged commit ae6aa22 into rust-lang:master Nov 20, 2020
@rustbot rustbot added this to the 1.50.0 milestone Nov 20, 2020
@tgnottingham tgnottingham deleted the packed_fingerprints branch January 20, 2021 18:32
@tgnottingham
Copy link
Contributor Author

Outside of this PR, I'm working on some changes that could make this one more or less impactful, so I'll revisit whether packing continues to be worthwhile if those changes get integrated.

Those changes have landed in #79589 and #80957, and I've revisted the perf effect of reverting this change in #81230. I think it continues to carry it's weight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.