-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't store locals that have been moved from in generators #61922
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
Are there any size comparisons so we can see the magnitude of the effect of this patch? |
So what are the assumptions about MIR semantics that are made here? Would be great to have that summarized in the code somewhere and in the PR description, so that one does not have to reverse engineer that from the dataflow analysis. |
Oh also, what would be really awesome (both for this PR and #60187) is if you could write some code that would "contradict" this optimization and thus should have UB. This would be helpful as a Miri testcase and also to better document the assumption that is being made. That might not always be possible though. |
It would be nice to see a minimal MIR example of a case it's supposed to help with too. |
92f55b7
to
e5214a9
Compare
@RalfJung The minimal testcase that produces UB would be something like fn main() {
static || {
let x = String::new("42");
let y = x;
yield;
assert!(&x == "42");
}
} i.e., you have to depend on the value of a var after moving it. (And this has to be done after a yield point to trigger the optimization and thus the UB.) AFAIK there's no way to write this testcase in surface Rust today. If you borrow |
@Zoxc I included some testcases, do you think MIR examples would be more clear? |
@tmandry Yes. It's easier to see the problematic unwind edges with MIR. |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
e5214a9
to
11631e9
Compare
Sorry, I should have pointed out that the problem is described quite well by @Matthias247 with a CFG diagram level over at #59123. In brief, from the MIR it looks like we can reach a drop from a variable after it has already been moved out of. In reality, we will never execute the drop after moving because of the state of the drop flag in any execution. If we didn't have drop flags, we might not need this change. However, there's another problem I think this pass could solve, but doesn't yet. Consider: || {
let first = [0; 1024];
yield;
let second = first;
yield;
let _third = second;
yield;
} We can definitely overlap (EDIT: It's because I was only using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @pnkfelix @ecstatic-morse for the dataflow changes
To be clear, this includes any way to take So basically soundness of optimizations relies only on the fact that if the address of a local has not been taken, after moving out there is just no way to access that storage again. It can't be accessed through existing pointers because none have been taken, and new pointers cannot be created because the move checker would prevent that. I may have asked this before, but you only do this for "whole" locals right, not the individual fields of a struct? There should probably be a comment somewhere saying that this is important. (Taking the address of one field "leaks" information about other fields, so we'd have to commit to a stricter semantics if we wanted to have UB from those fields being accessed.)
Unfortunately without a MIR parser or so, we can't feed MIR to Miri. :/ Cc rust-lang/miri#196 |
StatementKind::StorageDead(l) => sets.kill(l), | ||
StatementKind::Assign(ref place, _) | ||
| StatementKind::SetDiscriminant { ref place, .. } => { | ||
place.base_local().map(|l| sets.gen(l)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct if place
is a deref projection? In other words, do you want *(_1) = ...
to gen _1
and not do anything else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe so, because we don't kill the local if it has ever had its address taken. So there's no need to track pointers and derefs so we can gen it again.
Yes, and specifically I'm checking for
Yep!
That's right; I'll add a comment to make this clear. |
c6ea824
to
e39e3cd
Compare
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
☔ The latest upstream changes (presumably #61787) made this pull request unmergeable. Please resolve the merge conflicts. |
} else { | ||
self.flow_state.reconstruct_statement_effect(loc); | ||
self.flow_state.apply_local_effect(loc); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has slightly different semantics than DataflowResultsConsumer
. Your code applies all of {before_,}{statement,terminator}_effect
for the statement at loc
to the dataflow state, whereas DataflowResultsConsumer
only applies before_{statement,terminator}_effect
. Both your code and DataflowResultsConsumer
pick up the effects on {statement,terminator}_effect
on the transfer function, however.
I think you should not call apply_local_effect
for that last statement (similar to what state_for_location
does) and document precisely what state the underlying DataflowResults
will be in after calling seek
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, PTAL. I wish the implementation were more elegant.
e39e3cd
to
453fbde
Compare
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
Oh, I forgot to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r=me with comments addressed
@bors r=matthewjasper |
📌 Commit a68e2c7 has been approved by |
…, r=matthewjasper Don't store locals that have been moved from in generators This avoids reserving storage in generators for locals that are moved out of (and not re-initialized) prior to yield points. Fixes rust-lang#59123. This adds a new dataflow analysis, `RequiresStorage`, to determine whether the storage of a local can be destroyed without being observed by the program. The rules are: 1. StorageLive(x) => mark x live 2. StorageDead(x) => mark x dead 3. If a local is moved from, _and has never had its address taken_, mark it dead 4. If (any part of) a local is initialized, mark it live' This is used to determine whether to save a local in the generator object at all, as well as which locals can be overlapped in the generator layout. Here's the size in bytes of all testcases included in the change, before and after the change: async fn test |Size before |Size after -----------------|------------|---------- single | 1028 | 1028 single_with_noop | 2056 | 1032 joined | 5132 | 3084 joined_with_noop | 8208 | 3084 generator test |Size before |Size after ----------------------------|------------|---------- move_before_yield | 1028 | 1028 move_before_yield_with_noop | 2056 | 1032 overlap_move_points | 3080 | 2056 ## Future work Note that there is a possible extension to this optimization, which modifies rule 3 to read: "If a local is moved from, _**and either has never had its address taken, or is Freeze and has never been mutably borrowed**_, mark it dead." This was discussed at length in rust-lang#59123 and then rust-lang#61849. Because this would cause some behavior to be UB which was not UB before, it's a step that needs to be taken carefully. A more immediate priority for me is inlining `std::mem::size_of_val(&x)` so it becomes apparent that the address of `x` is not taken. This way, using `size_of_val` to look at the size of your inner futures does not affect the size of your outer future. cc @cramertj @eddyb @Matthias247 @nikomatsakis @RalfJung @Zoxc
…jasper Don't store locals that have been moved from in generators This avoids reserving storage in generators for locals that are moved out of (and not re-initialized) prior to yield points. Fixes #59123. This adds a new dataflow analysis, `RequiresStorage`, to determine whether the storage of a local can be destroyed without being observed by the program. The rules are: 1. StorageLive(x) => mark x live 2. StorageDead(x) => mark x dead 3. If a local is moved from, _and has never had its address taken_, mark it dead 4. If (any part of) a local is initialized, mark it live' This is used to determine whether to save a local in the generator object at all, as well as which locals can be overlapped in the generator layout. Here's the size in bytes of all testcases included in the change, before and after the change: async fn test |Size before |Size after -----------------|------------|---------- single | 1028 | 1028 single_with_noop | 2056 | 1032 joined | 5132 | 3084 joined_with_noop | 8208 | 3084 generator test |Size before |Size after ----------------------------|------------|---------- move_before_yield | 1028 | 1028 move_before_yield_with_noop | 2056 | 1032 overlap_move_points | 3080 | 2056 ## Future work Note that there is a possible extension to this optimization, which modifies rule 3 to read: "If a local is moved from, _**and either has never had its address taken, or is Freeze and has never been mutably borrowed**_, mark it dead." This was discussed at length in #59123 and then #61849. Because this would cause some behavior to be UB which was not UB before, it's a step that needs to be taken carefully. A more immediate priority for me is inlining `std::mem::size_of_val(&x)` so it becomes apparent that the address of `x` is not taken. This way, using `size_of_val` to look at the size of your inner futures does not affect the size of your outer future. cc @cramertj @eddyb @Matthias247 @nikomatsakis @RalfJung @Zoxc
☀️ Test successful - checks-azure, checks-travis, status-appveyor |
Just for reference: The size of the future in my program dropped from 32kB to 12kB with this change - while the previous change already brought it down from 64kB to 32kB. We are getting there! Thanks @tmandry ! Unfortunately some doubling of future sizes still seems to happen, as initially reported in #59087. I have seen that @tmandry has now also opened #62321 for this issue (I used |
@Matthias247 Just to make sure, does the doubling all seem to come from your use of I know that any borrowing of the future before And if there is doubling without any borrowing, I'd definitely like to know about that. |
I tested the generator optimizations in rust-lang#60187 and rust-lang#61922 on the Fuchsia build, and noticed that some small generators (about 8% of the async fns in our build) increased in size slightly. This is because in rust-lang#60187 we split the fields into two groups, a "prefix" non-overlap region and an overlap region, and lay them out separately. This can introduce unnecessary padding bytes between the two groups. In every single case in the Fuchsia build, it was due to there being only a single variant being used in the overlap region. This means that we aren't doing any overlapping, period. So it's better to combine the two regions into one and lay out all the fields at once, which is what this change does.
…ssions, r=cramertj Fix generator size regressions due to optimization I tested the generator optimizations in rust-lang#60187 and rust-lang#61922 on the Fuchsia build, and noticed that some small generators (about 8% of the async fns in our build) increased in size slightly. This is because in rust-lang#60187 we split the fields into two groups, a "prefix" non-overlap region and an overlap region, and lay them out separately. This can introduce unnecessary padding bytes between the two groups. In every single case in the Fuchsia build, it was due to there being only a single variant being used in the overlap region. This means that we aren't doing any overlapping, period. So it's better to combine the two regions into one and lay out all the fields at once, which is what this change does. r? @cramertj cc @eddyb @Zoxc
This avoids reserving storage in generators for locals that are moved
out of (and not re-initialized) prior to yield points. Fixes #59123.
This adds a new dataflow analysis,
RequiresStorage
, to determine whether the storage of a local can be destroyed without being observed by the program. The rules are:This is used to determine whether to save a local in the generator object at all, as well as which locals can be overlapped in the generator layout.
Here's the size in bytes of all testcases included in the change, before and after the change:
Future work
Note that there is a possible extension to this optimization, which modifies rule 3 to read: "If a local is moved from, and either has never had its address taken, or is Freeze and has never been mutably borrowed, mark it dead." This was discussed at length in #59123 and then #61849. Because this would cause some behavior to be UB which was not UB before, it's a step that needs to be taken carefully.
A more immediate priority for me is inlining
std::mem::size_of_val(&x)
so it becomes apparent that the address ofx
is not taken. This way, usingsize_of_val
to look at the size of your inner futures does not affect the size of your outer future.cc @cramertj @eddyb @Matthias247 @nikomatsakis @RalfJung @Zoxc