StorageLive (and even StorageDead) may be unnecessary in MIR. #68622
Comments
cc @rust-lang/wg-mir-opt (yay we have a github team now) wrt borrow invalidation, would this be a statement emitted by borrowck? |
No, lifetime-based reasoning is an easy way to break unsafe code. It would be emitted where |
Regarding loops, I'm thinking it might be hard to tell apart variables declared outside the loop vs inside the loop body, if they are only accessed inside. I think the problem is that domination might be flawed for reasoning about the entire lifecycle of a local when it might be live across a backedge. I wonder if we can rely on the requirement of initialization, i.e. no part of a local can be accessed inside the body without being initialized before the first iteration (making cycles in the MIR CFG more structural than it would be for e.g. C). So a local only accessed inside a loop body shouldn't actually be able to live across the backedge without relying on references, and in that case the placement of We would just need to be careful to not consider CFG points inside a loop body to be able to dominate something after the loop, e.g. in:
loop {
    x = 0;
    if cond { break; }
}
f(x);
the block containing the statement I think we can still use a dominator tree, we would just construct it differently around loops. I wonder if there is a formal notion of domination that takes cycles into account in the same way. cc @sunfishcode (I remember there is a similar distinction for SSA, where sometimes you don't want values created inside a loop body to be used directly outside of the loop, but rather you want to pass them through BB args or similar) |
This sounds suspicious to me. At minimum, you would I think want to take loop carried dependencies into account, no? i.e.,
x = 0;
loop {
    x += 1;
    if x > 100 { break; }
}
println(...);
here the postdominator of all uses of The other concern of course is the usual one of not knowing when there might be unsafe code trying to access that memory, which means we can't get an accurate set of all accesses. (Still, I'm only mildly sympathetic on this point, it seems like we want to be able to assume that unsafe code doesn't go too crazy here repopulating values that are moved etc, since that seems unusual.) |
Loops are my main concern as well, see #68622 (comment) above.
Even if we remove I would also not hoist such a (I have no intent of breaking any |
@eddyb What is the difference between |
@nikomatsakis OTOH, Although there's a subtle implication there that having a borrow that's not guaranteed to reach a |
I guess from the perspective of the borrow checker, we treat |
Related to #61849 |
Implement a generic Destination Propagation optimization on MIR
This takes the work that was originally started by @eddyb in rust-lang#47954, and then explored by me in rust-lang#71003, and implements it in a general (ie. not limited to acyclic CFGs) and dataflow-driven way (so that no additional infrastructure in rustc is needed).
The pass is configured to run at `mir-opt-level=2` and higher only. To enable it by default, some followup work on it is still needed:
* Performance needs to be evaluated. I did some light optimization work and tested against `tuple-stress`, which caused trouble in my last attempt, but didn't go much in depth here.
* We can also enable the pass only at `opt-level=2` and higher, if it is too slow to run in debug mode, but fine when optimizations run anyways.
* Debuginfo needs to be fixed after locals are merged. I did not look into what is required for this.
* Live ranges of locals (aka `StorageLive` and `StorageDead`) are currently deleted. We either need to decide that this is fine, or if not, merge the variable's live ranges (or remove these statements entirely – rust-lang#68622).
Some benchmarks of the pass were done in rust-lang#72635.
So... as a minimal first step, we could remove just |
Just to bikeshed a bit more.. agreed with what @RalfJung said about How about we take the best of both worlds.. |
The similarity with EDIT: Oh wait, it is equivalent to |
Hm, I don't see the similarity to |
I see what you mean now, thanks for explaining. When I see |
I agree
|
We don't have to forget the contents of the allocation. The behaviour is equivalent to a That said, in MIR, there are currently no cases that I know of where a local gets re-used after its So we could make it be I'm on board with starting out with |
The optimizer can throw it out entirely even if the spec says that the contents of the allocations are reset. After all, they are reset to |
Also generate `StorageDead` in constants
r? `@eddyb` None of this special casing is actually necessary since we started promoting within constants and statics. We may want to keep some of it around for perf reasons, but it's not required for user-visible behaviour. Somewhat related: rust-lang#68622
If live ranges were to be implicitly defined by uses of a local, constructing such a representation would potentially shrink the live ranges relative to those based on scope. Similarly, any transformation that removes a use of a local could also shrink the live range. Shrinking the live range is not generally a valid transformation and can be easily observed, for example if locals that were previously live at the same time are no longer so. How would you propose to address this without introducing some kind of marker at the start of a live range that mentions a local?
EDIT: Concrete examples of the two cases mentioned above:
fn main() {
    let a;
    let x;
    let y;
    {
        let b = 1;
        x = &b as *const _ as usize;
    }
    a = 2;
    y = &a as *const _ as usize;
    assert!(!(x == y));
}
If lifetime is based on a scope, then
fn main() {
    let mut a = 0;
    let x;
    let y;
    {
        let b = 1;
        x = &b as *const _ as usize;
    }
    a = 2;
    y = &a as *const _ as usize;
    assert!(!(x == y));
}
A variant of above, where |
address (in)equality is not something we can or do guarantee, LLVM will already happily trigger the assertion in your first example (or not) depending on random optimization choices. We only need to guarantee that the two variables are distinct, if they have borrows or uses that are problematic. If you took the address of Taking the address of |
The variables that are alive at the same time are guaranteed to have disjoint storage. If they are not zero-sized they must have different addresses. There is only one possible outcome of the comparison in that case. To give another example using only optimizations we already perform, consider removal of unreachable block following
fn main() {
    let mut a;
    let x;
    let mut y;
    if false {
        a = 1;
        y = &a as *const _ as usize;
    }
    {
        let b = 0;
        x = &b as *const _ as usize;
    }
    a = 2;
    y = &a as *const _ as usize;
    assert!(!(x == y));
} |
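For readers following along, here is a minimal sketch of the distinction being argued about. The function names are made up for illustration and nothing here reflects how rustc lowers these locals. The first function takes the addresses of two non-zero-sized locals that are in scope at the same time, so the addresses have to differ. The second mirrors the scoped examples above, where the inner local's scope has ended before the other address is taken, so no particular comparison result should be relied on.

```rust
/// Addresses of two non-zero-sized locals that are live at the same time.
/// Their storage must be disjoint, so the two addresses differ.
fn addresses_of_overlapping_locals() -> (usize, usize) {
    let a: i32 = 1;
    let b: i32 = 2;
    let x = &a as *const i32 as usize;
    let y = &b as *const i32 as usize;
    (x, y)
}

/// Addresses of two locals whose scopes do not overlap when the addresses
/// are taken. Whether they compare equal is not something to rely on;
/// the addresses are only turned into integers, never dereferenced.
fn addresses_of_disjoint_scopes() -> (usize, usize) {
    let x;
    {
        let b: i32 = 1;
        x = &b as *const i32 as usize;
    }
    let a: i32 = 2;
    let y = &a as *const i32 as usize;
    (x, y)
}
```

Only the first function has a single possible outcome for `x == y`; the second is the case where scope-based and use-based live ranges could give different answers.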
👍
Yeah, it is those kinds of examples why I tried to talk LLVM people out of their For Rust, we get to define when each allocation lifetime starts or ends. We can define that IOW, I don't think it necessarily makes sense to use a static analysis to define the live ranges of our local variables, I think it would be better to use some dynamic property. "The local becomes live when it is first written to" is easy; it is much harder to say when it stops being live... |
A while back I was discussing `Storage{Live,Dead}` and dominators, with @tmandry (in the context of generator layout optimizations), and came to the conclusion that `StorageLive` pretty much has to dominate all uses (I doubt we ever added a check that it does so, though).
More recently, I was trying to figure out what the simplest "`StorageLive` sinking" (i.e. moving the statement "later" in the CFG) optimization we could do was.
The conclusion I came to was that we might not need `StorageLive` at all, because there might be a deterministic "best placement" we could compute (assuming we need exactly one `llvm.lifetime.start` per `alloca`).
That best placement would be the least (common) dominator of all mentions of a MIR local.
Even indirect accesses require a direct borrow beforehand, so this should cover everything.
(Assuming that, given CFG points `x`, `y`, `z`, "`x` is a common dominator of `y` and `z`" means "`x` dominates both `y` and `z`", i.e. "to reach either `y` or `z` you must go through `x` first", and the "least" such `x` is the one not dominating other common dominators of `y` and `z`, i.e. it's "the closest to `y` and `z`")
This could be:
let x = x + y;
let x = if c { a } else { b };
let x; if c { x = a; } else { x = b; }
(roughly equivalent)
I am not sure about interactions with loops, though.
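As a rough illustration of the "least (common) dominator" idea, here is a minimal sketch working at basic-block granularity. It assumes a CFG given as successor lists indexed by block number, with block 0 as the entry and every block reachable from it. The names `dominators` and `least_common_dominator` are made up for this sketch and are not rustc APIs, and a real version would need to work on statement positions rather than whole blocks.

```rust
use std::collections::BTreeSet;

/// For each block, the set of blocks that dominate it, computed with a
/// simple iterative dataflow. Assumes a non-empty CFG whose entry is block 0.
fn dominators(succs: &[Vec<usize>]) -> Vec<BTreeSet<usize>> {
    let n = succs.len();
    // Build predecessor lists from the successor lists.
    let mut preds = vec![Vec::new(); n];
    for (b, ss) in succs.iter().enumerate() {
        for &s in ss {
            preds[s].push(b);
        }
    }
    // Start from "everything dominates everything" and shrink to a fixpoint.
    let all: BTreeSet<usize> = (0..n).collect();
    let mut dom = vec![all; n];
    dom[0] = BTreeSet::from([0]);
    let mut changed = true;
    while changed {
        changed = false;
        for b in 1..n {
            // dom(b) = {b} plus the intersection of dom(p) over predecessors p.
            let mut acc: Option<BTreeSet<usize>> = None;
            for &p in &preds[b] {
                acc = Some(match acc {
                    None => dom[p].clone(),
                    Some(set) => set.intersection(&dom[p]).cloned().collect(),
                });
            }
            let mut new = acc.unwrap_or_default();
            new.insert(b);
            if new != dom[b] {
                dom[b] = new;
                changed = true;
            }
        }
    }
    dom
}

/// The common dominator of all `mentions` that is closest to them,
/// or `None` if there are no mentions.
fn least_common_dominator(succs: &[Vec<usize>], mentions: &[usize]) -> Option<usize> {
    let dom = dominators(succs);
    let mut common: Option<BTreeSet<usize>> = None;
    for &m in mentions {
        common = Some(match common {
            None => dom[m].clone(),
            Some(set) => set.intersection(&dom[m]).cloned().collect(),
        });
    }
    // Common dominators form a chain in the dominator tree, so the one with
    // the largest dominator set is the deepest, i.e. the "least" one.
    common?.into_iter().max_by_key(|&b| dom[b].len())
}
```

On a diamond-shaped CFG `0 -> 1`, `1 -> {2, 3}`, `2 -> 4`, `3 -> 4`, mentions in blocks 2 and 3 give block 1. On a CFG with a self-looping block, `0 -> 1`, `1 -> {1, 2}`, mentions in blocks 1 and 2 give block 1, which is the loop body itself, so this naive version does nothing special about loops.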
But this doesn't have to remain theoretical, we could compute this "ideal `StorageLive` position" and then compare it with the existing one (presumably one would dominate the other? not sure this would catch any loop issues though).
`StorageDead` could also be similar ("least (common) post-dominator"?).
However, it also has the effect of invalidating borrows, so we would need to keep an `InvalidateBorrows(x)` statement around, and consider it one of the mentions of `x`.
Then "`Storage{Live,Dead}` range shrinking" would simply boil down to hoisting `InvalidateBorrows(x)` up past statements which couldn't indirectly access `x`.
cc @nikomatsakis @ecstatic-morse @rust-lang/wg-mir-opt
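The "least (common) post-dominator" suggestion for `StorageDead` can be sketched in a similar spirit. This version uses the path-based definition directly: a block `p` post-dominates `b` if every path from `b` to the exit goes through `p`. It assumes block-level granularity, successor lists indexed by block number, and a single exit block reachable from every block. All names are hypothetical, and borrow invalidation is not modelled at all.

```rust
/// True if `to` can be reached from `from` without passing through `removed`.
fn reaches(succs: &[Vec<usize>], from: usize, to: usize, removed: usize) -> bool {
    if from == removed {
        return false;
    }
    let mut seen = vec![false; succs.len()];
    let mut stack = vec![from];
    seen[from] = true;
    // Depth-first search that never steps onto the removed block.
    while let Some(b) = stack.pop() {
        if b == to {
            return true;
        }
        for &s in &succs[b] {
            if s != removed && !seen[s] {
                seen[s] = true;
                stack.push(s);
            }
        }
    }
    false
}

/// `p` post-dominates `b` if every path from `b` to `exit` goes through `p`.
fn post_dominates(succs: &[Vec<usize>], exit: usize, p: usize, b: usize) -> bool {
    p == b || !reaches(succs, b, exit, p)
}

/// The common post-dominator of all `mentions` that is closest to them.
fn least_common_post_dominator(
    succs: &[Vec<usize>],
    exit: usize,
    mentions: &[usize],
) -> Option<usize> {
    let n = succs.len();
    (0..n)
        // Keep only blocks that post-dominate every mention.
        .filter(|&p| mentions.iter().all(|&m| post_dominates(succs, exit, p, m)))
        // Candidates form a chain; the closest one has the most post-dominators.
        .max_by_key(|&c| (0..n).filter(|&q| post_dominates(succs, exit, q, c)).count())
}
```

For the CFG `0 -> 1`, `1 -> {2, 3}`, `2 -> 4`, `3 -> 4`, `4 -> 5` with exit 5, mentions in blocks 2 and 3 give block 4. For the loop-shaped CFG `0 -> 1`, `1 -> {1, 2}` with exit 2, mentions in blocks 0 and 1 give block 1, the loop body, which is the kind of placement the loop-carried-dependency concern in the comments is about.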