
StorageLive (and even StorageDead) may be unnecessary in MIR. #68622

Open
eddyb opened this issue Jan 28, 2020 · 23 comments
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@eddyb (Member) commented Jan 28, 2020

A while back I was discussing Storage{Live,Dead} and dominators with @tmandry (in the context of generator layout optimizations), and came to the conclusion that StorageLive pretty much has to dominate all uses (I doubt we ever added a check that it does, though).

More recently, I was trying to figure out what the simplest "StorageLive sinking" (i.e. moving the statement "later" in the CFG) optimization we could do was.

The conclusion I came to was that we might not need StorageLive at all, because there might be a deterministic "best placement" we could compute (assuming we need exactly one llvm.lifetime.start per alloca).


That best placement would be the least (common) dominator of all mentions of a MIR local.
Even indirect accesses require a direct borrow beforehand, so this should cover everything.

(Assuming that, given CFG points x, y, z, "x is a common dominator of y and z" means "x dominates both y and z", i.e. "to reach either y or z you must go through x first", and the "least" such x is the one not dominating other common dominators of y and z, i.e. it's "the closest to y and z")

This could be:

  • just before the single assignment of that local
    • let x = x + y;
  • at the end of a block branching into paths which all assign that local
    • let x = if c { a } else { b };
    • let x; if c { x = a; } else { x = b; } (roughly equivalent)

I am not sure about interactions with loops, though.

But this doesn't have to remain theoretical: we could compute this "ideal StorageLive position" and then compare it with the existing one (presumably one would dominate the other? Not sure this would catch any loop issues, though).


StorageDead could also be similar ("least (common) post-dominator"?).

However, it also has the effect of invalidating borrows, so we would need to keep an InvalidateBorrows(x) statement around, and consider it one of the mentions of x.

Then "Storage{Live,Dead} range shrinking" would simply boil down to hoisting InvalidateBorrows(x) up past statements which couldn't indirectly access x.


cc @nikomatsakis @ecstatic-morse @rust-lang/wg-mir-opt

@jonas-schievink jonas-schievink added A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 28, 2020
@oli-obk (Contributor) commented Jan 29, 2020

cc @rust-lang/wg-mir-opt (yay we have a github team now)

wrt borrow invalidation, would this be a statement emitted by borrowck?

@eddyb (Member, Author) commented Jan 29, 2020

wrt borrow invalidation, would this be a statement emitted by borrowck?

No, lifetime-based reasoning is an easy way to break unsafe code.

It would be emitted where StorageDead is emitted today (based on scope), and could only be hoisted (by some optimization) up past statements we can prove don't interact with the local (either trivially if no indirection, or by alias analysis otherwise).

@eddyb (Member, Author) commented Feb 3, 2020

Regarding loops, I'm thinking it might be hard to tell apart variables declared outside the loop from those declared inside the loop body, if they are only accessed inside.

I think the problem is that domination might be flawed for reasoning about the entire lifecycle of a local when it might be live across a backedge.

I wonder if we can rely on the requirement of initialization, i.e. no part of a local can be accessed inside the body without being initialized before the first iteration (making cycles in the MIR CFG more structural than they would be for e.g. C).

So a local only accessed inside a loop body shouldn't actually be able to live across the backedge without relying on references, and in that case the placement of InvalidateBorrows (or StorageDead, today) can be relied upon.

We would just need to be careful to not consider CFG points inside a loop body to be able to dominate something after the loop, e.g. in:

loop {
    x = 0;
    if cond { break; }
}
f(x);

the block containing the statement x = 0; shouldn't dominate (for our purposes here) the one containing f(x);, as the former can "execute more times" than the latter.

I think we can still use a dominator tree, we would just construct it differently around loops.
(What I want to avoid is some sort of dataflow pass)

I wonder if there is a formal notion of domination that takes cycles into account in the same way.

cc @sunfishcode (I remember there is a similar distinction for SSA, where sometimes you don't want values created inside a loop body to be used directly outside of the loop, but rather you want to pass them through BB args or similar)

@nikomatsakis (Contributor):

StorageDead could also be similar ("least (common) post-dominator"?).

This sounds suspicious to me. At minimum, you would I think want to take loop carried dependencies into account, no?

i.e.,

x = 0;
loop {
    x += 1;
    if x > 100 { break; }
}
println!(...);

here the post-dominator of all uses of x is going to be the use in x > 100, but since x is still live around the loop, the "storage dead" surely has to go outside of the loop.

The other concern of course is the usual one of not knowing when there might be unsafe code trying to access that memory, which means we can't get an accurate set of all accesses. (Still, I'm only mildly sympathetic on this point, it seems like we want to be able to assume that unsafe code doesn't go too crazy here repopulating values that are moved etc, since that seems unusual.)

@eddyb (Member, Author) commented Feb 4, 2020

At minimum, you would I think want to take loop carried dependencies into account, no?

Loops are my main concern as well, see #68622 (comment) above.

The other concern of course is the usual one of not knowing when there might be unsafe code trying to access that memory, which means we can't get an accurate set of all accesses.

Even if we remove StorageDead(x), we'd have an InvalidateBorrows(x) in the same spot, so we would not emit the llvm.lifetime.end(x) call any earlier than we do today (at the end of the lexical scope).

I would also not hoist such an InvalidateBorrows(x) statement up past any statement directly mentioning x, nor past any statement which could perform indirect accesses.

(I have no intent of breaking any unsafe code which works today, only to make the placement a deterministic computation)

@nikomatsakis (Contributor):

@eddyb What is the difference between StorageDead and InvalidateBorrows? (It is also worth noting that NLL borrowck relies on StorageDead to emit errors for code that returns borrows to things owned by the current stack frame.)

@eddyb (Member, Author) commented Feb 10, 2020

@nikomatsakis Storage{Live,Dead} form a "CFG range" (to the extent that ranges of graphs make sense), and using statements for them is a hack IMO.

OTOH, InvalidateBorrows can just be a plain statement, meaning there can be multiple of them for every local, and the StorageDead location would take all of them into account.

Although there's a subtle implication there that having a borrow that's not guaranteed to reach an InvalidateBorrows before the function returns is UB in some sense.
(Or maybe what's UB is using the borrow in parts of the CFG reachable from InvalidateBorrows, even if the InvalidateBorrows didn't "execute". Which kind of still makes it a non-statement.)
And in compile-time MIR, it would mean the local is leaked into 'static.

@nikomatsakis (Contributor):

I guess from the perspective of the borrow checker, we treat StorageDead as if it were a kind of "deallocate" statement. We don't really care about the "CFG range", which is more LLVM's problem. In other words, I think the borrow checker is compatible with your view point, though as you point out there is some sort of implication (independent from borrow checking, but required if you want to allocate overlapping stack ranges and the like) that the uses of a borrow are post-dominated by the set of StorageDead blocks or whatever.

@tmandry (Member) commented Mar 22, 2020

Related to #61849

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Sep 20, 2020
Implement a generic Destination Propagation optimization on MIR

This takes the work that was originally started by @eddyb in rust-lang#47954, and then explored by me in rust-lang#71003, and implements it in a general (ie. not limited to acyclic CFGs) and dataflow-driven way (so that no additional infrastructure in rustc is needed).

The pass is configured to run at `mir-opt-level=2` and higher only. To enable it by default, some followup work on it is still needed:
* Performance needs to be evaluated. I did some light optimization work and tested against `tuple-stress`, which caused trouble in my last attempt, but didn't go much in depth here.
  * We can also enable the pass only at `opt-level=2` and higher, if it is too slow to run in debug mode, but fine when optimizations run anyways.
* Debuginfo needs to be fixed after locals are merged. I did not look into what is required for this.
* Live ranges of locals (aka `StorageLive` and `StorageDead`) are currently deleted. We either need to decide that this is fine, or if not, merge the variable's live ranges (or remove these statements entirely – rust-lang#68622).

Some benchmarks of the pass were done in rust-lang#72635.
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 20, 2020
Implement a generic Destination Propagation optimization on MIR

@oli-obk (Contributor) commented Nov 14, 2020

So... as a minimal first step, we could remove just StorageLive and rename StorageDead to InvalidateBorrows.

cc @simonvandel: this is almost the same analysis as the one you wrote in #78928.

@RalfJung (Member):

InvalidateBorrows sounds rather high-level, when this also invalidates raw pointers and really has little to do with the borrow checker. So I'd vote for something more like ReallocateStorage or so -- something that describes what this actually does.

@tmandry (Member) commented Nov 20, 2020

Just to bikeshed a bit more.. agreed with what @RalfJung said about InvalidateBorrows. That said, ReallocateStorage sounds to me like it's doing a bit more than it actually is. Our backend is free to reallocate that storage to something else, but doesn't necessarily have to. Also it brings to mind something like realloc, which it definitely is not.

How about we take the best of both worlds.. InvalidateStorage!

@RalfJung (Member) commented Nov 20, 2020

Our backend is free to reallocate that storage to something else, but doesn't necessarily have to. Also it brings to mind something like realloc, which it definitely is not.

The similarity with realloc is exactly why I proposed this name. This instruction is equivalent to realloc with the same size. Why do you say this is not something like realloc?

EDIT: Oh wait, it is equivalent to realloc and forgetting the contents of the allocation. Is that what you mean?

@tmandry (Member) commented Nov 20, 2020

Hm, I don't see the similarity to realloc at all. To me it is more of a deallocation, possibly an idempotent one based on the discussion above. realloc with the same size looks like a no-op to me.

@RalfJung (Member):

realloc with the same size is certainly not a no-op; it invalidates old pointers. This is UB:

int *x = malloc(4);
int *y = realloc(x, 4);
*x = 0; /* UB: x was invalidated by the realloc, even at the same size */

@tmandry (Member) commented Nov 21, 2020

I see what you mean now, thanks for explaining. When I see realloc I think of the utility most programmers use it for (resizing an allocation), which is why I find that terminology confusing. Personally I still prefer InvalidateStorage because of the potential for confusion.

@RalfJung (Member):

I agree realloc is confusing because realloc preserves the content of the allocation, which this instruction we are naming does not.

InvalidateStorage seems fine.

@oli-obk (Contributor) commented Nov 22, 2020

We don't have to forget the contents of the allocation. The behaviour is equivalent to a realloc, just cheaper. This means that the optimizer may throw it out entirely if it can see that the "new" allocation is unused.

That said, in MIR, there are currently no cases that I know of where a local gets re-used after its StorageDead, except for the same local in a loop.

So we could make it be realloc and preserve the contents, but that feels like a different discussion.

I'm on board with starting out with InvalidateStorage

@RalfJung (Member):

This means that the optimizer may throw it out entirely if it can see that the "new" allocation is unused.

The optimizer can throw it out entirely even if the spec says that the contents of the allocations are reset. After all, they are reset to Uninit, so leaving in the old content is fine.

bors added a commit to rust-lang-ci/rust that referenced this issue Dec 9, 2020
Also generate `StorageDead` in constants

r? `@eddyb`

None of this special casing is actually necessary since we started promoting within constants and statics.

We may want to keep some of it around out of perf reasons, but it's not required for user visible behaviour

somewhat related: rust-lang#68622
@tmiasko (Contributor) commented Jun 30, 2021

If live ranges were to be implicitly defined by the uses of a local, constructing such a representation would potentially shrink the live ranges relative to those based on scope. Similarly, any transformation that removes a use of a local could also shrink the live range.

Shrinking a live range is not a generally valid transformation, and the difference can be easily observed if, for example, locals that were previously live at the same time no longer are.

How would you propose to address this without introducing some kind of marker at the start of a live range that mentions a local?

EDIT: Concrete examples of the two cases mentioned above:

fn main() {
    let a;
    let x;
    let y;

    {
        let b = 1;
        x = &b as *const _ as usize;
    }

    a = 2;
    y = &a as *const _ as usize;

    assert!(!(x == y));
}

If liveness is based on scope, then a and b are live at the same time and x == y is always false. On the other hand, if liveness were based on a position that dominates all uses, then the live ranges are disjoint and x == y could be true. This could be resolved by changing the definition of when a variable is live.

fn main() {
    let mut a = 0;
    let x;
    let y;

    {
        let b = 1;
        x = &b as *const _ as usize;
    }

    a = 2;
    y = &a as *const _ as usize;

    assert!(!(x == y));
}

A variant of the above, where a is initialized immediately. Variables a and b are live at the same time and again x == y is always false. Now consider dead store elimination of a = 0. The transformation would be incorrect because it makes the live ranges of a and b disjoint and introduces an execution where x == y might be true. As far as I see, MIR optimizations would have to take this aspect into consideration, which would make them generally non-trivial to write.

@oli-obk (Contributor) commented Jul 5, 2021

Address (in)equality is not something we can or do guarantee; LLVM will already happily trigger the assertion in your first example (or not) depending on random optimization choices. We only need to guarantee that the two variables are distinct if they have borrows or uses that make overlap problematic. If you took the address of a before the scope of b, then you could not overlap the memory of the two variables, as you could access a via a pointer while b is live. But in your two examples, it is completely fine to remove the zero assignment or to move it below the block that declares b.

Taking the address of a would require any liveness algorithm to declare the variable live while b is live.

@tmiasko (Contributor) commented Jul 5, 2021

The variables that are alive at the same time are guaranteed to have disjoint storage. If they are not zero-sized they must have different addresses. There is only one possible outcome of the comparison in that case.

To give another example using only optimizations we already perform, consider removal of the unreachable block following if false in the code below. It would be incorrect to remove it, since it would shrink the live range of a to the point where it is disjoint from the live range of b:

fn main() {
    let mut a;
    let x;
    let mut y;

    if false {
        a = 1;
        y = &a as *const _ as usize;
    }

    {
        let b = 0;
        x = &b as *const _ as usize;
    }

    a = 2;
    y = &a as *const _ as usize;

    assert!(!(x == y));
}

@RalfJung (Member) commented Jul 5, 2021

The variables that are alive at the same time are guaranteed to have disjoint storage. If they are not zero-sized they must have different addresses.

👍

To give another example using only optimizations we already perform, consider removal of the unreachable block following if false in the code below. It would be incorrect to remove it, since it would shrink the live range of a to the point where it is disjoint from the live range of b

Yeah, it is because of examples like these that I tried to talk LLVM people out of their lifetime.start/lifetime.end intrinsics... without success. :/

For Rust, we get to define when each allocation lifetime starts or ends. We can define that a only gets actually allocated the first time it is written to. Then, arguably, the lifetimes of a and b could already be disjoint in the original program.

IOW, I don't think it necessarily makes sense to use a static analysis to define the live ranges of our local variables, I think it would be better to use some dynamic property. "The local becomes live when it is first written to" is easy; it is much harder to say when it stops being live...
