Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recycle storage after move #61849

Open
tmandry opened this issue Jun 14, 2019 · 17 comments
Open

Recycle storage after move #61849

tmandry opened this issue Jun 14, 2019 · 17 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@tmandry
Copy link
Member

tmandry commented Jun 14, 2019

We should experiment with "re-allocating" storage for a local after it's moved from if it gets re-initialized. This would enable more optimizations, but could have some potential fallout.

EDIT: See this comment for further explanation of what kinds of optimizations this would enable.

Quoth @RalfJung, from #59123 (comment):

Currently, the following (entirely safe) code will definitely return true:

let mut x = String::new();
let addr_x = &x as *const _ as usize;
drop(x);
// later
x = String::new();
let addr_x2 = &x as *const _ as usize;
return addr_x == addr_x2;

If we want to do optimizations like yours here (and I am totally sympathetic to that), we have to explain in the "Rust Abstract Machine" (and in Miri) why this program might return false. And the answer cannot be "there is UB", because this is safe code.

This is a topic that @nikomatsakis, @eddyb and me have touched on several times already, without ever hashing out a full plan. But in the current state of affairs, the only mechanism we have to "defeat" pointer equality tests like the above is to make sure that this is not the same allocation any more.

So, one thing we might do is to do StorageDead(x); StorageLive(x); immediately after every move. This "re-allocates" x and thus definitely kills any existing pointers and also "defeats" pointer comparisons. The immediate StorageLive is to keep the liveness state in sync in both branches of a conditional (which might or might not be relevant -- unfortunately LLVM's semantics for these intrinsics is less than clear). I guess the StorageLive could be moved down in cases where there is no merging control flow, which should give you your optimization in many cases.

It is possible to do a subset of the optimization discussed in #59123 without this, but this would help cover more cases, in this optimization and others.

cc @cramertj @eddyb @nikomatsakis @RalfJung

@Centril Centril added T-lang Relevant to the language team, which will review and decide on the PR/issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-enhancement Category: An issue proposing an enhancement or a PR with one. I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jun 14, 2019
@eddyb
Copy link
Member

eddyb commented Jun 18, 2019

There's further discussion in #59123 (comment), I wish we could just fork the thread.

@RalfJung
Copy link
Member

@eddyb writes

No, I was referring to the case where we want the semantics of Operand::Move to be that they invalidate any outstanding borrows, similar to Storage{Dead,Live} but without bloating the MIR/impacting analyses which rely on some sort of dominance relationship.

I don't understand your position now. Are you saying you don't mind if source-level moves invalidate borrows, you just don't want it be be encoded into Operand in MIR, but rather something more like StorageDead? That would make sense, I just kept thinking you were worried about source-level moves.

Yes, I'd find it rather strange to have this encoded in the Operand. It would also be non-uniform as Operand does not have to be a local. I could live with encoding basically the same thing in something more like StorageDead. That's explicit and easy to give semantics.

Perhaps surprisingly, I have much less strong opinions about surface language semantics than about MIR semantics.^^ We'd still want to make sure of course to make that translation not too weird -- magic UB-inducing MIR instructions like StorageDead or Retag should be inserted in a predictable way.

@eddyb
Copy link
Member

eddyb commented Jun 19, 2019

Right, the confusing thing was that all along I've assumed you were against unsafe code not being able to reinitialize indirectly (i.e. through a raw pointer) some local that had been moved out of.

I consider the MIR more flexible and even its "place-oriented non-SSA CFG" nature to be subject to change, whereas the surface-level semantics to be the important thing to discuss, UB-wise.

In the short term, to keep things simple, I wouldn't mind a StorageRefresh (name up for bikeshed) after each Operand::Move, that has the effects of a Storage{Dead,Live} pair but doesn't risk breaking certain kinds of analysis (which can just ignore it if they want to remain conservative).

Long-term, if we go in this direction, I would expect us to be shrinking storage ranges, although I'm not sure what to do about the simple notion that a StorageLive dominates all uses - it's almost as if "storage-live" is becoming very close to (or even replaceable with) "(partially) initialized".

@RalfJung
Copy link
Member

Right, the confusing thing was that all along I've assumed you were against unsafe code not being able to reinitialize indirectly (i.e. through a raw pointer) some local that had been moved out of.

Oh I see. No I am not necessarily opposed to forbid unsafe code from doing that. I just think the way we forbid it should be as clean and principled as we can make it.

I consider the MIR more flexible and even its "place-oriented non-SSA CFG" nature to be subject to change, whereas the surface-level semantics to be the important thing to discuss, UB-wise.

That's fair. On the other hand, I think surface language semantics of a language as complex as Rust is best explained by elaboration into some simpler language. And I think an idealized version of MIR is a reasonable candidate for the target language of this elaboration.

Manishearth added a commit to Manishearth/rust that referenced this issue Jul 2, 2019
…, r=matthewjasper

Don't store locals that have been moved from in generators

This avoids reserving storage in generators for locals that are moved
out of (and not re-initialized) prior to yield points. Fixes rust-lang#59123.

This adds a new dataflow analysis, `RequiresStorage`, to determine whether the storage of a local can be destroyed without being observed by the program. The rules are:

1. StorageLive(x) => mark x live
2. StorageDead(x) => mark x dead
3. If a local is moved from, _and has never had its address taken_, mark it dead
4. If (any part of) a local is initialized, mark it live'

This is used to determine whether to save a local in the generator object at all, as well as which locals can be overlapped in the generator layout.

Here's the size in bytes of all testcases included in the change, before and after the change:

async fn test    |Size before |Size after
-----------------|------------|----------
single           | 1028       | 1028
single_with_noop | 2056       | 1032
joined           | 5132       | 3084
joined_with_noop | 8208       | 3084

generator test              |Size before |Size after
----------------------------|------------|----------
move_before_yield           | 1028       | 1028
move_before_yield_with_noop | 2056       | 1032
overlap_move_points         | 3080       | 2056

## Future work

Note that there is a possible extension to this optimization, which modifies rule 3 to read: "If a local is moved from, _**and either has never had its address taken, or is Freeze and has never been mutably borrowed**_, mark it dead." This was discussed at length in rust-lang#59123 and then rust-lang#61849. Because this would cause some behavior to be UB which was not UB before, it's a step that needs to be taken carefully.

A more immediate priority for me is inlining `std::mem::size_of_val(&x)` so it becomes apparent that the address of `x` is not taken. This way, using `size_of_val` to look at the size of your inner futures does not affect the size of your outer future.

cc @cramertj @eddyb @Matthias247 @nikomatsakis @RalfJung @Zoxc
bors added a commit that referenced this issue Jul 2, 2019
…jasper

Don't store locals that have been moved from in generators

This avoids reserving storage in generators for locals that are moved
out of (and not re-initialized) prior to yield points. Fixes #59123.

This adds a new dataflow analysis, `RequiresStorage`, to determine whether the storage of a local can be destroyed without being observed by the program. The rules are:

1. StorageLive(x) => mark x live
2. StorageDead(x) => mark x dead
3. If a local is moved from, _and has never had its address taken_, mark it dead
4. If (any part of) a local is initialized, mark it live'

This is used to determine whether to save a local in the generator object at all, as well as which locals can be overlapped in the generator layout.

Here's the size in bytes of all testcases included in the change, before and after the change:

async fn test    |Size before |Size after
-----------------|------------|----------
single           | 1028       | 1028
single_with_noop | 2056       | 1032
joined           | 5132       | 3084
joined_with_noop | 8208       | 3084

generator test              |Size before |Size after
----------------------------|------------|----------
move_before_yield           | 1028       | 1028
move_before_yield_with_noop | 2056       | 1032
overlap_move_points         | 3080       | 2056

## Future work

Note that there is a possible extension to this optimization, which modifies rule 3 to read: "If a local is moved from, _**and either has never had its address taken, or is Freeze and has never been mutably borrowed**_, mark it dead." This was discussed at length in #59123 and then #61849. Because this would cause some behavior to be UB which was not UB before, it's a step that needs to be taken carefully.

A more immediate priority for me is inlining `std::mem::size_of_val(&x)` so it becomes apparent that the address of `x` is not taken. This way, using `size_of_val` to look at the size of your inner futures does not affect the size of your outer future.

cc @cramertj @eddyb @Matthias247 @nikomatsakis @RalfJung @Zoxc
@tmandry
Copy link
Member Author

tmandry commented Jul 3, 2019

To further elaborate on the kinds of optimizations we could do with this..

If a local x is moved from, but

  • Has only had its address taken using &x
  • Is Freeze (does not contain interior mutability)

Then we could optimize away its storage after it's moved from. Rust will enforce that any borrow is dead after the move, of course, and transmuting *const to *mut is already considered UB. So in order to do this type of optimization soundly, we must declare that

  • accessing the old address of x after it is moved is UB
  • the address of x may change after it is re-initialized

@RalfJung
Copy link
Member

RalfJung commented Jul 3, 2019

  1. accessing the old address of x after it is moved is UB
  2. the address of x may change after it is re-initialized

(1) follows from (2). But my concern is that we should specify (2) in an operational way -- that is, we can't just say "the address may change", we have to describe the operational behavior that causes it to change. Something like StorageDead+StorageLive. Something that can be implemented in Miri or similar checkers.

@tmandry
Copy link
Member Author

tmandry commented Jul 3, 2019

Something like StorageDead+StorageLive. Something that can be implemented in Miri or similar checkers.

Yes, agreed.

In theory, this should also help us answer the question of "who are we breaking with this change?" (Ideally no one, but probably someone.) In practice, though, it doesn't seem feasible to run miri on crater because of slowness.

Maybe if we could produce a smaller list of crates likely to break. All crates that use unsafe, for instance. But,

  1. that list might still be too long
  2. I'm guessing many breakages would actually manifest in tests of downstream crates which depend on the crates we're breaking

@tmandry
Copy link
Member Author

tmandry commented Sep 15, 2019

I'd like to propose adding something like StorageRefresh(x) which @eddyb suggested earlier inside generators, ideally before async/await stabilizes (but soon after is probably good enough). That way, if we do decide to make this optimization for generators later, we won't be breaking any code that was considered valid by miri at the time it was written.

StorageRefresh would have no effect on codegen, so LLVM couldn't rely on it. Nor will any optimization passes in rustc rely on it at first (this will probably change).

We can also go back and add optimizations for non-generator code, but that's a step we would want to be more careful about. First we would add these StorageRefresh statements to all Rust code (not just generators) and see if people report any unexpected miri validation errors.

This would create a nice "ratcheting" of semantics by helping us keep our options open for generators today, and putting out a canary for non-generator code in the future.

The only problem I see with it is, there are certain unsafe patterns (which should be rare in practice, if they exist at all) which will pass validation outside of generators but fail inside of them. We'd like to discourage any of these patterns anyway, so I don't see this as a big problem.

@rust-lang/wg-async-await

@RalfJung
Copy link
Member

@tmandry the semantics would be the same as StorageDead(x); StorageLive(x);, right? It's just compressed for performance reasons? That should be clearly documented as such.

The only problem I see with it is, there are certain unsafe patterns (which should be rare in practice, if they exist at all) which will pass validation outside of generators but fail inside of them. We'd like to discourage any of these patterns anyway, so I don't see this as a big problem.

So you plan to only insert this for generators? That's not great, but I can understand why.
Can we have a -Z flag (that Miri will set) to insert them everywhere? We already have -Zmir-emit-retag, this would be similar in spirit. That way, code compiled for / run in Miri would still get the extra checking.

@tmandry
Copy link
Member Author

tmandry commented Sep 16, 2019

@tmandry the semantics would be the same as StorageDead(x); StorageLive(x);, right? It's just compressed for performance reasons? That should be clearly documented as such.

Sorry this was unclear. The main reason to introduce a new kind of statement it is so that it can be ignored in codegen, and any MIR passes which wish to be conservative. Then over time we can start using it in more passes. Specifically I don't want LLVM relying on this yet, since it could break unsafe code in the wild.

Performance is a secondary benefit.

@RalfJung
Copy link
Member

RalfJung commented Sep 17, 2019

it could break unsafe code in the wild.

What kind of code would be broken by this? A concrete example would be nice, ideally with some indication of whether anything like this has been seen in the wild.
Or, to ask basically the same question, where do you want to emit those instructions?

@tmandry
Copy link
Member Author

tmandry commented Sep 17, 2019

What kind of code would be broken by this?

Any code which does something like this:

let x = String::from("hello");
let x_ptr = &mut x as *mut String;
std::mem::drop(x);
unsafe { *x_ptr = String::from("goodbye"); }

I suppose reading the bytes would be too, but I think that's already UB.

I don't know of any code which does this in the wild, but it may well exist. I don't know of an easy way to check for it today.

Or, to ask basically the same question, where do you want to emit those instructions?

After a local is moved from. Today, just in generators (both to be conservative, and because adding it elsewhere is likely to regress performance).

This is basically your suggestion to emit StorageDead(x); StorageLive(x); after every move, but in a way that lets us be more conservative.

@RalfJung
Copy link
Member

I see, makes sense.

Having Miri test for this (everywhere, not just in generators) would be a good first step towards finding at least some uses in the wild, I guess.

@nikomatsakis
Copy link
Contributor

I do think it'd be great to know whether this pattern exists in the wild, but I also feel pretty strongly that we are going to want to call it UB in the end. It seems like this comes up fairly regularly when discussing possible optimizations.

In any case, I'm in favor of moving forward with steps towards allowing us to recycle memory, but I don't really think this has to be done before/after async-await stabilizes, as I think this kind of code falls pretty squarely into the "grey area" of unsafe code that could well become invalid.

Do we know of even one real instance of such code?

@RalfJung
Copy link
Member

In case the question was directed at me, I am totally fine making that code UB. I'd just hope we can have it fail in Miri before we do optimizations based on it. ;)

@RalfJung
Copy link
Member

In terms of recycling storage, I learned some strange things about LLVM:

Eli Friedman writes

If you're talking about the semantics of LLVM, according to the LangRef, two allocas can't have the same address. The rules for alloca specify that the memory is not freed until the function returns. And the rules for llvm.lifetime.start/end don't say anything about the address, only that accessing the memory is undefined outside the lifetime. Therefore, it's illegal for stack coloring to allocate two variables whose address escapes at the same address.

That would mean that even with StorageEnd, if the address if a local gets observed, LLVM would not be allowed to put another local at the same address -- which seems like a problem for us because we do want LLVM to reuse those addresses.

@tmiasko
Copy link
Contributor

tmiasko commented Jul 3, 2021

Code generation of StorageRefresh as a pair of lifetime.end and lifetime.start in LLVM makes it possible to detect use after move with AddressSanitizer and MemorySanitizer. That would be quite useful to canvas the usage in the wild and detect bugs.

AddressSanitizer detecting use after move

#[derive(Debug)]
struct Wrapper<T: ?Sized>(T);

fn main () {
    let mut w = Wrapper("A");
    let p: *const _ = &w as *const _;
    println!("{:?}", unsafe { &*p});
    drop(w);
    println!("{:?}", unsafe { &*p});
}
     Running `target/x86_64-unknown-linux-gnu/debug/use-after-move`
Wrapper("A")
=================================================================
==184053==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffe619d05c0 at pc 0x563fd541589f bp 0x7ffe619ce700 sp 0x7ffe619ce6f8
READ of size 8 at 0x7ffe619d05c0 thread T0
    #0 0x563fd541589e in <&str as core::fmt::Debug>::fmt library/core/src/fmt/mod.rs:2033:71
    ...
    #13 0x563fd4f65f9d in use_after_move::main src/main.rs:9:5
    ...

Address 0x7ffe619d05c0 is located in stack of thread T0 at offset 384 in frame
    #0 0x563fd4f65a0f in use_after_move::main src/main.rs:4

  This frame has 9 object(s):
    [32, 40) '_34' (line 9)
    [64, 72) '_32' (line 9)
    [96, 112) '_31' (line 9)
    [128, 176) '_24' (line 9)
    [208, 216) '_16' (line 7)
    [240, 248) '_14' (line 7)
    [272, 288) '_13' (line 7)
    [304, 352) '_6' (line 7)
    [384, 400) 'w' (line 5) <== Memory access at offset 384 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope library/core/src/fmt/mod.rs:2033:71 in <&str as core::fmt::Debug>::fmt

Potential utility for LLVM stack coloring to reduce stack usage is a different question. As far as I can see, in the current implementation of stack coloring, such a pair of intrinsics doesn't shorten a live range. In fact, it could do the opposite.

If an alloca is used with more than a single pair of lifetime intrinsics, the alloca is considered to be "degenerate" and processed in more conservative fashion. The effective live range of "degenerate" alloca starts with lifetime.start, while for non-"degenerate" allocas it is shortened towards the first use (which itself seems hard to explain operationally, but that's for another discussion).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants