Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: core::mem::replace_with for temporarily moving out of ownership. #1736

Closed
wants to merge 2 commits into from

Conversation

ticki
Copy link
Contributor

@ticki ticki commented Sep 1, 2016

This common pattern deserves a place in the standard library.

Rendered Implementation

This common pattern deserves a place in the standard library.
@ticki
Copy link
Contributor Author

ticki commented Sep 1, 2016

cc. @Sgeo @makoConstruct

@BurntSushi BurntSushi added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Sep 1, 2016
# Detailed design
[design]: #detailed-design

A complete implementation can be found [here](https://github.com/rust-lang/rust/pull/36186). This implementation is pretty simple, with only one trick which is crucial to safety: handling unwinding.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just include the code here? The implementation is short, and including makes the RFC self-contained.

@strega-nil
Copy link

strega-nil commented Sep 1, 2016

So, from what I'm reading, panicking inside of the closure results in implementation defined behavior? That seems... bad. I'd rather just define it now to, well, abort is my preference.

@ticki
Copy link
Contributor Author

ticki commented Sep 1, 2016

@ubsan Undefined != Unspecified. Check the comments from the code.

Unwinding will make it abort, but the behavior is unspecified, and might thus change in the future (just like the dropping order of a vector's elements and arithmetic overflows).

Edit: @usban edited his comment. Let me just reply to the new edition of it:

Well, the point is to keep all possibilities open, and frankly, there might be even more in the future. One thing we could define is that it mirrors the behavior of destructors in destructors.

@Ixrec
Copy link
Contributor

Ixrec commented Sep 2, 2016

Speaking of keeping all possibilities open, is there any reason this couldn't be implemented to simply mem::forget the old value to avoid double-drop without aborting or catching the panic? I'm not sure that's what I'd prefer, but it's what I intuitively expected before I saw the abort() in your PR.

@ticki
Copy link
Contributor Author

ticki commented Sep 2, 2016

@lxrec forgetting would require true ownership of the inner value or control of the drop flags, neither of which are plausible.

@strega-nil
Copy link

strega-nil commented Sep 2, 2016

@ticki 1) not a male. 2) I am pretty sure I wrote "implementation defined behavior" in my original comment. I don't like implementation defined behavior. And goddamnit, check the gender of the people you're talking about. This is getting ridiculous.

When you say "unspecified", that means "undefined behavior", by the way. A compiler could do anything, unless you actually give it options of what to do. What, exactly, in your RFC is stopping a compiler from writing to null in the case of a panic?

@ticki
Copy link
Contributor Author

ticki commented Sep 2, 2016

@ubsan

  1. I'm very sorry. I shouldn't assume that (damn gendered language. and it's even worse because English isn't my first language).

  2. Not really. The undefined would mean that the compiler could do anything inconsistently or consistenly. Unspecified means that it is not defined, but it is consistent.

What, exactly, in your RFC is stopping a compiler from writing to null in the case of a panic?

The fact that it is safe denotes that it is not breaking invariants (and thus leading to UB).

@strega-nil
Copy link

@ticki I've seen a lot of broken safe code :3

My opinion is, obviously, that we should define it. However, I think if we don't, we should give specific options to take, because that is what unspecified is to me. An implementation chooses one of a specific set of options, and doesn't have to document it (unlike implementation defined, where an implementation should document it).

@ticki
Copy link
Contributor Author

ticki commented Sep 3, 2016

I've seen a lot of broken safe code :3

Well, safe code (while it could be broken) can always be assumed to work right. Safe code shouldn't break any invariants.

My opinion is, obviously, that we should define it. However, I think if we don't, we should give specific options to take, because that is what unspecified is to me. An implementation chooses one of a specific set of options, and doesn't have to document it (unlike implementation defined, where an implementation should document it).

There are already a lot of unspecified things in standard library. Vec drop order comes to my mind. Overflows is a case too.

@strega-nil
Copy link

@ticki Overflows are defined to be two's complement for signed integers, and reset to zero for unsigned integers OR panic.

Vec drop order (as well as all drop orders, except for scope) are unspecified, but they are given (implicit) options to choose from; forwards or backwards.

@ticki
Copy link
Contributor Author

ticki commented Sep 3, 2016

@ubsan

Overflows are defined to be two's complement for signed integers, and reset to zero for unsigned integers OR panic.

Please link. RFC 560 says otherwise.

Vec drop order (as well as all drop orders, except for scope) are unspecified, but they are given (implicit) options to choose from; forwards or backwards.

Well, not really. In theory, any order could be used. In the end, it is unspecified, and having panics unspecified here is neither unsafe nor dangerous. The type system still expresses an important invariant, i.e. that panicking is safe. Whether it aborts, overwrites the value, or anything else is certainly important as well, but the price paid for specifying it is stagnation.

@ticki
Copy link
Contributor Author

ticki commented Sep 3, 2016

And, again, unspecified behavior is completely unrelated to undefined behavior.

@kennytm
Copy link
Member

kennytm commented Sep 3, 2016

C++17 §1.3.26[defns.unspecified]

unspecified behavior

behavior, for a well-formed program construct and correct data, that depends on the implementation.

[Note: The implementation is not required to document which behavior occurs. The range of possible behaviors is usually delineated by this International Standard. — end note ]

Could the RFC just specify some options to choose from (i.e. abort = alternative 1 / catch_unwind = alternative 2 / eat your laundry / etc) and move on? This undefined vs unspecified discussion is really going off-topic.

@ticki
Copy link
Contributor Author

ticki commented Sep 3, 2016

Eating your laundry is definitely a guarantee. Haven't you read the Rust website?

@durka
Copy link
Contributor

durka commented Sep 3, 2016

@ticki the version of RFC 560 you linked is a pre-acceptance draft. The accepted version is here and doesn't leave things unspecified.

Anyway I would agree that a safe function being documented to do something unspecified does not imply that it could do something unsound. In this case I haven't seen any suggestions as to what the code could do besides aborting, so it might stop the argument if we just defined that (I would like to leave the option open to print something if std is available, as I commented on the PR). If not, we could as a compromise adjust it to say "behavior on panic is unspecified, but broken invariants will not be exposed to safe code."

[motivation]: #motivation

A common pattern, especially for low-level programmers, is to acquire ownership
of a value behind a reference without immediately giving it a replacement value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you include a simple example of this pattern and the problem it causes?

@nikomatsakis
Copy link
Contributor

@ticki

Well, the point is to keep all possibilities open, and frankly, there might be even more in the future. One thing we could define is that it mirrors the behavior of destructors in destructors.

This is probably obvious, but it's early in the morning and I've only had 2 cups of coffee (clearly not enough). Can you say more about the connection to the behavior of unwinding in a dtor? (Do you mean specifically double panic? Or just any dtor in a dtor?)

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Sep 13, 2016

I'm generally 👍 on offering this API, I agree it is a very useful primitive. I am wary of the fact that it can abort, which I think is unacceptable for some applications -- though this is currently a bigger problem in Rust, of course, given the "panic during unwinding" behavior (not to mention OOM behavior, stack overflow behavior, and a few other such things). But I can imagine many cases, particularly unsafe code, where it's evident that everything will be fine from inspection.

EDIT: It might be a good idea to be careful with the documentation to appropriately convey the caution that is required here =)

@ticki
Copy link
Contributor Author

ticki commented Sep 13, 2016

@nikomatsakis

This is probably obvious, but it's early in the morning and I've only had 2 cups of coffee (clearly not enough). Can you say more about the connection to the behavior of unwinding in a dtor? (Do you mean specifically double panic? Or just any dtor in a dtor?)

It's a typo. I meant "panic in destructors".

@nikomatsakis
Copy link
Contributor

@ticki

It's a typo. I meant "panic in destructors".

Let me rephrase. Is there some particular semantics for panicing in destructors (or during unwinding, or both) that would make it possible for you to do something other than abort here?

@plietar
Copy link

plietar commented Sep 15, 2016

Could this be specialised such that if T: Default, then it gets replaced by the default value on unwind instead of aborting ?
While this goes against the principle that specialisation shouldn't affect externally visible behaviour, it would avoid aborting when possible. Aborting should be the last resort

@ticki
Copy link
Contributor Author

ticki commented Sep 15, 2016

@plietar The point is exactly that there is no substituded value. Indeed, you can use simply mem::replace if you want to replace it.

@bluss
Copy link
Member

bluss commented Sep 28, 2016

  1. I would be wary of blessing a function that can turn a &mut T into T. Doesn't this extend what you can legally do within the type system? Could I until now presume that if I give a user access to a &mut Token, that they can never come back with an owned Token value, when I gave no method to create one?
  2. The signature of the function must be in the RFC.

@cramertj
Copy link
Member

This came up again on reddit

This is a really common thing for beginners to reach for unsafe to solve, and it's really easy to write something that is subtly wrong. IMO it should be in the standard library.

@Ekleog
Copy link

Ekleog commented Feb 17, 2021

As the reason for the FCP was that having an aborting function was too big a footgun for the standard library, here is an idea that I think has not been suggested yet.

Maybe it would make sense to introduce this function as an unsafe function? It is technically safe, but I think unsafe could be used here to mark the size of the footgun, thus allowing for this function to enter the standard library — and then hopefully even if it is unsafe people would still use it instead of trying to write one of their own.

The drawback of this idea being that when using the closure in-place it is automatically unsafe, which isn't great… but it's probably still better than having everyone re-implement it on their own?

@Lokathor
Copy link
Contributor

An unsafe version in the standard library is really not much gain at all compared to someone just using a crate for this.

@RustyYato
Copy link

It is technically safe, but I think unsafe could be use

No, it's not fine to mark a safe function as unsafe.

@RustyYato
Copy link

I think this should be introduced into the std, even with the abort, even if it's just to document the pattern and pitfalls. People are going to try this out themselves if it's not in std (because take_mut isn't very well-known, unless you know about it already). This will invariably lead to buggy unsafe code. By codifying replace_with in std we avoid this issue, and put warning signs and alternatives front and center.

@ashpil
Copy link

ashpil commented Mar 14, 2022

I have a use case that is (slightly) different and more straightforward than others mentioned here, so I figured I'd put it here.

I have a function that takes ownership of an Expr, modifies it in some way, then returns that Expr (fn convert(e: Expr) -> Expr). I prefer this to passing a mutable reference to it. However, most of my Exprs are stored in a Box, and once I do something with that Expr, I want to be put it back where it was stored -- that is, back into a Box.

To do this, as far as I'm aware, what I currently have to do is this:

let e: Box<Expr> = ...; // I get my Box<Expr> from somewhere
let converted: Box<Expr> = Box::new(convert(*expr)); // deref, then create new heap allocation

which seems quite inefficient, as I'm taking something out of the box, only to map it to some other value, and put it back into a new Box. If Expr is large (which it is), I'm deallocating some large chunk of memory (the old box), only to allocate a chunk of the same size and type right after. It seems it would be much more preferable to just run my convert function on the heap-allocated data and overwrite that same data with the result.

This RFC would let me do so very easily:

let mut e: Box<Expr> = ...; // I get my Box<Expr> from somewhere
let converted: Box<Expr> = std::mem::replace_with(&mut e, convert); // no reallocation

@RalfJung
Copy link
Member

@ashpil you can safely mutate the box in-place like this:

struct Expr {}

fn convert(e: Expr) -> Expr { e }

fn work_on_box(mut e: Box<Expr>) {
    *e = convert(*e);
}

@ijackson
Copy link

My library partial-borrow relies for soundness on the fact that the Rust language doesn't allow you to get an owned T given only &mut T.

Rust + partial_borrow + (replace_with | take_mut) is unsound, even though Rust + partial_borrow is sound. IOW providing this facility in the Rust standard libary would be a breaking change.

@RalfJung
Copy link
Member

RalfJung commented Nov 11, 2022

FWIW, I consider https://crates.io/crates/take_mut to be sound, and therefore consider your crate to be unsound. (Haven't looked at your code, this is based on your statement about the assumptions your library makes.)

Providing this facility in the Rust standard library would help clarify what the safety requirements of mutable references are. We have to make a choice one way or the other, as an ecosystem -- it doesn't help to punt on this, that just risks making crate composition unsound, which is not a desirable end state.

Adding take_mut to the stdlib would not be a breaking change -- it does not break any promise that has been stably made in the past.

@RalfJung
Copy link
Member

Is there an explanation somewhere of why partial-borrow + take_mut is unsound?

The semantic model of RustBelt proves soundness of take_mut, so I wonder where your crate does something that conflicts with RustBelt.

@vi
Copy link

vi commented Nov 11, 2022

@ijackson Is it sound in presence of mem::swap or mem::replace?

@vi
Copy link

vi commented Nov 11, 2022

Is officially deeming third-party crates like replace_with or partial_borrow or e.g. rental sound or unsound in purview of UGC/opsem workgroup?

@RalfJung
Copy link
Member

It's not really an op.sem question, this is a question of the exact contract of the type system (the "safety invariant" of mutable references). We don't have a team explicitly chartered with that (we're not there yet), but I'd imagine it would involve UCG and t-types.

FWIW the opinion I stated above was my own, not necessarily reflecting any team consensus.

@ijackson
Copy link

@RalfJung:

Is there an explanation somewhere of why partial-borrow + take_mut is unsound?

partial-borrow hands out &mut Partial<T> where Partial is a ZST, maintaining the invariant that every &mut Partial<T> has the same address as an original non-ZST T. This invariant can be maintained in Rust-without-take_mut, because having &mut Partial gives you no way to produce an owned Partial so you can't ever make a different &mut Partial which doesn't satisfy the address invariant.

Using &'l mut Partial<T> rather than PartialMutRef<'l, T> makes for good ergonomics, because it gains access to the borrow checker's special case for &mut reborrowing; and because it gains access to the compiler's special treatment of field projection.

The semantic model of RustBelt proves soundness of take_mut, so I wonder where your crate does something that conflicts with RustBelt.

I went to look at RustBelt, and skimread the technical appendix; I don't have the time needed to grapple with this in a formal way. But, inferring from what you say, and my reading didn't contradict this, I don't think RustBelt encodes the language limitation that take_mut removes; and that is a limitation that is critical for the soundness of partial-borrow in its current form. So I expect that it is not possible to construct within RustBelt a soundness proof for partial-borrow.

Whether one regards that as an unsoundness in partial-borrow depends on what one thinks the rules for Rust are. I think partial-borrow is sound in Rust-with-only-the-standard-library. I don't have a formal proof, but I do have an informal correctness argument.

We have to make a choice one way or the other, as an ecosystem -- it doesn't help to punt on this, that just risks making crate composition unsound, which is not a desirable end state.

Yes, quite so. Franklly, I won't be surprised if this is resolved in favour of take_mut/replace-with and against partial-borrow.

I think, though, that there may be other libraries that rely on ZST references in a way that relies on the Rust programmer's inability to obtain T given &mut T. I'm not sure how we would find them.

@vi:

Is it sound in presence of mem::swap or mem::replace?

Yes. You can mem::swap two identicially-typed Partial, since you can have their &mut. But because they're ZSTs, it is a no-op. (It is undesirable that this compiles, since it would be a mistake by the programmer, but it's not unsound, just semantically wrong.)

Is officially deeming third-party crates like replace_with or partial_borrow or e.g. rental sound or unsound in purview of UGC/opsem workgroup?

Surely. At least UCG/opsem must be able to do that by implication, via the semantic rules; and making explicit statements about particular crates seems more practically helpful, and also more socially appropriate: after all, if UCG/opsem are declaring a crate unsound by implication it would be much better to explicitly acknowledge that.

@RalfJung
Copy link
Member

RalfJung commented Nov 11, 2022

It is definitely interesting to know that take_mut even is part of one of these "semantic conflicts". This is not something I expected, so thanks for bringing this up.

partial-borrow hands out &mut Partial where Partial is a ZST, maintaining the invariant that every &mut Partial has the same address as an original non-ZST T. This invariant can be maintained in Rust-without-take_mut, because having &mut Partial gives you no way to produce an owned Partial so you can't ever make a different &mut Partial which doesn't satisfy the address invariant.

If you care about the address where some data lives, then the standard library has a way of expressing that -- it's pinning. I know it is not very ergonomic, but mutable references are (to me) explicitly designed to reflect movable data. The "natural" way to define a safety invariant for mutable references does allow take_mut, and I am honestly not even sure how to state an invariant that would allow your code instead. Would it require special-casing ZST in the invariant? (For hopefully obvious reasons, we generally try to avoid any ad-hoc special cases.)

Using &'l mut Partial rather than PartialMutRef<'l, T> makes for good ergonomics, because it gains access to the borrow checker's special case for &mut reborrowing; and because it gains access to the compiler's special treatment of field projection.

So if, hypothetically, we had support for field projections on user-defined types (something that does come up fairly regularly), then this usecase would disappear?

@cuviper
Copy link
Member

cuviper commented Nov 11, 2022

I think if your Partial<T> held an extern type (#1861) as a marker, that would have the semantics that you want -- especially that being unsized would block it from take_mut entirely. If you need it on stable now, you can also approximate that with [()], as long as you can stomach your reference being a wide pointer.

@ijackson
Copy link

@RalfJung:

It is definitely interesting to know that take_mut even is part of one of these "semantic conflicts". This is not something I expected, so thanks for bringing this up.

You're welcome. Thanks for engaging :-).

partial-borrow hands out &mut Partial where Partial is a ZST, maintaining the invariant that every &mut Partial has the same address as an original non-ZST T. This invariant can be maintained in Rust-without-take_mut, because having &mut Partial gives you no way to produce an owned Partial so you can't ever make a different &mut Partial which doesn't satisfy the address invariant.

If you care about the address where some data lives, then the standard library has a way of expressing that -- it's pinning.

But that's not really what's going on. It's not that I care where the data lives. (The original T that the &mut Partial<T> came from is effectively immoveable, indeed, but only because it's been mutably borrowed by the original call split(&mut T) -> (&mut Partial<T>, ...).)

Really, I'm using &mut T as a way of carrying an address between different bits of my unsafe code. A newtype around a *mut T would be much better except for the ergonomics of reborrowing and field projection.

Or to put it another way, what I'm doing is a stunt.

I am honestly not even sure how to state an invariant that would allow your code instead. Would it require special-casing ZST in the invariant? (For hopefully obvious reasons, we generally try to avoid any ad-hoc special cases.)

Quite.

I think stunts like this are only feasible with ZSTs. The putative other libraries that I am imagining may also do something similar, would also have to use ZSTs.

Blocking take_mut for ZSTs would be very strange. I don't think we even have a way to express that in the type system right now.

Using &'l mut Partial rather than PartialMutRef<'l, T> makes for good ergonomics, because it gains access to the borrow checker's special case for &mut reborrowing; and because it gains access to the compiler's special treatment of field projection.

So if, hypothetically, we had support for field projections on user-defined types (something that does come up fairly regularly), then this usecase would disappear?

tl;dr: Probably, yes, and the result would be good.

There are two things that &mut gives me that I can't simulate. One is field projections; the alternative is to provide methods.

Even with this stunt, the field projections aren't perfect. With current Rust, I have to arrange for my field ZSTs, post-projection, to DerefMut to the actual field. So assignment to a field is *partial.field = value. I imagine a facility for custom field projection would allow writing partial.field = value, with the compiler automatically dereferencing the reference return value from the custom field projection method. So that would be nice.

The other is that an &mut T can be "lent", almost as if it were Copy:

let mut x = String::new();
let y = &mut x;
y.push('.'); // <- calls String::push(y), but doesn't move y, despite y: !Copy
println!("{}", y); // <- y still here, not moved, even though was passed to push

This doesn't work if you wrap the reference up in a newtype. There doesn't seem to be a way to tell the compiler that a newtype is "borrowably Copy" and get access to the same borrowck feature. Pin suffers from the same problem, and provides as_mut(), calls to which one finds sprinkled throughout certain flavours of async Rust.

For partial-borrow, having access to this reborrowing machinery is nice because it means one can define methods on the partial pointer whichcan be called normally (without needing some kind of as_mut() adapter) but which receive the actual pointer as argument, rather than a pointer to it.

But in practice this is less useful than it seems because one must usually "downgrade" a partial borrow to a subset, since the method often needs strictly less access, so a call to .as_mut() or something is needed anyway.

Another advantage of the newtype around *mut T is that it can have proper provenance and partial-borrow wouldn't need to int/ptr casts.

@cuviper:

I think if your Partial<T> held an extern type (#1861) as a marker, that would have the semantics that you want -- especially that being unsized would block it from take_mut entirely. If you need it on stable now, you can also approximate that with [()], as long as you can stomach your reference being a wide pointer.

That seems like an approach which would work, indeed. It could also be appropriate for people who are using ZST address marker types for other applications.

@ijackson
Copy link

That seems like an approach which would work, indeed. It could also be appropriate for people who are using ZST address marker types for other applications.

Oh, I just realised. That wouldn't work for partial-borrow because my Partial ZST has public ZST fields, all of which would also have to be marked this way, but obviously an unsized field could only be the last one in the overall struct.

@RalfJung
Copy link
Member

RalfJung commented Nov 14, 2022

But that's not really what's going on. It's not that I care where the data lives. (The original T that the &mut Partial came from is effectively immoveable, indeed, but only because it's been mutably borrowed by the original call split(&mut T) -> (&mut Partial, ...).)

You said yourself that the relevant invariant is that "every &mut Partial has the same address as an original non-ZST T". That's why, when someone moves the Partial to somewhere else via take_mut, the invariant is broken. This means you care about the address where the Partial lives -- exactly what pinning provides.

Literally the only assumption take_mut needs is that all Rust types are movable, and given an &mut T we can move them anywhere else (temporarily). If your code is unsound when combined with take_mut, that means it is not movable, which means it cares about the address, which means you are in the realm of what pinning was designed for.

@vi
Copy link

vi commented Nov 14, 2022

Is it sound in presence of mem::swap ... ?

Yes. You can mem::swap two identicially-typed Partial, since you can have their &mut. But because they're ZSTs, it is a no-op.

Does mem::swap interact in any way with pointer provenance?
What if it affects what is sound and what is unsound even if it per se does not compile to any CPU instructions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.