add copy-on-write mutable borrowing to Rc and Arc #9786

bill-myers · 2013-10-09T22:04:59Z

This patch adds a copy-on-write mechanism to Rc and Arc of Clonable types that allows to mutably borrow the contents.

In the simplest case, if the reference count is 1, we can just give out a mutable reference since we are obviously the only user.

If the reference count is more than 1, we clone the contents, and we are back to the previous case.

With this change, Rc and Arc can be used as a variant of unique pointers with the advantage of O(1) time and space copy (as long as the contents are not mutated) but the disadvantage of requiring an extra machine branch and a bunch of machine code on every mutation.

This can be used for things like persistent functional data structures and in general data that starts being the same as parent data, but sometimes gets changed.

thestinger · 2013-10-09T22:32:26Z

Something to note is that this breaks the contract of the Freeze kind and therefore defeats the simple mechanism used to prevent the creation of cycles without using the unchecked constructor. It looks memory safe, although I haven't really considered what would happen if you created cycles of Rc with nested managed pointers.

brson · 2013-10-09T22:36:08Z

Interesting!

These semantics are pretty confusing to me for methods called get_mut and borrow_mut. From the names I would not expect them to clone and mutate their contents before giving me a pointer. These seem like pretty specialized operations so I might want very explicit names.

The discrepancy between naming here is also troubling. Are the names different because Rc and Arc are using different conventions?

I'd like to hear others' opinions about how this fits into our smart pointer design.

bill-myers · 2013-10-10T00:30:07Z

I don't think this allows to create cycles.

The reason is that when you try to create a cycle you need an &mut to the body of an Rc box to do the assignment to an Rc-typed field, which means that the Rc box must have reference count of 1 and you called borrow_mut on an Rc pointing to it, which means that that Rc is mutably borrowed, which means that you need an &mut to the body the Rc box cointaining it, and so on.

Since the whole ownership chain (which is well-defined since all Rc boxes have reference count 1) is mutably borrowed and has a reference count of 1, you can't have an Rc value that can be written to make a cycle.

I guess the naming should indeed be different, maybe using "_cow" instead of "_mut"; regarding the difference in naming between Rc and Arc, I just followed the preexisting names, and I think that it would indeed make sense to rename Rc's "borrow" to "get" (or viceversa).

thestinger · 2013-10-10T00:32:20Z

That reasoning seems correct, so I retract what I said earlier about cycles.

bill-myers · 2013-10-10T12:57:14Z

Changed to first rename the existing Rc::borrow to Rc::get to be shorter and match Arc in the first commit, and then use "cow" instead of "get_mut" as the name for the new functions, choosing the shortest sensible name.

bill-myers · 2013-10-10T13:15:58Z

Now separated the code into two functions, so we ask the compiler to inline the fast path (which needs to be inlined for speed), but not the slow path (which requires allocation and thus should not be inlined for code size).

alexcrichton · 2013-10-10T13:23:59Z

Hm, specifically about the inlining, I wouldn't worry about it at all. LLVM probably knows better how to inline it than we do, and because it's a type-generic method it'll get "inlined" into all other crates that use it (so llvm will inline it properly if it decides to do so). I'm also not a fan of the unsafe functions popping up (it looked so nice before!)

We may also want to keep considering names for a bit. If I see a function called cow I think "moo" instead of copy-on-write first. @brson has a good point that this should play well into our theoretical "smart pointer" design, but I'm thinking that even if we did have smart pointers that you'd want to be explicit about possibly cloning large structures.

bill-myers · 2013-10-10T13:33:50Z

It's quite important though to always inline the fast path, since if the Rc pointer is used as a variant of the unique pointer, it should be as fast as possible since it should be as close as possible to &mut borrowing on an unique pointer, which is a no-op.

It's also quite important to not inline the slow path, since it can cause code size to blow up and performance doesn't matter since we allocate and clone anyway which are much slower than a function call.

I'm however not sure whether we should use #[inline(always)] or #[inline(never)] rather than just #[inline] and nothing though (or a combination).

The slow path function needs to be private and unsafe because memory gets leaked if it called when the reference count is 1 (since it would drop it to 0 and not free the object).

Not sure about naming, I decided to choose the shortest meaningful one since it seemed the only objective criteria possible.

BTW, we should put a branch annotation on the conditional like GCC's likely/unlikely = __builtin_expect, but it seems Rust doesn't have that feature yet.

alexcrichton · 2013-10-10T13:48:36Z

We've generally been directly shying away from using likely/unlikely, I don't personally know the concrete reasons as to why, but the general idea is that they're often misused and rarely actually have a positive effect on generated code.

We've also been very much shying away from inline(always) and inline(never). Whenever you flag one of them as such, you're often just defeating what LLVM would do normally. LLVM is normally very smart about deciding what to inline. For the case of inline(never), what if the container could be very cheaply cloned? For the use case of inline(always), that often just forces code bloat into the caller when what if it's already massive and wouldn't benefit anyway?

Generally if we do use these attributes for performance reasons like this, we try to get some numbers behind the decision justifying why the annotations are as such. Along those lines, do you have some code one way or another that's slow without annotations and fast with the annotations? That would be a strong signal that we should probably have them, but without that we'd like to leave as much as possible up to LLVM.

bill-myers · 2013-10-10T14:36:27Z

Removing both the inline(always) and inline(never) seems to run the risk that LLVM decides to inline the slow path in the general function (which is a locally correct decision, since it's the only caller) and then not inline the general function (which is now also a correct decision after inlining the slow path in).

The slow path requires allocation of a unique box and reference count manipulation, so it never really trivializes, and the inline(never) is only very slightly harmful in code size in case the function is called once.

On the other hand, maybe we don't want to inline the fast path if we really want to minimize code size.

So, I replaced the inline(always) with inline and kept the inline(never).

bill-myers · 2013-10-10T14:47:31Z

Regarding naming, possible options, in approximate word number and then length order:

cow
own
excl
unshare
exclusivize
get_cow
get_mut
get_excl
get_clone
get_unique
make_unique
make_owned
make_exclusive
maybe_clone
get_mut_cow
get_mut_excl
get_mut_clone
get_mut_unique
clone_if_shared
unshare_and_get_mut
exclusivize_and_get_mut
make_unique_and_get_mut
make_owned_and_get_mut
make_exclusive_and_get_mut
maybe_clone_and_get_mut
clone_if_shared_and_get_mut

EDIT: also "cow_mut"

thestinger · 2013-10-10T15:56:51Z

Even though cycles aren't going to be an issue, I'm still a bit worried about breaking the Freeze invariants. If it doesn't mean either inherited mutability or immutable, we're going to need to change the documentation at the very least.

I like this functionality but I'm not sure if it's something we should be committing to supporting in the standard library, at least without unsafe. It might be better to make another reference-counted pointer that's marked as non-Freeze to do this.

bill-myers · 2013-10-10T16:44:48Z

I don't think it breaks the Freeze invariants, since you need an &mut to the Rc to call cow() (because it takes &mut self), which is just like any other struct that supports mutation.

Also, it's equivalent to creating a new rc wrapping a clone of the contents, except that the clone is optimized out if the reference count is 1.

Why do you think it breaks the freeze invariants or is unsafe?

bill-myers · 2013-10-10T16:46:46Z

BTW, added a new commit that adds 3 new functions, generalizing the approach.

The try_get_mut() function behaves like cow(), but doesn't require Clone and does nothing if cloning would be necessary (this is implemented like UnsafeArc::try_unwrap, by using an Either).

The value() and try_unwrap() functions are analogous to cow() and try_get_mut(), except that they take by-value self and return by-value contents.

Specifically, the value() function will unwrap the Rc, destroying the box, if the reference count is 1, and will otherwise decrement the reference count and return a clone of the contents.

try_get_mut and try_unwrap functions are non-redundant, since they cannot be implemented externally as they fundamentally require unsafe code and access to private members of Rc.

cow() and value() could be implemented on top of try_get_mut and try_unwrap, but this would risk possibly suboptimal code and increased compilation time (due to the need to optimize out the Either), and anyway the are the interface users would commonly use.

thestinger · 2013-10-10T16:55:09Z

Okay, I'm convinced it's fine from a safety point-of-view. :)

bill-myers · 2013-10-10T17:28:51Z

Fixed missing unsafe blocks.

thestinger · 2013-10-12T01:29:31Z

@bill-myers: needs a rebase now that the Rc type has been moved to libstd

Shorter and same name used by Arc

The try_ functions don't require Clone, but they do nothing if rc > 1 The try_unwrap and value take and return by-value instead of &mut

They aren't needed, since arcs without the kinds can't be created with safe code, and unsafe code should be allowed to do whatever it wants. This greatly simplifies libraries using Arc<T> who no longer need to impose those traits unless they construct arcs.

emberian · 2013-10-16T04:52:11Z

@thestinger this has been rebased.

@bill-myers in the future, @ mention people who were looking at your PR. GitHub gives no notification that a PR has been updated.

alexcrichton · 2013-12-02T18:07:44Z

Closing due to inactivity, but please reopen if you have an update!

michaelwoerister · 2013-12-10T15:05:00Z

@bill-myers Will you have have time to rebase this? If not, I might be able to get it up-to-date within the next days. I'd be very interested to see this functionality in master :)

bill-myers mentioned this pull request Oct 12, 2013

RFC: add snapshottable/persistent treemap variant #9816

Closed

bill-myers added 4 commits October 12, 2013 04:33

rc: rename borrow to get

d3bcc92

Shorter and same name used by Arc

rc, arc: add copy-on-write mutable borrowing to Rc and Arc

59b080b

rc, arc: add support for by-value copy-on-write and try-to-write

a0d7bc0

The try_ functions don't require Clone, but they do nothing if rc > 1 The try_unwrap and value take and return by-value instead of &mut

alexcrichton closed this Dec 2, 2013

bill-myers mentioned this pull request Jan 8, 2014

libextra: Add a copy-on-write arc container. #11230

Merged

add copy-on-write mutable borrowing to Rc and Arc #9786

add copy-on-write mutable borrowing to Rc and Arc #9786

Uh oh!

Conversation

bill-myers commented Oct 9, 2013

Uh oh!

thestinger commented Oct 9, 2013

Uh oh!

brson commented Oct 9, 2013

Uh oh!

bill-myers commented Oct 10, 2013

Uh oh!

thestinger commented Oct 10, 2013

Uh oh!

bill-myers commented Oct 10, 2013

Uh oh!

bill-myers commented Oct 10, 2013

Uh oh!

alexcrichton commented Oct 10, 2013

Uh oh!

bill-myers commented Oct 10, 2013

Uh oh!

alexcrichton commented Oct 10, 2013

Uh oh!

bill-myers commented Oct 10, 2013

Uh oh!

bill-myers commented Oct 10, 2013

Uh oh!

thestinger commented Oct 10, 2013

Uh oh!

bill-myers commented Oct 10, 2013

Uh oh!

bill-myers commented Oct 10, 2013

Uh oh!

thestinger commented Oct 10, 2013

Uh oh!

bill-myers commented Oct 10, 2013

Uh oh!

thestinger commented Oct 12, 2013

Uh oh!

emberian commented Oct 16, 2013

Uh oh!

alexcrichton commented Dec 2, 2013

Uh oh!

michaelwoerister commented Dec 10, 2013

Uh oh!

Uh oh!