-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ManuallyDrop
type gives precise control of dtors of inline data.
#197
Closed
Closed
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
e22591c
`ManuallyDrop` type gives precise control of dtors of inline data.
huonw 2879766
Correct speeling.
huonw 7b02f80
Update to reflect Rust changes.
huonw 5c3884b
Merge branch 'master' into manually-dropped
huonw 386bf27
Address @bluss's notes.
huonw File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,264 @@ | ||
- Feature Name: manually_drop | ||
- Start Date: 2014-06-27 | ||
- RFC PR #: (leave this empty) | ||
- Rust Issue #: (leave this empty) | ||
|
||
# Summary | ||
|
||
Provide a `ManualDrop<T>` type that stores a `T` inline in memory, but | ||
doesn't run `T`s destructor by default. | ||
|
||
# Motivation | ||
|
||
There is currently no long-term way to have partially (un)initialised | ||
inline data in types, since internal fields always have their own | ||
destructor run. This is effectively generalising the behaviour of | ||
`*const` and `*mut` (which do not run destructors on their contents | ||
automatically) to inline data, i.e. not requiring allocation or other | ||
indirection like the pointer types. | ||
|
||
The motivating example of this is a `SmallVec<T>` type | ||
(e.g. servo's [smallvec], or @bluss's [arrayvec]), where vector instances | ||
with a small number of elements are stored directly inline, not on the | ||
heap. Something like | ||
|
||
[smallvec]: https://github.com/servo/rust-smallvec | ||
[arrayvec]: https://github.com/bluss/arrayvec | ||
|
||
```rust | ||
struct SmallVec<T> { | ||
length: uint, | ||
capacity: uint, | ||
pointer: *mut T, | ||
inline: [T, .. 8] | ||
} | ||
``` | ||
|
||
As an example of its behaviour consider: | ||
|
||
```rust | ||
let mut v = SmallVec::new(); | ||
// v.length == v.capacity == 0 | ||
// v.pointer == NULL | ||
// v.inline = [uninit, uninit, ... ] | ||
|
||
|
||
v.push("foo".to_string()); | ||
// v.length == 1, v.capacity == 0 (or something) | ||
// v.pointer == NULL | ||
// v.inline = ["foo", uninit, ... ] | ||
|
||
for _ in range(0, 99) { | ||
v.push("bar".to_string()); | ||
} | ||
// v.length == 100, v.capacity >= 100 | ||
// v.pointer == some allocation containing "foo", "bar", "bar", ... | ||
// v.inline = [uninit, uninit, ... ] | ||
``` | ||
|
||
When a `SmallVec` with the above definition is dropped, the | ||
destructors of all 8 `T`s in `inline` will always be run (no matter | ||
what happens in the `Drop` implementation of `SmallVec`), leading to | ||
dramatic unsafety: in the first and last cases, destructors would be | ||
running on uninitialised data, and in the second, it would correctly | ||
destroy the first element (`"foo"`) but the last 7 are uninitialised. | ||
|
||
With the `ManuallyDrop` type given in this RFC, `SmallVec` could be | ||
defined as | ||
|
||
```rust | ||
// (sketch, without `unsafe {}`s/casts etc. for clarity) | ||
|
||
struct SmallVec<T> { | ||
length: uint, | ||
capacity: uint, | ||
pointer: *mut T, | ||
|
||
// now these 8 T's are not automatically destroyed when a SmallVec | ||
// goes out of scope. | ||
inline: ManuallyDrop<[T, .. 8]> | ||
} | ||
|
||
impl<T> Drop for SmallVec<T> { | ||
fn drop(&mut self) { | ||
if !self.pointer.is_null() { // heap allocated | ||
// same as Vec<T>'s drop | ||
for i in range(0, self.length) { | ||
ptr::read(self.pointer.offset(i)); | ||
} | ||
alloc::heap::deallocate(self.pointer, ...); | ||
} else { | ||
// run destructors on the first `self.length` elements of | ||
// `self.inline`, but no others. | ||
for i in range(0, self.length) { | ||
ptr::read(&self.inline.get().offset(i)); | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
|
||
This definition is now safe: destructors run on exactly those elements | ||
that are valid and no others. | ||
|
||
|
||
To be precise, at the time of writing, Rust defines that destructors | ||
are safe to run on destroyed values (and guarantees it via the so called | ||
"drop flag"), meaning this can currently be made safe via: | ||
|
||
```rust | ||
impl<T> Drop for SmallVec<T> { | ||
fn drop(&mut self) { | ||
// same as above. | ||
|
||
// flag the whole array as dropped: it's fine for the `inner` | ||
// field to be redestroyed now. | ||
ptr::write_bytes(&mut self.inner as *mut T, mem::POST_DROP_U8, self.inner.len()); | ||
} | ||
} | ||
``` | ||
|
||
However, the drop flag is likely to disappear: [#5016]. | ||
|
||
[#5016]: https://github.com/mozilla/rust/issues/5016 | ||
|
||
NB. not running destructors is now safe in Rust, so this type isn't | ||
inherently `unsafe`, except for the fact that it is designed to store | ||
invalid instances of other types. | ||
|
||
# Detailed design | ||
|
||
A new lang-item type equivalent to | ||
|
||
```rust | ||
pub struct ManuallyDrop<T> { | ||
data: T | ||
} | ||
``` | ||
|
||
would be provided, and the compiler would know to *not* run the | ||
destructor of the `T` when a `ManuallyDrop<T>` instance goes out of | ||
scope. That is, | ||
|
||
```rust | ||
struct X { num: int }; | ||
impl Drop for X { | ||
fn drop(&mut self) { println!("{}", self.num) } | ||
} | ||
|
||
fn main() { | ||
let normal = X { x: 0 }; | ||
let new = ManuallyDrop { data: X { x: 1 } }; | ||
} | ||
``` | ||
|
||
would print only `0`, *not* `1`. | ||
|
||
This would be modelled after the `UnsafeCell` type, providing similar | ||
methods: | ||
|
||
- `const fn new(x: T) -> ManuallyDrop<T>` | ||
- `fn get(&self) -> *const T` | ||
- `fn get_mut(&mut self) -> *mut T` | ||
- `unsafe fn into_inner(self) -> T` | ||
|
||
Additional initialisation functions can be made available: | ||
|
||
- `fn zeroed()` | ||
- `fn uninitialized()` | ||
|
||
(These would be `const fn` if possible, but it is not clear to me that | ||
it is right now.) | ||
|
||
The `ManuallyDrop` type would be a black box for representation | ||
optimisations: it is explicitly designed to be able to store arbitrary | ||
junk, and so the assumptions made by the compiler conventionally may | ||
not hold (for example, the `Option` null pointer optimisation [will | ||
break] if a pointer is null). As a concrete example: | ||
|
||
[will break]: https://github.com/servo/rust-smallvec/issues/5 | ||
|
||
```rust | ||
type T = Option<Box<u8>>; | ||
type T_MD = Option<ManuallyDrop<Box<u8>>>; | ||
|
||
assert_eq!(size_of::<T>(), 8); | ||
assert_eq!(size_of::<T_MD>(), 16); | ||
``` | ||
|
||
# Alternatives | ||
|
||
- This may be adequately modeled by enums in many situations, but I | ||
can't imagine it will be easy to wrangle `enum`s to make `SmallVec` | ||
work nicely (one variant for each possible on-stack arity? How do | ||
you avoid adding an (unnecessary) additional work for the | ||
discriminant?), and especially not if we ever get generic number | ||
parameters, so something like `struct SmallVec<T, n> { ... inner: ManuallyDrop<[T, .. n]> }` | ||
would work, allowing for `SmallVec<T, 2>`, `SmallVec<T, 100>` etc. | ||
|
||
- A struct `UninterpretedBytesOfSize<T>` equal to | ||
`[u8, .. size_of::<T>()]`, that is, a chunk of memory large enough | ||
to store a `T`, but treated as raw memory (i.e. `u8`s). This has the | ||
(large) downside of losing all type information, interfering with | ||
the compiler's reachability analysis (e.g. for `UnsafeCell`), and | ||
making it easier for the programmer to make mistakes w.r.t. an | ||
incorrect or forgotten coercion (it's would be identical to C's | ||
`void*`). This is getting more feasible with [`const fn`] | ||
support. There needs to be support for explicitly specifying the | ||
alignment of the type too (to match `T`). | ||
|
||
- Change drop to have (semantically) take full ownership of its | ||
contents, so that `mem::forget` works, e.g. `trait Drop { fn | ||
drop(self); }` or a design like that in [@eddyb's comment]. | ||
|
||
- Make no change and just perform manual ownership control via | ||
`Option<T>` (or something similar, designed to avoid invalid layout | ||
optimisations): a data type can store `Option<T>` instead of `T` | ||
with the invariant that the `Option` is always `Some` except for | ||
when the destructor runs. This allows one to implement the previous | ||
alternative with `take`. E.g. `ManuallyDrop` can be shimmed as | ||
something like: | ||
|
||
```rust | ||
struct ManuallyDrop<T> { x: Option<T> } | ||
impl<T> ManuallyDrop<T> { | ||
const fn new(x: T) -> ManuallyDrop<T> { | ||
ManuallyDrop { x: Some(x) } | ||
} | ||
fn get(&self) -> *const T { | ||
self.x.as_ref().unwrap() | ||
} | ||
fn get_mut(&mut self) -> *mut T { | ||
self.x.as_ref_mut().unwrap() | ||
} | ||
unsafe fn into_inner(self) -> T { | ||
self.x.unwrap() | ||
} | ||
} | ||
|
||
impl Drop for ManuallyDrop { | ||
fn drop(&mut self) { mem::forget(self.x.take()) } | ||
} | ||
``` | ||
|
||
This approach is used by [arrayvec] but has many downsides: | ||
|
||
- (much) larger compiled code when not using `unsafe` (`unwrap` | ||
includes branches and unwinding etc.) | ||
- more invariants to track | ||
- larger data types | ||
|
||
[`const fn`]: https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md | ||
[@eddyb's comment]: https://github.com/rust-lang/rfcs/pull/197#issuecomment-110850383 | ||
|
||
# Unresolved questions | ||
|
||
- Should a `ManuallyDrop<T>` always be `Copy`? It no longer has a | ||
destructor and so the only risk of double freeing (etc.) would be | ||
when the user writes such a thing. This would allow it to be a | ||
maximally flexible building-block, but I cannot think of a specific | ||
use-case for `ManuallyDrop<NonCopy>` to be `Copy`. Being `Copy` | ||
would make `ManuallyDrop` entirely the inline equivalent of `*const` | ||
and `*mut`, since they are both `Copy` always. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This might be useful for RawVec to avoid having to do this: unsafe {
// need to use ptr::read to unsafely move the buf out since it's
// not Copy.
let buf = ptr::read(&self.buf);
let len = self.len;
mem::forget(self);
IntoIter {
start: *buf.ptr,
end: buf.ptr.offset(len as isize),
_buf: buf,
}
} But the code is already so blatantly unsafe that it's kinda "whatever" |
||
|
||
- The name. |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ptr::read
is unsafe.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, sorry about that...