From 06963fda377d654390d37a81c4d9bfb1728050cb Mon Sep 17 00:00:00 2001 From: Erik Jensen Date: Mon, 27 Apr 2015 13:51:18 -0700 Subject: [PATCH 1/4] Guaranteed non-static destructors MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Guarantee that the destructor for data that contains a borrow is run before any code after the borrow’s lifetime is executed. --- .../0000-guaranteed-non-static-destructors.md | 180 ++++++++++++++++++ 1 file changed, 180 insertions(+) create mode 100644 text/0000-guaranteed-non-static-destructors.md diff --git a/text/0000-guaranteed-non-static-destructors.md b/text/0000-guaranteed-non-static-destructors.md new file mode 100644 index 00000000000..a6ce1251c6b --- /dev/null +++ b/text/0000-guaranteed-non-static-destructors.md @@ -0,0 +1,180 @@ +- Feature Name: Guaranteed non-static destructors +- Start Date: 2015-04-27 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary + +Guarantee that the destructor for data that contains a borrow is run before any +code after the borrow’s lifetime is executed. + +# Motivation + +Rust is currently not guaranteed to run destructors on an object before +execution moves beyond its bounding-lifetime. This is surprising, +unintuitive, and leads to soundness mistakes. This is evidenced by two recently +approved API designs: `thread::scopped` and `drain_range`, both of which were +found to be unsound because they mistakenly assumed that their destructor would +run before any code outside of their bounding lifetimes could be executed. Even +if both are fixed, it is very likely that similar errors will arise in the +future. + +While we can try to continuously hammer home that leaks are not considered +`unsafe`, we'll never be able to prevent all such mistakes. It would be better +if we could actually provide the intuitive guarantee, which is what this RFC +attempts to accomplish. + +Note that this RFC is explicitly not attempting to solve all leaks, nor even to +ensure that all stack-anchored objects are destructed. Rust is primarily +concerned with memory safety, so this RFC tries to formulate a general rule +that addresses the issue of incorrect assumptions about destructors leading to +memory unsafety. It does not attempt to prevent the leaking of other resources, +as leaking a `'static` object is never unsafe. (`'static` objects are +conceptually safe to keep around “forever”, so any unsoundness that can be +obtained by a leak could also be obtained through other means). + +# Detailed design + +In addition to current guarantees, the following property can be relied upon: + +Given an object `A` implementing `Drop` whose lifetime is restricted to `'a` +(that is, the object may not live longer than `'a`), in absence of unsafe code, +the destructor for `A` is guaranteed to run before any code after the end of +`'a` is executed. + +This is already true at the language level. However, certain library constructs +currently allow safe code to break it. + +This has the following implications: + + * It is perfectly acceptable to forget, leak, et cetera any object as long as + that object is valid for the duration of `'static`. Intuitively, this makes + sense, as forgetting an object is the same as having it live forever. + * A resource that is never freed due to and endless loop, a deadlock, or + program termination is valid because any code that comes after the end of + `'a` is never executed. + * Code would be allowed to rely on the guarantee for soundness. This means + that patterns such as `thread::scoped` and the initially-proposed + `Vec::drain_range` would be sound. + * Unsafe code is free to forget objects as needed in cases where the + programmer guarantees it is sound. However, it is not allowed to expose a + safe interface that would allow the above guarantee to be violated. + +As noted, this guarantee is already true at the core language level. However, +it can be violated in safe code when using the standard library. There are +two known ways in which this can happen: when a destructor panics in certain +situations (e.g., in a `Vec`), and when a reference cycle is created using `Rc` +and friends. + +This RFC proposes the following solutions: + + * Specify that any panic that occurs while a destructor is in progress results + in an abort. Panicking in a destructor is generally a bad idea with many + edge cases, so this is probably desirable, anyway. It should be possible to + implement this efficiently in a similar manner to C++’s `noexcept`. + * Restrict the basic `new` operation of existing reference-counted types to + types valid for `'static`, which are always safe to leak. Additionally, + introduce a scoped cycle collector that can be used to safely create `Rc`s + with a shorter lifetime, and research the possibility of a reference-counted + type that statically disallows cycles. + +Specifically, the scoped cycle collector would operate as follows: + + * There `RcGuard` type that can be instantiated with a given lifetime `'a`. + * Safe code can use the guard object to safely create `Rc`s with any type that + outlives `'a`. + * When the `RcGuard` is dropped, it drops the data of all `Rc's` created with + it. This would free any cycles. + * Attempting to dereference any `Rc` whose content had already been dropped + (e.g., during cycle clean-up) would cause a panic (and thus an abort if it + happened during RcGuard's destructor). + +The guard technique can also be applied to channels to ensure that they are +properly cleaned up even if the user does something like send the receiver and +sender into their own channel. + +# Drawbacks + + * Using `Rc`s for non-`'static` types will be slightly less convenient. The + user will either have to use the cycle guard, an acyclic Rc type, or use + `unsafe` and manually verify that no leaks are possible. (In the compiler, + for instance, the guard would need to be somewhere outlived by the type + arenas (perhaps in the `ctxt` object)). + * Probably not enough time to implement all of this by 1.0. This can be + mitigated by implementing the minimum necessary to make this + backward-compatible. Since the behavior of panicking destructors are already + unspecified and subject to change, this would mean restricting the safe + creation of reference-counted objects to containing `'static` values and + adding an `unsafe` way to use shorter lifetimes. This would make using `Rc`s + for non-`'static` data impossible without `unsafe` in 1.0, which is not + ideal. + +# Alternatives + + * Don't consider failing to destruct non-`'static` data unsafe. + + This is more or less the status quo, barring certain adjustments such as + removing `unsafe` from `mem::forget`. + + This would be unfortunate, as it makes a very common RAII pattern unusable + for anything memory-safety related. Furthermore, it breaks expectations. It + seems like using an RAII guard should be memory safe, to the point that two + new APIs were recently designed that relied on it for soundness, despite the + fact that leaks have not been considered unsafe for a long time. + + It is very likely that people will continue to make this mistake (even if + the core team doesn't, third party developers almost certainly will). Rust’s + strong lifetime ownership semantics make it seem like something that should + be reliable. This is compounded by the fact that it is true of the core + language, and “true-enough” in practice that developers will continue to + assume that it can be relied upon. + + Even if the programmer is aware of the limitation, taking it into account + can lead to more difficult and convoluted API design to ensure soundness. + Given that they are *usually* reliable, a programmer my opt to go with an + RAII design anyway to save time, reducing the value of Rust’s safety + guarantees. + + * Leave `Rc` and friends. Add a `Leak` trait or similar. All APIs that can + potentially leak (such as `Rc`) would have a `Leak` bound, and programmers + who want to rely on their destructor being called before their lifetime ends + would have to add a `Leak` bound to opt out of being usable with `Rc`s, + channels, et cetera. + + While `!Leak` would technically only be needed for types can that lead to + unsoundness if their destructor were skipped, many would probably use it for + any guard-like type whose destructor “should” run, preventing their use in + certain contexts unnecessarily. + + Additionally, it is likely that there will be types with *do* need the + guarantees for memory safety that otherwise would be useful to keep in an + `Rc` or send over a channel, which would not be possible. + + Instead of defining a single, simple rule for all types, the `Leak` + trait would have to be specifically dealt with all over, e.g., by adding + `+Leak` to the bounds of anything that you want to put in an `Rc`. Also, if + leaking `drain_range` or something similar is made to leak the referenced + values instead of leading to unsoundness, it needs to be `Leak` if and only + if the contained type is. This adds complexity and mental burden. + + Finally, while some of the solutions proposed in this RFC for handling + non-`'static` data could be applied to `!Leak` data, the added complexity + added by a `Leak` trait would be even less worth it if the complexity to + work around it had to be added to `Rc`, channels, et cetera, anyway. + + * Guarantee that all stack-anchored objects have their destructor run if the + program exits normally. + + This is much more challenging, would require sweeping changes in many areas, + and is not even that useful: you can always move a `'static` object off the + stack into a static location. Also, since `'static` objects can conceptually + live forever, there seems little benefit to attempting to enforce otherwise. + + In addition, there several benefits to being able to being able to safely + forget static data, such as forgetting a `File` to keep the file from being + closed once you have transferred the underlying file descriptor. + +# Unresolved questions + +What are the best designs for safely allowing channels and `Rc`s to safely work +with non-`'static` data? From 0b0c281502b2858ba77a1febdfc9469b50734208 Mon Sep 17 00:00:00 2001 From: Erik Jensen Date: Tue, 28 Apr 2015 01:06:41 -0700 Subject: [PATCH 2/4] Clarify how ScopedRc should work --- .../0000-guaranteed-non-static-destructors.md | 22 ++++++++++++------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/text/0000-guaranteed-non-static-destructors.md b/text/0000-guaranteed-non-static-destructors.md index a6ce1251c6b..d5dde9a5d2e 100644 --- a/text/0000-guaranteed-non-static-destructors.md +++ b/text/0000-guaranteed-non-static-destructors.md @@ -80,14 +80,20 @@ This RFC proposes the following solutions: Specifically, the scoped cycle collector would operate as follows: - * There `RcGuard` type that can be instantiated with a given lifetime `'a`. - * Safe code can use the guard object to safely create `Rc`s with any type that - outlives `'a`. - * When the `RcGuard` is dropped, it drops the data of all `Rc's` created with - it. This would free any cycles. - * Attempting to dereference any `Rc` whose content had already been dropped - (e.g., during cycle clean-up) would cause a panic (and thus an abort if it - happened during RcGuard's destructor). + * There is a new `ScopedRc` type that has an added lifetime parameter + specifying for how long the reference itself is valid. + * There is also a `ScopedRcGuard` type that can be instantiated with a given + lifetime `'a`. + * There is a new `ScopedRc` type that can only be constructed with a + `ScopedRcGuard`, and cannot outlive the guard. + * A `ScopedRc` created from a `ScopedRcGuard<'a>` can only contain data that + is valid for the duration of `'a`. + * When the `ScopedRcGuard` is dropped, it collects any remaining cycles. + * During cycle collection, an attempt by one object in the cycle to + dereference another that has already been collected will result in an abort. + +See [this gist](https://gist.github.com/rkjnsn/791ee9cc3b6d2961cf33) for a +working `ScopedRc` implementation, including sample usage. The guard technique can also be applied to channels to ensure that they are properly cleaned up even if the user does something like send the receiver and From c1e4cfb484085162bf41e682306038926529df4f Mon Sep 17 00:00:00 2001 From: Erik Jensen Date: Tue, 28 Apr 2015 15:30:49 -0700 Subject: [PATCH 3/4] Add unresolved question. --- text/0000-guaranteed-non-static-destructors.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/text/0000-guaranteed-non-static-destructors.md b/text/0000-guaranteed-non-static-destructors.md index d5dde9a5d2e..c6466a54f8d 100644 --- a/text/0000-guaranteed-non-static-destructors.md +++ b/text/0000-guaranteed-non-static-destructors.md @@ -184,3 +184,7 @@ sender into their own channel. What are the best designs for safely allowing channels and `Rc`s to safely work with non-`'static` data? + +Should the `ScopedRc` functionality outlined above be added as separate types, +or should we add a lifetime parameter to the existing `Rc`, `Arc`, et cetera to +encourage the same types to be used for both `'static` and non-`'static` data? From a165d35dedf3c97da5c3d493b6c9bb0d502ea0a0 Mon Sep 17 00:00:00 2001 From: Erik Jensen Date: Tue, 5 May 2015 19:19:50 -0700 Subject: [PATCH 4/4] Update RFC to reflect current protype design --- .../0000-guaranteed-non-static-destructors.md | 191 ++++++++++++------ 1 file changed, 132 insertions(+), 59 deletions(-) diff --git a/text/0000-guaranteed-non-static-destructors.md b/text/0000-guaranteed-non-static-destructors.md index c6466a54f8d..64b864c5f26 100644 --- a/text/0000-guaranteed-non-static-destructors.md +++ b/text/0000-guaranteed-non-static-destructors.md @@ -15,23 +15,26 @@ execution moves beyond its bounding-lifetime. This is surprising, unintuitive, and leads to soundness mistakes. This is evidenced by two recently approved API designs: `thread::scopped` and `drain_range`, both of which were found to be unsound because they mistakenly assumed that their destructor would -run before any code outside of their bounding lifetimes could be executed. Even -if both are fixed, it is very likely that similar errors will arise in the -future. +run before any code outside of their bounding lifetimes could be executed. +While possible solutions have been proposed for both of them, it shows how easy +the mistake is to make. It is very likely that similar mistakes will continue +to be made in the future. While we can try to continuously hammer home that leaks are not considered -`unsafe`, we'll never be able to prevent all such mistakes. It would be better -if we could actually provide the intuitive guarantee, which is what this RFC -attempts to accomplish. +`unsafe`, we’ll never be able to prevent all such mistakes, especially as Rust +grows and more third-party libraries are developed. It would be better if we +could actually provide the intuitive guarantee, which is what this RFC attempts +to accomplish. Note that this RFC is explicitly not attempting to solve all leaks, nor even to ensure that all stack-anchored objects are destructed. Rust is primarily concerned with memory safety, so this RFC tries to formulate a general rule -that addresses the issue of incorrect assumptions about destructors leading to -memory unsafety. It does not attempt to prevent the leaking of other resources, -as leaking a `'static` object is never unsafe. (`'static` objects are -conceptually safe to keep around “forever”, so any unsoundness that can be -obtained by a leak could also be obtained through other means). +to allow destructors to be relied upon for safety-related clean-up without +complicating or limiting situations without memory-safety implications. It does +not attempt to prevent the leaking of other resources, as leaking a `'static` +object is never memory-unsafe. (`'static` objects are conceptually safe to keep around “forever”, so any unsoundness that can be obtained by a leak could also +be obtained through other means, such as storing the value in a long-lived +hashmap). # Detailed design @@ -50,12 +53,12 @@ This has the following implications: * It is perfectly acceptable to forget, leak, et cetera any object as long as that object is valid for the duration of `'static`. Intuitively, this makes sense, as forgetting an object is the same as having it live forever. - * A resource that is never freed due to and endless loop, a deadlock, or + * A resource that is never freed due to an endless loop, a deadlock, or program termination is valid because any code that comes after the end of `'a` is never executed. * Code would be allowed to rely on the guarantee for soundness. This means - that patterns such as `thread::scoped` and the initially-proposed - `Vec::drain_range` would be sound. + that patterns such as `thread::scoped` `Vec::drain_range` would be sound as + initially proposed. * Unsafe code is free to forget objects as needed in cases where the programmer guarantees it is sound. However, it is not allowed to expose a safe interface that would allow the above guarantee to be violated. @@ -72,52 +75,97 @@ This RFC proposes the following solutions: in an abort. Panicking in a destructor is generally a bad idea with many edge cases, so this is probably desirable, anyway. It should be possible to implement this efficiently in a similar manner to C++’s `noexcept`. - * Restrict the basic `new` operation of existing reference-counted types to - types valid for `'static`, which are always safe to leak. Additionally, - introduce a scoped cycle collector that can be used to safely create `Rc`s - with a shorter lifetime, and research the possibility of a reference-counted - type that statically disallows cycles. - -Specifically, the scoped cycle collector would operate as follows: - - * There is a new `ScopedRc` type that has an added lifetime parameter - specifying for how long the reference itself is valid. - * There is also a `ScopedRcGuard` type that can be instantiated with a given - lifetime `'a`. - * There is a new `ScopedRc` type that can only be constructed with a - `ScopedRcGuard`, and cannot outlive the guard. - * A `ScopedRc` created from a `ScopedRcGuard<'a>` can only contain data that - is valid for the duration of `'a`. - * When the `ScopedRcGuard` is dropped, it collects any remaining cycles. - * During cycle collection, an attempt by one object in the cycle to - dereference another that has already been collected will result in an abort. + * Replace the existing `Rc` and `Arc` types with variants that this RFC will + call `ScopedRc` and `ScopedArc`. These types will have a lifetime parameter + added, and the contained `RcBox` will have an extra field to facilitate + cycle collection. `ScopedRc` is described in detail, below, and `ScopedArc` + will function analogously. + +## `ScopedRc<'a, T>` design + + * `ScopedRc` has an added invariant lifetime parameter specifying how long the + reference is valid. This may be shorter than the lifetime of `T`, but will + never be longer. + * A `ScopedRc` can either be guarded or unguarded. + * A guarded `ScopedRc` works as follows: + * There is a `ScopedRcGuard` type that can be instantiated with a given + lifetime `'a`. + * `ScopedRcGuard` can be used to create a new `ScopedRc`, which cannot + outlive the creating guard. + * Data stored in a `ScopedRc` created with a given `ScopedRcGuard` must + live at least as long as the guard. + * When the `ScopedRcGuard` is destroyed, it cleans up any cycles that may + remain among the `ScopedRc` objects created by it. + * During cycle collection, an attempt by an object in a cycle to + dereference a `ScopedRc` that has already been collected will result in + an abort. + * An unguarded `ScopedRc` does not have an associated cycle collector, and can + be created in the following ways: + * Through an `unguarded` constructor on the `ScopedRc`. This constructor + requires `T: 'static`, and would thus is considered safe to call. This + simplifies common `Rc` usages such as reference-counted strings. + * Using a global STATIC_GUARD object. This can be passed anywhere a + `&ScopedRcGuard` is expected, only accepts `'static` data, and creates + unguarded `ScopedRc`s. This allows APIs using cycles to provide a single + interface for creating both guarded and safe unguarded `ScopedRc`s. + * Using a `NoCycleMarker` type. This is used similarly to a `ScopedRcGuard` + but requires `T` to strictly outlive the marker. This makes cycles + impossible, so the created `ScopedRc`s does not need a cycle collector. + * Through an `unsafe_unguarded` constructor. This allows the creation of + unguarded `ScopedRc`s of any lifetime, but is `unsafe` because it could + be used to create uncollected, non-`'static` cycles, violating the + guarantee provided by this RFC. The caller of this method is responsible + for ensuring that such cycles are impossible, or guaranteed to be cleaned + up through some means before the end of the relevant lifetime. + * A `ScopedRcCreator` trait is also be provided, implemented by both + `ScopedRcGuard` and `NoCycleMarker`. This allows `ScopedRc` creation APIs to + be generic over either approach. See [this gist](https://gist.github.com/rkjnsn/791ee9cc3b6d2961cf33) for a -working `ScopedRc` implementation, including sample usage. - -The guard technique can also be applied to channels to ensure that they are -properly cleaned up even if the user does something like send the receiver and -sender into their own channel. +working `ScopedRc` implementation, including [sample usage](https://gist.github.com/rkjnsn/791ee9cc3b6d2961cf33#file-scopedrctest-rs). + +Other interfaces that use `Rc`s or `Arc`s internally (such as channels) have a +few options available. If they only use the `Rc`s to hold `'static` data, or +if they can be sure that no cycles are possible, they can continue to offer the +same API as today. If the objects containing `Rc`s all borrow from a main +object of some sort, the interface can internally use a `ScopedRcGuard` or a +`NoCycleMarker` depending on its needs. Finally, if the created `Rc`s hold a +user-specified type, the interface can generically accept a ScopedRcCreator` to +allow the user to statically eliminate or dynamically clean up cycles at their +discretion. # Drawbacks - * Using `Rc`s for non-`'static` types will be slightly less convenient. The - user will either have to use the cycle guard, an acyclic Rc type, or use - `unsafe` and manually verify that no leaks are possible. (In the compiler, - for instance, the guard would need to be somewhere outlived by the type - arenas (perhaps in the `ctxt` object)). - * Probably not enough time to implement all of this by 1.0. This can be - mitigated by implementing the minimum necessary to make this + * Using `ScopedRc`s for non-`'static` types will be slightly less convenient. + The user will either have to use a `ScopedRcGuard` or a `NoCycleMarker`, or + use `unsafe_unguarded` and manually verify that no leaks are possible. + * The internal `RcBox` allocation for unguarded `ScopedRc`s will be one + pointer larger than that of today’s `Rc`s. However, the author of this RFC + believes this will be insignificant in practice. Increasing the size of the + `RcBox` in a test did not noticably affect the memory usage of the Rust + compiler. + * Dereferencing a `ScopedRc` requires an extra check compared to today’s `Rc`. + Again, the author does not expect this to be significant in practice. Adding + this check to `Rc` did not noticably impact the Rust compiler’s compile + time. However, the sample implementation does incude an unsafe + `uncheck_deref` for the rare case where many different `ScopedRc`s need to + be dereferenced in a tight, performance-crital loop. + * Additional overhead for guarded `ScopedRc`s. In the sample implementation, + this overhead is three extra pointer-sized values per `RcBox` allocation + versus an unguarded `ScopedRc`, and some additional O(1) bookkeeping when + the underlying allocation is created or destroyed. (All other operations are + the same for guarded and unguarded `ScopedRc`s.) + * There may not enough time to finalize the `ScopedRc` design before 1.0. This + can be mitigated by implementing the minimum necessary to make this backward-compatible. Since the behavior of panicking destructors are already - unspecified and subject to change, this would mean restricting the safe - creation of reference-counted objects to containing `'static` values and - adding an `unsafe` way to use shorter lifetimes. This would make using `Rc`s - for non-`'static` data impossible without `unsafe` in 1.0, which is not - ideal. + unspecified and subject to change, doing so would be a matter of restricting + the safe creation of reference-counted objects to `'static` values. This + would cause the creation of non-`'static` `Rc`s require unsafe code until + `ScopedRc` could be introduced. # Alternatives - * Don't consider failing to destruct non-`'static` data unsafe. + * Don’t consider failing to destruct non-`'static` data unsafe. This is more or less the status quo, barring certain adjustments such as removing `unsafe` from `mem::forget`. @@ -129,11 +177,11 @@ sender into their own channel. fact that leaks have not been considered unsafe for a long time. It is very likely that people will continue to make this mistake (even if - the core team doesn't, third party developers almost certainly will). Rust’s + the core team doesn’t, third party developers almost certainly will). Rust’s strong lifetime ownership semantics make it seem like something that should be reliable. This is compounded by the fact that it is true of the core language, and “true-enough” in practice that developers will continue to - assume that it can be relied upon. + rely on it, even if it’s not technically safe to do so. Even if the programmer is aware of the limitation, taking it into account can lead to more difficult and convoluted API design to ensure soundness. @@ -180,11 +228,36 @@ sender into their own channel. forget static data, such as forgetting a `File` to keep the file from being closed once you have transferred the underlying file descriptor. -# Unresolved questions + * Leave `Rc` as it stands for 1.0, and spend more time testing and refining + `ScopedRc`. Deprecate `Rc` when `ScopedRc` is ready. This would allow more + time to develop `ScopedRc` without limiting `Rc` in the meantime, but would + mean postponing the guarantee set forth by this RFC until Rust 2.0. + + * A slight modification of this proposal, suggested by @pythonesque, would be + to have `ScopedRcGuard` use `dropck` to statically prevent the possibility + of dereferencing collected `ScopedRc`s. The idea would be to allow cycles if + and only if no destructors could observe them. This would eliminate the need + for a zero-check when dereferencing, potentially speeding up code, slightly. + + Unfortunately, the only way known to the author to accomplish this today + would require `ScopedRcGuard` to take a type parameter, and thus require a + distinct `ScopedRcGuard` for every type the programmer wanted to put in a + `ScopedRc`. For programs like the Rust compiler that construct `Rc`s with + many different types, this would be untenabled. Depending on usage patterns, + constructing and passing around references to tens of `ScopedRcGuards` may + cost more than it saves, performancewise. + + If it becomes possible in the future to impose `dropck` requirements where + the compiler would not normally infer them, and thus to allow this solution + to work without many different `ScopedRcGuard`s, it may be worth persuing. -What are the best designs for safely allowing channels and `Rc`s to safely work -with non-`'static` data? +# Unresolved questions -Should the `ScopedRc` functionality outlined above be added as separate types, -or should we add a lifetime parameter to the existing `Rc`, `Arc`, et cetera to -encourage the same types to be used for both `'static` and non-`'static` data? +Should we also add an `UnguardedRc` type that can be created with static data +or using `NoCycleMarker`? Like `ScopedRc`, this would have a lifetime parameter +and use the same `RcBox` format. However, it would be statically guaranteed to +be unguarded, so no zero check would be needed to dereference it. Because they +use the same RcBox format, an `UnguardedRc` could be freely converted to a +`ScopedRc` and an unguarded `ScopedRc` could be converted to an `UnguardedRc` +with a quick runtime check. Even though they can be easily converted back and +forth, it’s still another type, and it’s unclear how often it would be needed.