# Summary

Replace `Scope` with `Guard`.

# Motivation

In the [Atomic API RFC](https://github.com/crossbeam-rs/rfcs/blob/master/text/2017-05-02-atomic-api.md#scopes-or-guards)
we decided to model pinning using scopes instead of guards. The rationale was that
the previous implementation of pinning using guards was unsound, and it was not
entirely obvious how to fix the problem.

The issue with guards is that they hold a pointer/reference to thread-local data,
but they are not constrained by a lifetime (like scopes are), and thus they may outlive
the referenced thread-local data. An example demonstrating this was presented in the
aforementioned RFC.

The key to solving the problem is the following: instead of *constraining* the life of
a scope/guard so that it doesn't live longer than the thread-local data, we can *extend* the
life of the thread-local data by using reference counting.
If the thread-local data is reference-counted, a guard can safely hold a reference to
the data for an arbitrarily long time.

# Detailed design

There is already a [PR ready for review](https://github.com/crossbeam-rs/crossbeam-epoch/pull/31)
that replaces scopes with guards.

## The interface

```rust
// The guard type.
pub struct Guard { ... }

// This interface is no different from the one of `Scope`.
impl Guard {
    pub unsafe fn defer<F, R>(&self, f: F)
    where
        F: FnOnce() -> R + Send;

    pub fn flush(&self);
}

// Guards are clonable.
impl Clone for Guard { ... }

// This function doesn't take a closure anymore - it returns a guard instead.
pub fn pin() -> Guard;

// Returns a reference to a special dummy guard that doesn't pin any thread.
pub unsafe fn unprotected() -> &'static Guard;

// Methods now take a `&Guard` instead of a `&Scope`.
impl<T> Atomic<T> {
    pub fn compare_and_set<'g, O>(
        &self,
        current: Ptr<T>,
        new: Ptr<T>,
        ord: O,
        _: &'g Guard,
    ) -> Result<(), Ptr<'g, T>>
    where
        O: CompareAndSetOrdering;

    // The same applies to other methods...
}
```

## How reference counting works

Each thread participating in epoch-based garbage collection has some heap-allocated
data associated with it (in struct `Local`). All such `Local`s are connected into a
linked list, and the head pointer of that list is held in the global data (in struct `Global`).
`Global` also holds the global epoch and the global garbage queue.

Each `Local` holds a `ManuallyDrop<Arc<Global>>`, thus keeping the garbage collector in
which it resides alive.
Similarly, each `Handle` and each `Guard` keeps a pointer to the `Local` associated with it.
Handles and guards are counted using the fields `handle_count` and `guard_count` inside `Local`.
When both counts reach zero, the heap-allocated `Local` is marked as deleted and then its
`Arc` is immediately dropped (using `ManuallyDrop::drop`).
If that `Local` was holding the last reference to the `Global`, then the
`Global` is destroyed as well.
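To make the counting scheme concrete, here is a deliberately simplified sketch of how the two
counters gate dropping the `Arc<Global>`. The field types and the `release` helper are
illustrative assumptions rather than the actual code in the PR (which, among other things, has
to keep the `Local` in the global linked list until it is safe to free it):

```rust
use std::mem::ManuallyDrop;
use std::sync::Arc;

struct Global {
    // Global epoch, garbage queue, head of the list of `Local`s, ...
}

struct Local {
    // Keeps the collector alive for as long as this `Local` exists.
    global: ManuallyDrop<Arc<Global>>,
    // Number of `Handle`s pointing to this `Local`.
    handle_count: usize,
    // Number of live `Guard`s, i.e. how many times the thread is currently pinned.
    guard_count: usize,
}

impl Local {
    // Hypothetical helper, called after a `Handle` or `Guard` is dropped and
    // the corresponding counter has been decremented.
    fn release(&mut self) {
        if self.handle_count == 0 && self.guard_count == 0 {
            // Drop the `Arc<Global>`. If this was the last reference,
            // the `Global` is destroyed too.
            unsafe { ManuallyDrop::drop(&mut self.global) };
        }
    }
}
```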
## Unprotected access to `Atomic`s

Note that `epoch::unprotected()` returns a `&'static Guard` instead of an owned `Guard`.
This decision aims to encourage passing `epoch::unprotected()` directly to methods
rather than creating standalone unprotected guards.

Consider the following example:

```rust
// Create a standalone dummy guard.
let guard = &epoch::unprotected().clone();
let buffer = self.buffer.load(Relaxed, guard);

// Pass the `&'static Guard` reference directly to `load`.
let buffer = self.buffer.load(Relaxed, epoch::unprotected());
```

My personal opinion:

* While perfectly legal to write, the first version gives a
  false sense of security by creating a guard on the stack. It reads like
  "here I have a guard (but not really - it's fake!) and here's a load using that guard".
* The second version is more explicit in its intent. It says "this is
  an unprotected load". This version should be used in most cases; a sketch of this
  style follows below.
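As an illustration of that style, here is a hypothetical destructor that passes
`epoch::unprotected()` directly to every load. The `Stack`/`Node` types are invented for this
example, and the pointer methods (`is_null`, `deref`, `into_owned`) follow the ones used in the
other snippets in this RFC; the justification for unprotected access is that `&mut self`
guarantees no other thread touches the data concurrently:

```rust
use crossbeam_epoch::{self as epoch, Atomic};
use std::sync::atomic::Ordering::Relaxed;

// A hypothetical intrusive stack, used only to illustrate the style.
struct Node<T> {
    value: T,
    next: Atomic<Node<T>>,
}

struct Stack<T> {
    head: Atomic<Node<T>>,
}

impl<T> Drop for Stack<T> {
    fn drop(&mut self) {
        // `&mut self` guarantees exclusive access, so no other thread can be
        // reading these pointers concurrently - unprotected loads are fine.
        unsafe {
            let mut node = self.head.load(Relaxed, epoch::unprotected());
            while !node.is_null() {
                let next = node.deref().next.load(Relaxed, epoch::unprotected());
                // Take ownership of the node and free it right away.
                drop(node.into_owned());
                node = next;
            }
        }
    }
}
```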
The example that motivated the decision (resizing the Chase-Lev deque):

<details>
<summary>Click to expand</summary>

```rust
#[cold]
unsafe fn resize(&self, new_cap: usize) {
    // Load the bottom, top, and buffer.
    let b = self.bottom.load(Relaxed);
    let t = self.top.load(Relaxed);

    let buffer = self.buffer.load(Relaxed, epoch::unprotected());

    // Allocate a new buffer.
    let new = Buffer::new(new_cap);

    // Copy data from the old buffer to the new one.
    let mut i = t;
    while i != b {
        ptr::copy_nonoverlapping(buffer.deref().at(i), new.at(i), 1);
        i = i.wrapping_add(1);
    }

    let guard = &epoch::pin();

    // Replace the old buffer with the new one.
    let old = self.buffer
        .swap(Owned::new(new).into_ptr(guard), Release, guard);

    // Destroy the old buffer later.
    guard.defer(move || old.into_owned());

    // If the buffer is very large, then flush the thread-local garbage in order to
    // deallocate it as soon as possible.
    if mem::size_of::<T>() * new_cap >= FLUSH_THRESHOLD_BYTES {
        guard.flush();
    }
}
```

</details>

Here, `self.buffer` is first loaded using an unprotected guard, and later accessed
again (through `swap`) with a real guard. If we were to create two guards on the
stack, it would be easy to later mistake one for the other.

## Changes to the `Collector` interface

### Should `Handle` be `Send`?

If we allowed `Handle`s to be `Send`, the following code would compile:

```rust
let c = Collector::new();
let h = c.handle();
let guard = h.pin();

thread::spawn(move || {
    let guard = h.pin();
});
```

This is suspicious, although not necessarily *wrong*, code. However, if we
allowed it, `guard_count` would have to be an atomic integer, and
atomic operations on the counter would unnecessarily slow down pinning.

For that reason, `Handle` will not be `Send`.
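Assuming `Handle` and `Guard` store the pointer to their `Local` as a raw pointer (an
implementation detail not spelled out in this RFC), no explicit opt-out is even needed:
raw pointers are not `Send`, so the types are non-`Send` by default. A minimal sketch:

```rust
struct Local {
    // handle_count, guard_count, ManuallyDrop<Arc<Global>>, ...
}

pub struct Handle {
    // A raw pointer is neither `Send` nor `Sync`, so `Handle` is
    // automatically `!Send` unless we explicitly opt back in.
    local: *const Local,
}

pub struct Guard {
    local: *const Local,
}
```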
Note that, currently, `Handle::clone` creates a brand new handle that can be passed to
another thread (see [PR #26](https://github.com/crossbeam-rs/crossbeam-epoch/pull/26)),
but with non-`Send` handles that cloning behavior is not very useful anymore.

### Accessing the default handle

Moreover, we've been having difficulties finding a satisfactory signature for the
`default_handle` function (see [PR #28](https://github.com/crossbeam-rs/crossbeam-epoch/pull/28)).

Here is a list of options we have to choose from:

#### 1st option

As proposed in [PR #28](https://github.com/crossbeam-rs/crossbeam-epoch/pull/28), with
an additional `try_default_handle` function gated behind the `nightly` feature
(because `LocalKey::try_with` is still unstable):

```rust
pub unsafe fn default_handle() -> &'static Handle {
    &*HANDLE.with(|handle| handle as *const _)
}

#[cfg(feature = "nightly")]
pub unsafe fn try_default_handle() -> Option<&'static Handle> {
    HANDLE.try_with(|handle| &*(handle as *const _)).ok()
}
```

Advantage: Zero-cost.

Drawback: The functions are `unsafe` (because the `'static` lifetime is a lie).

#### 2nd option

These functions are similar to `LocalKey::with` and `LocalKey::try_with` in that
they take a closure that operates on a `&Handle`:

```rust
pub fn with_default_handle<R, F: FnOnce(&Handle) -> R>(f: F) -> R {
    HANDLE.with(|handle| f(handle))
}

#[cfg(feature = "nightly")]
pub fn try_with_default_handle<R, F: FnOnce(&Handle) -> R>(f: F) -> Option<R> {
    HANDLE.try_with(|handle| f(handle)).ok()
}
```

Advantage: Zero-cost.

Drawback: Slightly inconvenient interface.

#### 3rd option

If we make handles non-`Send`, we can count the number of handles inside `Local`
and provide an interface very similar to `std::thread::current()`:

```rust
pub fn default_handle() -> Handle {
    HANDLE.with(|handle| handle.clone())
}

#[cfg(feature = "nightly")]
pub fn try_default_handle() -> Option<Handle> {
    HANDLE.try_with(|handle| handle.clone()).ok()
}
```

Advantage: The most convenient interface.

Drawback: There is a small cost associated with incrementing/decrementing the internal handle count.

### The new interface

First, `Handle::clone` will be changed so that it increments the internal reference count
(`handle_count`) and returns a new reference to the same handle (just like
`Thread::clone` returns a new reference to the same thread).

Second, `default_handle` will be implemented like this (the 3rd option):

```rust
lazy_static! {
    static ref COLLECTOR: Collector = Collector::new();
}

thread_local! {
    static HANDLE: Handle = COLLECTOR.handle();
}

// Panics if `HANDLE` has been destroyed.
pub fn default_handle() -> Handle {
    HANDLE.with(|handle| handle.clone())
}

// Returns `None` if `HANDLE` has been destroyed.
#[cfg(feature = "nightly")]
pub fn try_default_handle() -> Option<Handle> {
    HANDLE.try_with(|handle| handle.clone()).ok()
}
```
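Presumably (this is an assumption about the PR, not something spelled out above), `epoch::pin()`
itself can be expressed in terms of the same thread-local `HANDLE`, without cloning the handle
first:

```rust
// Sketch: pin the current thread through the default handle without
// paying for a `Handle` clone.
pub fn pin() -> Guard {
    HANDLE.with(|handle| handle.pin())
}
```

The benchmarks below compare exactly these two paths: `pin()` versus `default_handle().pin()`.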
With this interface, accessing the default handle is safe and ergonomic, but
there is some cost associated with reference counting. Fortunately, the cost
is small enough to be forgivable, as the benchmarks below demonstrate.

## Benchmarks

Here's a trivial benchmark that verifies that guards don't introduce
a performance regression (or at least not a significant one):

```rust
// Before: pinning with scopes.
#[bench]
fn pin_empty(b: &mut Bencher) {
    b.iter(|| epoch::pin(|_| ()));
}

// After: pinning with guards.
#[bench]
fn pin_empty(b: &mut Bencher) {
    b.iter(|| epoch::pin());
}
```

The before and after benchmarks show the same numbers:

```
test pin_empty ... bench: 12 ns/iter (+/- 0)
```

Now let's see how pinning through `default_handle` fares:

```rust
#[bench]
fn default_handle_pin(b: &mut Bencher) {
    b.iter(|| epoch::default_handle().pin());
}
```

Result:

```
test default_handle_pin ... bench: 13 ns/iter (+/- 0)
```

As the results show, pinning with `epoch::pin()` is slightly faster than with
`epoch::default_handle().pin()` (12 ns versus 13 ns).
The difference is due to the overhead of reference counting, but it is small
enough not to matter in practical situations.

# Drawbacks

In theory, pinning with scopes has the potential to be a little bit faster.

Scopes have a more rigid structure - they're perfectly nested, as can be seen in the
following example:

```rust
epoch::pin(|first| {
    epoch::pin(|second| {
        // `second` is created after `first`.
        // ...
        // `second` dies before `first`.
    })
})
```

On the other hand, guards can be created, moved, and dropped at arbitrary points in time:

```rust
let a = epoch::pin(); // The first guard is created.
let b = epoch::pin(); // The second guard is created.

let c = a; // The first guard is moved.

drop(c); // The first guard is dropped.
drop(b); // The second guard is dropped.
```

When unpinning the thread, both scopes and guards decrement a counter that
keeps track of how many levels of nesting there are. However, at that point
scopes already know whether the counter will become zero, while guards don't:
with scopes, the counter's value cannot differ from the one that was observed
when the scope was created.

Long story short: guards need one additional branch when unpinning.
But the impact of that branch on performance seems to be minimal.

# Alternatives

1. Keep `Scope` without introducing `Guard`.
2. Provide both `Scope` and `Guard` at the same time.

# Unresolved questions

1. Do we want a public `default_collector()` function as well?
2. Do we need a `Handle::collector()` accessor?