# Summary

Replace `Scope` with `Guard`.

# Motivation

In the [Atomic API RFC](https://github.com/crossbeam-rs/rfcs/blob/master/text/2017-05-02-atomic-api.md#scopes-or-guards)
we decided to model pinning using scopes instead of guards. The rationale was that
the previous implementation of pinning using guards was unsound, and it was not
entirely obvious how to fix the problem.

The issue with guards is that they hold a pointer/reference to thread-local data,
but they're not constrained by a lifetime (like scopes are), and thus they may outlive
the referenced thread-local data. An example demonstrating that was presented in the
aforementioned RFC.

The key to solving the problem is the following: instead of *constraining* the life of
a scope/guard so that it doesn't live longer than the thread-local data, we can *extend* the
life of the thread-local data by using reference counting.
If the thread-local data is reference-counted, a guard can safely hold a reference to
the data for an arbitrarily long time.

# Detailed design

There is already a [PR ready for review](https://github.com/crossbeam-rs/crossbeam-epoch/pull/31)
that replaces scopes with guards.

## The interface
```rust
// The guard type.
pub struct Guard { ... }

// This interface is no different from that of `Scope`.
impl Guard {
    pub unsafe fn defer<F, R>(&self, f: F)
    where
        F: FnOnce() -> R + Send;

    pub fn flush(&self);
}

// Guards are clonable.
impl Clone for Guard { ... }

// This function doesn't take a closure anymore; it returns a guard instead.
pub fn pin() -> Guard;

// Returns a reference to a special dummy guard that doesn't pin any thread.
pub unsafe fn unprotected() -> &'static Guard;

// Methods now take a `&Guard` instead of a `&Scope`.
impl<T> Atomic<T> {
    pub fn compare_and_set<'g, O>(
        &self,
        current: Ptr<T>,
        new: Ptr<T>,
        ord: O,
        _: &'g Guard,
    ) -> Result<(), Ptr<'g, T>>
    where
        O: CompareAndSetOrdering;

    // The same applies to other methods...
}
```
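
For illustration, here is a minimal usage sketch of the guard-based interface. The `stack` value with its `head: Atomic<Node>` field, as well as the `Acquire` ordering, are assumptions made up for this example, not something defined in the PR:

```rust
// Pin the current thread; it stays pinned for as long as the guard is alive.
let guard = epoch::pin();

// Loads take a `&Guard`, and the returned `Ptr` is tied to the guard's lifetime.
let head = stack.head.load(Acquire, &guard);

// ... read through `head`, defer destruction with `guard.defer(...)`, etc. ...

// Dropping the guard unpins the thread (unless other guards are still alive).
drop(guard);
```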

## How reference counting works

Each thread participating in epoch-based garbage collection has some heap-allocated
data associated with it (in struct `Local`). All such `Local`s are connected into a
linked list, and the head pointer of that list is held in the global data (in struct `Global`).
`Global` also holds the global epoch and the global garbage queue.

Each `Local` holds a `ManuallyDrop<Arc<Global>>`, thus keeping the garbage collector in
which it resides alive.
Similarly, each `Handle` and each `Guard` keeps a pointer to the `Local` associated with it.
Handles and guards are counted using the fields `handle_count` and `guard_count` inside `Local`.
When both counts reach zero, the heap-allocated `Local` is marked as deleted and then its
`Arc<Global>` is immediately dropped (using `ManuallyDrop::drop`).
If that `Local` was holding the last reference to the `Global`, then the
`Global` is destroyed as well.
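
A simplified sketch of that layout, assuming plain (non-atomic) counters since handles and guards never leave their thread; the exact fields in the PR may differ:

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::sync::Arc;
use std::sync::atomic::{AtomicPtr, AtomicUsize};

struct Global {
    head: AtomicPtr<Local>, // head of the linked list of `Local`s
    epoch: AtomicUsize,     // the global epoch
    // ... the global garbage queue ...
}

struct Local {
    global: ManuallyDrop<Arc<Global>>, // keeps the collector (`Global`) alive
    handle_count: Cell<usize>,         // number of `Handle`s pointing to this `Local`
    guard_count: Cell<usize>,          // number of live `Guard`s (pinning depth)
    // ... next pointer in the linked list, local epoch, local garbage, etc.
}
```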

## Unprotected access to `Atomic`s

It's worth noting that `epoch::unprotected()` returns a `&'static Guard` instead
of a `Guard`. This decision is meant to encourage direct calls to
`unprotected()` over creating standalone unprotected guards.

Consider the following example:

```rust
// Create a standalone dummy guard.
let guard = &epoch::unprotected().clone();
let buffer = self.buffer.load(Relaxed, guard);

// Pass the `&'static Guard` reference directly to `load`.
let buffer = self.buffer.load(Relaxed, epoch::unprotected());
```

My personal opinion:

* While perfectly legal to write, the first version gives a
  false sense of security by creating a guard on the stack. It goes like
  "here I have a guard (but not really - it's fake!) and here's a load using that guard".
* The second version is more explicit in intent. It says "this is
  an unprotected load". This version should be used in most cases.

The example that motivated the decision (resizing the Chase-Lev deque):
<details>
<summary>Click to expand</summary>

```rust
#[cold]
unsafe fn resize(&self, new_cap: usize) {
    // Load the bottom, top, and buffer.
    let b = self.bottom.load(Relaxed);
    let t = self.top.load(Relaxed);

    let buffer = self.buffer.load(Relaxed, epoch::unprotected());

    // Allocate a new buffer.
    let new = Buffer::new(new_cap);

    // Copy data from the old buffer to the new one.
    let mut i = t;
    while i != b {
        ptr::copy_nonoverlapping(buffer.deref().at(i), new.at(i), 1);
        i = i.wrapping_add(1);
    }

    let guard = &epoch::pin();

    // Replace the old buffer with the new one.
    let old = self.buffer
        .swap(Owned::new(new).into_ptr(guard), Release, guard);

    // Destroy the old buffer later.
    guard.defer(move || old.into_owned());

    // If the buffer is very large, then flush the thread-local garbage in order to
    // deallocate it as soon as possible.
    if mem::size_of::<T>() * new_cap >= FLUSH_THRESHOLD_BYTES {
        guard.flush();
    }
}
```
</details>

First, `self.buffer` is loaded using an unprotected guard, and after
that it is loaded again (using `swap`) with a real guard. If we were
to create two guards on the stack, it would be easy to later
mistake one for the other.

## Changes to the `Collector` interface

### Should `Handle` be `Send`?

If we allow `Handle`s to be `Send`, the following code will compile:

```rust
let c = Collector::new();
let h = c.handle();
let guard = h.pin();

thread::spawn(move || {
    let guard = h.pin();
});
```

This is a suspicious, although not necessarily *wrong*, piece of code. However, if we
allowed it, `guard_count` would have to be an atomic integer, and
atomic operations on the counter would unnecessarily slow down pinning.

For that reason, `Handle` will not be `Send`.
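
To make the cost concrete, here is a rough sketch of the difference (the function names and the `Ordering` choice are illustrative assumptions, not code from the PR):

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

// With a non-`Send` handle, all pins happen on the owning thread, so the
// counter can be a plain `Cell` and incrementing it is just a load and a store.
fn pin_single_threaded(guard_count: &Cell<usize>) {
    guard_count.set(guard_count.get() + 1);
}

// If handles could move between threads, the counter would have to be atomic,
// putting an atomic read-modify-write on the hot path of every pin.
fn pin_multi_threaded(guard_count: &AtomicUsize) {
    guard_count.fetch_add(1, Ordering::SeqCst);
}
```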

Note that, currently, `Handle::clone` creates a brand new handle that can be passed to
another thread (see [PR #26](https://github.com/crossbeam-rs/crossbeam-epoch/pull/26)),
but with non-`Send` handles that cloning behavior is not very useful anymore.

### Accessing the default handle

Moreover, we've been having difficulties finding a satisfying signature for the
`default_handle` method (see [PR #28](https://github.com/crossbeam-rs/crossbeam-epoch/pull/28)).

Here is a list of options we have to choose from:

#### 1st option

As proposed in [PR #28](https://github.com/crossbeam-rs/crossbeam-epoch/pull/28), and with
an additional `try_default_handle` function gated under the `nightly` feature
(because `LocalKey::try_with` is still unstable):

```rust
pub unsafe fn default_handle() -> &'static Handle {
    &*HANDLE.with(|handle| handle as *const _)
}

#[cfg(feature = "nightly")]
pub unsafe fn try_default_handle() -> Option<&'static Handle> {
    HANDLE.try_with(|handle| &*(handle as *const _)).ok()
}
```

Advantage: Zero-cost.

Drawback: Functions are `unsafe` (because the `'static` lifetime is a lie).

#### 2nd option

These functions are similar to `LocalKey::with` and `LocalKey::try_with` in that
they take a closure that operates on a `&Handle`:

```rust
pub fn with_default_handle<R, F: FnOnce(&Handle) -> R>(f: F) -> R {
    HANDLE.with(|handle| f(handle))
}

#[cfg(feature = "nightly")]
pub fn try_with_default_handle<R, F: FnOnce(&Handle) -> R>(f: F) -> Option<R> {
    HANDLE.try_with(|handle| f(handle)).ok()
}
```

Advantage: Zero-cost.

Drawback: Slightly inconvenient interface.
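
For comparison, a usage sketch of this option (the closure body is an illustrative assumption):

```rust
// Every use of the default handle has to be wrapped in a closure,
// which is what makes the interface slightly inconvenient.
with_default_handle(|handle| {
    let guard = handle.pin();
    // ... perform lock-free operations under `guard` ...
});
```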

#### 3rd option

If we make handles non-`Send`, we can count the number of handles inside `Local`
and provide an interface very similar to `std::thread::current()`:

```rust
pub fn default_handle() -> Handle {
    HANDLE.with(|handle| handle.clone())
}

#[cfg(feature = "nightly")]
pub fn try_default_handle() -> Option<Handle> {
    HANDLE.try_with(|handle| handle.clone()).ok()
}
```

Advantage: The most convenient interface.

Drawback: There is a small cost associated with incrementing/decrementing the internal handle count.

### The new interface

First, `Handle::clone` will be changed so that it increments the internal reference count
(`handle_count`) and returns a new reference to the same handle (just like
`Thread::clone` returns a new reference to the same thread).
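
A sketch of what this could look like, reusing the `Local` layout sketched earlier (the exact code lives in the PR):

```rust
// A `Handle` is just a counted pointer to its thread's `Local`.
// Holding a raw pointer also makes `Handle` automatically non-`Send`.
pub struct Handle {
    local: *const Local,
}

impl Clone for Handle {
    fn clone(&self) -> Handle {
        // Bump the handle count and return another reference to the same `Local`.
        unsafe {
            let local = &*self.local;
            local.handle_count.set(local.handle_count.get() + 1);
        }
        Handle { local: self.local }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        unsafe {
            let local = &*self.local;
            local.handle_count.set(local.handle_count.get() - 1);
            // When both `handle_count` and `guard_count` reach zero, the
            // `Local` is marked as deleted and its `Arc<Global>` is dropped.
        }
    }
}
```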

Second, `default_handle` will be implemented like this (the 3rd option):

```rust
lazy_static! {
    static ref COLLECTOR: Collector = Collector::new();
}

thread_local! {
    static HANDLE: Handle = COLLECTOR.handle();
}

// Panics if `HANDLE` has been destroyed.
pub fn default_handle() -> Handle {
    HANDLE.with(|handle| handle.clone())
}

// Returns `None` if `HANDLE` has been destroyed.
#[cfg(feature = "nightly")]
pub fn try_default_handle() -> Option<Handle> {
    HANDLE.try_with(|handle| handle.clone()).ok()
}
```

With this interface, accessing the default handle is safe and ergonomic, but
there is some cost associated with reference counting. Fortunately, the cost
is small enough to be forgivable, as the benchmarks below demonstrate.

## Benchmarks

Here's a trivial benchmark showing that guards don't introduce
a performance regression (or at least not a significant one):

```rust
// Before: pinning with scopes.
#[bench]
fn pin_empty(b: &mut Bencher) {
    b.iter(|| epoch::pin(|_| ()));
}

// After: pinning with guards.
#[bench]
fn pin_empty(b: &mut Bencher) {
    b.iter(|| epoch::pin());
}
```

Both the before and after benchmarks show the same numbers:

```
test pin_empty ... bench: 12 ns/iter (+/- 0)
```

Now let's see how pinning using `default_handle` fares:

```rust
#[bench]
fn default_handle_pin(b: &mut Bencher) {
    b.iter(|| epoch::default_handle().pin());
}
```

Result:

```
test default_handle_pin ... bench: 13 ns/iter (+/- 0)
```

As the results show, pinning with `epoch::pin()` is a little bit
faster than with `epoch::default_handle().pin()` (12 ns vs. 13 ns).
The difference is due to the overhead of reference counting, but it is small
enough that it won't matter in practical situations.

# Drawbacks

In theory, pinning with scopes has the potential to be a little bit faster.

Scopes have a more rigid structure: they're perfectly nested, as can be seen in the
following example:

```rust
epoch::pin(|first| {
    epoch::pin(|second| {
        // `second` is created after `first`.
        // ..
        // `second` dies before `first`.
    })
})
```

On the other hand, guards can be created, moved, and dropped at arbitrary points in time:

```rust
let a = epoch::pin(); // The first guard is created.
let b = epoch::pin(); // The second guard is created.

let c = a; // The first guard is moved.

drop(c); // The first guard is dropped.
drop(b); // The second guard is dropped.
```

When unpinning the thread, both scopes and guards decrement a counter that
keeps track of how many levels of nesting there are. However, scopes already
know at unpin time whether the counter will become zero, while guards don't:
with scopes, the value of the counter cannot differ from the one encountered
when the scope was created.

Long story short: guards have one additional branch when unpinning.
But the impact of that branch on performance seems to be minimal.
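
A rough sketch of that difference, reusing the `Local` layout from earlier (assumed pseudocode, not the PR's exact pinning code):

```rust
// With a scope, the nesting level read when pinning is still at hand when the
// scope ends, so the decision whether to actually unpin branches on a value
// that was loaded earlier and is already in a register.
fn with_scope<F: FnOnce()>(local: &Local, f: F) {
    let level = local.guard_count.get();
    local.guard_count.set(level + 1);
    f();
    local.guard_count.set(level);
    if level == 0 {
        // This was the outermost scope: actually unpin the thread.
    }
}

// A guard's destructor has no such context: it must reload the counter and
// branch on it to find out whether it was the outermost guard.
fn drop_guard(local: &Local) {
    let level = local.guard_count.get();
    local.guard_count.set(level - 1);
    if level == 1 {
        // This was the last live guard: actually unpin the thread.
    }
}
```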

# Alternatives

1. Keep `Scope` without introducing `Guard`.
2. Provide both `Scope` and `Guard` at the same time.

# Unresolved questions

1. Do we want a public `default_collector()` function as well?
2. Do we need a `Handle::collector()` accessor?