# Summary

Replace `Scope` with `Guard`.

# Motivation

In the [Atomic API RFC](https://github.com/crossbeam-rs/rfcs/blob/master/text/2017-05-02-atomic-api.md#scopes-or-guards)
we decided to model pinning using scopes instead of guards. The rationale was that
the previous implementation of pinning using guards was unsound, and it was not
entirely obvious how to fix the problem.

The issue with guards is that they hold a pointer/reference to thread-local data,
but they're not constrained by a lifetime (like scopes are), and thus they may outlive
the referenced thread-local data. An example demonstrating that was presented in the
aforementioned RFC.

The key to solving the problem is the following: instead of *constraining* the life of
a scope/guard so that it doesn't live longer than the thread-local data, we can *extend* the
life of the thread-local data by using reference counting.
If the thread-local data is reference-counted, a guard can safely hold a reference to
the data for an arbitrarily long time.

# Detailed design

There is already a [PR ready for review](https://github.com/crossbeam-rs/crossbeam-epoch/pull/31)
that replaces scopes with guards.

## The interface
```rust
// The guard type.
pub struct Guard { ... }

// This interface is no different from that of `Scope`.
impl Guard {
    pub unsafe fn defer<F, R>(&self, f: F)
    where
        F: FnOnce() -> R + Send;

    pub fn flush(&self);
}

// Guards are clonable.
impl Clone for Guard { ... }

// This function doesn't take a closure anymore; it returns a guard instead.
pub fn pin() -> Guard;

// Returns a reference to a special dummy guard that doesn't pin any thread.
pub unsafe fn unprotected() -> &'static Guard;

// Methods now take a `&Guard` instead of a `&Scope`.
impl<T> Atomic<T> {
    pub fn compare_and_set<'g, O>(
        &self,
        current: Ptr<T>,
        new: Ptr<T>,
        ord: O,
        _: &'g Guard,
    ) -> Result<(), Ptr<'g, T>>
    where
        O: CompareAndSetOrdering;

    // The same applies to other methods...
}
```
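
For illustration, here is a minimal usage sketch of the guard-based interface. The `stack` value with its `head: Atomic<Node>` field, as well as the `Acquire` ordering, are assumptions made up for this example, not something defined in the PR:

```rust
// Pin the current thread; it stays pinned for as long as the guard is alive.
let guard = epoch::pin();

// Loads take a `&Guard`, and the returned `Ptr` is tied to the guard's lifetime.
let head = stack.head.load(Acquire, &guard);

// ... read through `head`, defer destruction with `guard.defer(...)`, etc. ...

// Dropping the guard unpins the thread (unless other guards are still alive).
drop(guard);
```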

## How reference counting works

Each thread participating in epoch-based garbage collection has some heap-allocated
data associated with it (in struct `Local`). All such `Local`s are connected into a
linked list, and the head pointer of that list is held in the global data (in struct `Global`).
`Global` also holds the global epoch and the global garbage queue.

Each `Local` holds a `ManuallyDrop<Arc<Global>>`, thus keeping the garbage collector in
which it resides alive.
Similarly, each `Handle` and each `Guard` keeps a pointer to the `Local` associated with it.
Handles and guards are counted using the fields `handle_count` and `guard_count` inside `Local`.
When both counts reach zero, the heap-allocated `Local` is marked as deleted and then its
`Arc<Global>` is immediately dropped (using `ManuallyDrop::drop`).
If that `Local` was holding the last reference to the `Global`, then the
`Global` is destroyed as well.
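
A simplified sketch of that layout, assuming plain (non-atomic) counters since handles and guards never leave their thread; the exact fields in the PR may differ:

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::sync::Arc;
use std::sync::atomic::{AtomicPtr, AtomicUsize};

struct Global {
    head: AtomicPtr<Local>, // head of the linked list of `Local`s
    epoch: AtomicUsize,     // the global epoch
    // ... the global garbage queue ...
}

struct Local {
    global: ManuallyDrop<Arc<Global>>, // keeps the collector (`Global`) alive
    handle_count: Cell<usize>,         // number of `Handle`s pointing to this `Local`
    guard_count: Cell<usize>,          // number of live `Guard`s (pinning depth)
    // ... next pointer in the linked list, local epoch, local garbage, etc.
}
```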

## Unprotected access to `Atomic`s

It's worth noting that `epoch::unprotected()` returns a `&'static Guard` instead
of a `Guard`. This decision is meant to encourage direct calls to
`unprotected()` over creating standalone unprotected guards.

Consider the following example:

```rust
// Create a standalone dummy guard.
let guard = &epoch::unprotected().clone();
let buffer = self.buffer.load(Relaxed, guard);

// Pass the `&'static Guard` reference directly to `load`.
let buffer = self.buffer.load(Relaxed, epoch::unprotected());
```

My personal opinion:

* While perfectly legal to write, the first version gives a
  false sense of security by creating a guard on the stack. It goes like
  "here I have a guard (but not really - it's fake!) and here's a load using that guard".
* The second version is more explicit in intent. It says "this is
  an unprotected load". This version should be used in most cases.

The example that motivated the decision (resizing the Chase-Lev deque):
<details>
<summary>Click to expand</summary>

```rust
#[cold]
unsafe fn resize(&self, new_cap: usize) {
    // Load the bottom, top, and buffer.
    let b = self.bottom.load(Relaxed);
    let t = self.top.load(Relaxed);

    let buffer = self.buffer.load(Relaxed, epoch::unprotected());

    // Allocate a new buffer.
    let new = Buffer::new(new_cap);

    // Copy data from the old buffer to the new one.
    let mut i = t;
    while i != b {
        ptr::copy_nonoverlapping(buffer.deref().at(i), new.at(i), 1);
        i = i.wrapping_add(1);
    }

    let guard = &epoch::pin();

    // Replace the old buffer with the new one.
    let old = self.buffer
        .swap(Owned::new(new).into_ptr(guard), Release, guard);

    // Destroy the old buffer later.
    guard.defer(move || old.into_owned());

    // If the buffer is very large, then flush the thread-local garbage in order to
    // deallocate it as soon as possible.
    if mem::size_of::<T>() * new_cap >= FLUSH_THRESHOLD_BYTES {
        guard.flush();
    }
}
```
</details>

First, `self.buffer` is loaded using an unprotected guard, and after
that it is loaded again (using `swap`) with a real guard. If we were
to create two guards on the stack, it would be easy to later
mistake one for the other.

## Changes to the `Collector` interface

### Should `Handle` be `Send`?

If we allow `Handle`s to be `Send`, the following code will compile:

```rust
let c = Collector::new();
let h = c.handle();
let guard = h.pin();

thread::spawn(move || {
    let guard = h.pin();
});
```

This is a suspicious, although not necessarily *wrong*, piece of code. However, if we
allowed it, `guard_count` would have to be an atomic integer, and
atomic operations on the counter would unnecessarily slow down pinning.

For that reason, `Handle` will not be `Send`.
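
To make the cost concrete, here is a rough sketch of the difference (the function names and the `Ordering` choice are illustrative assumptions, not code from the PR):

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

// With a non-`Send` handle, all pins happen on the owning thread, so the
// counter can be a plain `Cell` and incrementing it is just a load and a store.
fn pin_single_threaded(guard_count: &Cell<usize>) {
    guard_count.set(guard_count.get() + 1);
}

// If handles could move between threads, the counter would have to be atomic,
// putting an atomic read-modify-write on the hot path of every pin.
fn pin_multi_threaded(guard_count: &AtomicUsize) {
    guard_count.fetch_add(1, Ordering::SeqCst);
}
```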

Note that, currently, `Handle::clone` creates a brand new handle that can be passed to
another thread (see [PR #26](https://github.com/crossbeam-rs/crossbeam-epoch/pull/26)),
but with non-`Send` handles that cloning behavior is not very useful anymore.

### Accessing the default handle

Moreover, we've been having difficulties finding a satisfying signature for the
`default_handle` method (see [PR #28](https://github.com/crossbeam-rs/crossbeam-epoch/pull/28)).

Here is a list of options we have to choose from:

#### 1st option

As proposed in [PR #28](https://github.com/crossbeam-rs/crossbeam-epoch/pull/28), and with
an additional `try_default_handle` function gated under the `nightly` feature
(because `LocalKey::try_with` is still unstable):

```rust
pub unsafe fn default_handle() -> &'static Handle {
    &*HANDLE.with(|handle| handle as *const _)
}

#[cfg(feature = "nightly")]
pub unsafe fn try_default_handle() -> Option<&'static Handle> {
    HANDLE.try_with(|handle| &*(handle as *const _)).ok()
}
```

Advantage: Zero-cost.

Drawback: Functions are `unsafe` (because the `'static` lifetime is a lie).

#### 2nd option

These functions are similar to `LocalKey::with` and `LocalKey::try_with` in that
they take a closure that operates on a `&Handle`:

```rust
pub fn with_default_handle<R, F: FnOnce(&Handle) -> R>(f: F) -> R {
    HANDLE.with(|handle| f(handle))
}

#[cfg(feature = "nightly")]
pub fn try_with_default_handle<R, F: FnOnce(&Handle) -> R>(f: F) -> Option<R> {
    HANDLE.try_with(|handle| f(handle)).ok()
}
```

Advantage: Zero-cost.

Drawback: Slightly inconvenient interface.
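
For comparison, a usage sketch of this option (the closure body is an illustrative assumption):

```rust
// Every use of the default handle has to be wrapped in a closure,
// which is what makes the interface slightly inconvenient.
with_default_handle(|handle| {
    let guard = handle.pin();
    // ... perform lock-free operations under `guard` ...
});
```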

#### 3rd option

If we make handles non-`Send`, we can count the number of handles inside `Local`
and provide an interface very similar to `std::thread::current()`:

```rust
pub fn default_handle() -> Handle {
    HANDLE.with(|handle| handle.clone())
}

#[cfg(feature = "nightly")]
pub fn try_default_handle() -> Option<Handle> {
    HANDLE.try_with(|handle| handle.clone()).ok()
}
```

Advantage: The most convenient interface.

Drawback: There is a small cost associated with incrementing/decrementing the internal handle count.

### The new interface

First, `Handle::clone` will be changed so that it increments the internal reference count
(`handle_count`) and returns a new reference to the same handle (just like
`Thread::clone` returns a new reference to the same thread).
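
A sketch of what this could look like, reusing the `Local` layout sketched earlier (the exact code lives in the PR):

```rust
// A `Handle` is just a counted pointer to its thread's `Local`.
// Holding a raw pointer also makes `Handle` automatically non-`Send`.
pub struct Handle {
    local: *const Local,
}

impl Clone for Handle {
    fn clone(&self) -> Handle {
        // Bump the handle count and return another reference to the same `Local`.
        unsafe {
            let local = &*self.local;
            local.handle_count.set(local.handle_count.get() + 1);
        }
        Handle { local: self.local }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        unsafe {
            let local = &*self.local;
            local.handle_count.set(local.handle_count.get() - 1);
            // When both `handle_count` and `guard_count` reach zero, the
            // `Local` is marked as deleted and its `Arc<Global>` is dropped.
        }
    }
}
```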

Second, `default_handle` will be implemented like this (the 3rd option):

```rust
lazy_static! {
    static ref COLLECTOR: Collector = Collector::new();
}

thread_local! {
    static HANDLE: Handle = COLLECTOR.handle();
}

// Panics if `HANDLE` has been destroyed.
pub fn default_handle() -> Handle {
    HANDLE.with(|handle| handle.clone())
}

// Returns `None` if `HANDLE` has been destroyed.
#[cfg(feature = "nightly")]
pub fn try_default_handle() -> Option<Handle> {
    HANDLE.try_with(|handle| handle.clone()).ok()
}
```

With this interface, accessing the default handle is safe and ergonomic, but
there is some cost associated with reference counting. Fortunately, the cost
is small enough to be forgivable, as the benchmarks below demonstrate.

## Benchmarks

Here's a trivial benchmark showing that guards don't introduce
a performance regression (or at least not a significant one):

```rust
// Before: pinning with scopes.
#[bench]
fn pin_empty(b: &mut Bencher) {
    b.iter(|| epoch::pin(|_| ()));
}

// After: pinning with guards.
#[bench]
fn pin_empty(b: &mut Bencher) {
    b.iter(|| epoch::pin());
}
```

Both the before and after benchmarks show the same numbers:

```
test pin_empty ... bench: 12 ns/iter (+/- 0)
```

Now let's see how pinning using `default_handle` fares:

```rust
#[bench]
fn default_handle_pin(b: &mut Bencher) {
    b.iter(|| epoch::default_handle().pin());
}
```

Result:

```
test default_handle_pin ... bench: 13 ns/iter (+/- 0)
```

As the results show, pinning with `epoch::pin()` is a little bit
faster than with `epoch::default_handle().pin()` (12 ns vs. 13 ns).
The difference is due to the overhead of reference counting, but it is small
enough that it won't matter in practical situations.

# Drawbacks

In theory, pinning with scopes has the potential to be a little bit faster.

Scopes have a more rigid structure: they're perfectly nested, as can be seen in the
following example:

```rust
epoch::pin(|first| {
    epoch::pin(|second| {
        // `second` is created after `first`.
        // ..
        // `second` dies before `first`.
    })
})
```

On the other hand, guards can be created, moved, and dropped at arbitrary points in time:

```rust
let a = epoch::pin(); // The first guard is created.
let b = epoch::pin(); // The second guard is created.

let c = a; // The first guard is moved.

drop(c); // The first guard is dropped.
drop(b); // The second guard is dropped.
```

When unpinning the thread, both scopes and guards decrement a counter that
keeps track of how many levels of nesting there are. However, scopes already
know at unpin time whether the counter will become zero, while guards don't:
with scopes, the value of the counter cannot differ from the one encountered
when the scope was created.

Long story short: guards have one additional branch when unpinning.
But the impact of that branch on performance seems to be minimal.
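
A rough sketch of that difference, reusing the `Local` layout from earlier (assumed pseudocode, not the PR's exact pinning code):

```rust
// With a scope, the nesting level read when pinning is still at hand when the
// scope ends, so the decision whether to actually unpin branches on a value
// that was loaded earlier and is already in a register.
fn with_scope<F: FnOnce()>(local: &Local, f: F) {
    let level = local.guard_count.get();
    local.guard_count.set(level + 1);
    f();
    local.guard_count.set(level);
    if level == 0 {
        // This was the outermost scope: actually unpin the thread.
    }
}

// A guard's destructor has no such context: it must reload the counter and
// branch on it to find out whether it was the outermost guard.
fn drop_guard(local: &Local) {
    let level = local.guard_count.get();
    local.guard_count.set(level - 1);
    if level == 1 {
        // This was the last live guard: actually unpin the thread.
    }
}
```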

# Alternatives

1. Keep `Scope` without introducing `Guard`.
2. Provide both `Scope` and `Guard` at the same time.

# Unresolved questions

1. Do we want a public `default_collector()` function as well?
2. Do we need a `Handle::collector()` accessor?