-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GSoC 2019] Distributed Non-Blocking Algorithms and Data Structures #13690
Comments
I think another way to put this is that any allocated object won't be freed while the epoch manager is
We use camelCase for methods so it should be called |
I should note that the reason that Edit: Performance of
Microbenchmark use Time;
config const numObjects = 16 * 1024 * 1024;
var objsDom = {0..#numObjects} dmapped Cyclic(startIdx=0);
var objs : [objsDom] unmanaged object();
var manager = new DistributedEpochManager();
forall obj in objs with (var rng = new RandomStream(int)) {
on Locales[abs(rng.getNext()) % numLocales] do obj = new unmanaged object();
}
var timer = new Timer();
timer.start();
forall obj in objs with (var tok = manager.register()) {
tok.pin();
tok.delete_obj(obj);
tok.unpin();
}
writeln("Done delete_obj");
manager.clear();
timer.stop();
writeln("Time: ", timer.elapsed());
timer.clear();
manager.destroy();
}
} |
After speaking with @mppf, I can agree to emulating
|
I like the idea. I don't think the documentation should mention This also brings up: what's the story for ARM? Are you going to just document that this code can't function properly on CPU architectures that don't have a 128-bit compare-exchange capability? Or were you planning on working around that? If not, it might be wise to add some (param-based) code to the implementation so that it refuses to run on such processors. (Aside: It's not actually "Intel's |
That's a very good point @gbtitus, I didn't consider portability a problem since I assumed cmpxchg16b or something like it was universally available on newer systems. The ABA wrapper is only used currently in the epoch manager data structures but in the future we could possibly use locks when we cannot rely on it (conditionally of course). The epoch manager would be a solution to the ABA problem by itself (preventing reclamation and hence reuse of memory); Interesting tidbit about |
ARM 8.1 has 128-bit CAS https://en.wikichip.org/wiki/arm/armv8.1 |
There isn't one, no. It would have to be an indirect check such as |
[GSoC 2019] Distributed Non-Blocking Algorithms and Data Structures [PR by @dgarvit and @LouisJenkinsCS - thanks! Reviewed, tested, and merged by @mppf] This PR adds several related package modules: * modules/packages/AtomicObjects.chpl * modules/packages/EpochManager.chpl * modules/packages/LockFreeQueue.chpl * modules/packages/LockFreeStack.chpl The EpochManager is the main part. AtomicObjects is used in implementing it but might be generally useful. LockFreeQueue and LockFreeStack are user-facing data structures using the EpochManager. Please note that these tests modules only function on x86 with compilers built to include `extern` blocks (i.e. CHPL_LLVM!=none). Please see issue #13690 for design discussion.
Given the PR has been merged, can we close this issue? |
As part of Google Summer of Code 2019, both @LouisJenkinsCS and I have laid down the infrastructure for creating non-blocking algorithms and data structures in both shared and distributed memory. This issue is intended to discuss the API and user-facing semantics of said infrastructure, as well as discuss our decision choices and answer any questions that come our way. This effort has spawned three individual contributions to chapel: 1)
LocalAtomics
, a user-facing abstraction for performing atomics on arbitraryunmanaged
objects, including objects which happen to be remote; 2)EpochManager
andDistributedEpochManager
, a scalable shared-memory and distributed-memory implementation of epoch-based reclamation that enable concurrent-safe memory reclamation of objects arbitraryunmanaged
objects; 3)LockFreeQueue
, a shared-memory implementation of Michael and Scott's Queue using theEpochManager
which demonstrates its usage and acts as its own stand-alone data structure contribution.LocalAtomics
The
LocalAtomics
module is a tested package that provides the ability to perform atomic operations on arbitraryunmanaged
objects. To circumvent the infamous ABA problem, we provide anABA
wrapper that contains both the object as well as an ABA counter; all atomic operations on anABA
counter is performed via Intel'sCMPXCHG16B
assembly instruction. This is used to implement theEpochManager
andDistributedEpochManager
, and so we provide as a user-facing abstraction.Example Usage
Design Decision: Why just
unmanaged
?The
unmanaged
type, unlikeshared
andowned
, are represented as either a narrow pointer (64-bit virtual address) or a wide pointer (128-bit struct consisting of 64-bit virtual address, a 32-bit locale id, and a 32-bit sub-locale id).This is necessary for pointer compression which represents objects as just 64-bit objects.
Epoch Manager
Epoch-Based Memory Reclamation, when decoupled with Read-Copy-Update, is a very powerful and the current state-of-the-art in research literature data structures, and is comparable to other styles of memory reclamation. We chose epoch-based emory reclamation as it offers the ability to both denote lifetimes via 'epochs', in which unlike hazard pointers or
shared
atomic reference counting, it is not restricted to controlling the lifetime of a single object at a time. In Epoch-Based Reclamation, tasks enter a 'read-side critical section' and enter the current epoch e; objects deleted during e defer deletion in a 'limbo list' assocaited with e, and all objects in said limbo list will not be reclaimed until the current epoch has been advanced to e + 2. The current epoch cannot be advanced from e to e + 1 until there are no tasks in e - 1, ensuring that no task has access to objects deleted in e - 1. The disadvantage is that there is potentially unbounded memory that cannot be reclaimed if a task remains in an epoch for an excess amount of time; this is easily possible in Chapel if the system is oversubscribed and if there is no communication (implicit task yields) or explicit task yielding.An epoch manager is intended to be restricted to a specific data structure, for example a non-blocking data structure. We offer both a shared-memory optimized variant,
EpochManager
, as well as a global-view distributed variantDistributedEpochManager
that makes use of privatization. Both variants require tasks to 'register' to acquire anowned token
, which will 'unregister' the task after theowned token
gets reclaimed. Theowned token
keeps track of the task's current epoch, which can be used topin
to enter a read-side critical section, or tounpin
to exit a read-side critical section. Once an object has been logically removed from a data structure, it can then be safely deferred for reclamation via invocation ofdelete_obj
. The 'logical' removal step is important, as it is not thread-safe for multiple threads to calldelete_obj
on the same object. Thedelete_obj
treats all objects asunmanaged object
and it is safe to mix-and-match different types of objects.Example Usage
LockFreeQueue
LockFreeQueue
is an implementation of Michael & Scott's Queue in Chapel that makes use of theEpochManager
internally. To show the performance benefits of theEpochManaged
, we present a small table that shows performance comparisons to the the "Epoch Managed MS-Queue", what is being presented here as a contribution; to the 'Two-Lock' variant of the Michael & Scott Queue that uses Test-And-Set loop and one which uses a Test-And-Test-And-Set loop, both being variants of spinlocks, and finally one recycling memory and using theABA
wrapper introduced inLocalAtomics
module.The module will automatically try to reclaim after every so often, but also exposes the ability to explicit trigger a 'garbage collection' and forward the epoch via invocation of
try_reclaim
.Example Usage
Appendix
ABA Problem
The 'ABA' problem occurs when a task T1 reads the value A from location L, another task T2 writes B to L, and another task T3 writes the value A to L; once T1 checks to see if L has changed, it will incorrectly assume that it has not. To make this more concrete, think of A and B both as a
node
in a linked list; T1 reads A, T2 allocates a newnode
B and writes it to L and deletes A, and T3 allocates a newnode
which just so happens to be the same piece of memory that A had before and writes it to L. Atomic operations such thecompareExchange
will succeed despite the fact that the nodes are not the same as it will perform the operation based on the virtual address.The text was updated successfully, but these errors were encountered: