[GSoC 2019] Distributed Non-Blocking Algorithms and Data Structures #13708
Conversation
I believe that we need to move only what we use from the Utilities package (originally from https://github.com/pnnl/chgl/blob/master/src/Utilities.chpl), and the helper data structures such as `Vector`, `LockFreeQueue`, `LimboList`, etc., all into `EpochManager` and its submodules (undocumented); we also need to rename them. Also, the `LockFreeQueue` here is the ABA-recycled queue, so you need to move `ReclaimedLockFreeQueue` to where it is right now.
Also, this PR should likely be given the same title as the original issue, as it is about providing an infrastructure rather than just an epoch-based memory reclamation system.
modules/packages/LocalAtomics.chpl (outdated)

```chapel
head.write(new unmanaged Node(int)); // Need 'write(objType)'
*/

extern {
```
The runtime could provide it, certainly, perhaps in files that were siblings of `include/atomics/*/chpl-atomics.h`. I don't think that's strictly necessary for the purposes of this PR, however. That said, if you were going to support ARM you'd obviously need to update what's here so as to use a different `_asm` block for that architecture.
Okay, I just disliked the fact that you need to compile with LLVM to use this module when it's not strictly necessary; LLVM takes a long time to build, and I know that there are some platforms that have difficulty building it (I've run into issues when trying to build it myself). I'd like to commit to a quick runtime change, if possible.
I think it'd make sense to add some minimal 128-bit CAS support to the runtime.
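To make the semantics concrete, here is a toy model of what a double-width (128-bit) compare-and-swap primitive would provide, written as a hedged Python sketch; the class and method names are illustrative, not a proposed runtime API. A lock stands in for the atomicity that hardware (e.g. x86's CMPXCHG16B) would supply in a single instruction:

```python
import threading

class DoubleWidthAtomic:
    """Toy model of a 128-bit compare-and-swap: the value is a
    (pointer, counter) pair, and the exchange succeeds only if BOTH
    halves match, which is what defeats the ABA problem.

    The lock models atomicity only; a real runtime primitive would be
    a single hardware instruction on a 16-byte-aligned operand."""

    def __init__(self, value=(0, 0)):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_exchange(self, expected, desired):
        with self._lock:
            if self._value == expected:
                self._value = desired
                return True
            return False

a = DoubleWidthAtomic((0xdead, 0))
assert a.compare_exchange((0xdead, 0), (0xbeef, 1))
# A stale (pointer, counter) pair fails even if the pointer matches:
assert not a.compare_exchange((0xdead, 0), (0xbad, 2))
assert a.read() == (0xbeef, 1)
```

The counter half is what the ABA-protected structures below rely on: a pointer that is freed and reallocated to the same address still fails the exchange because the counter has moved on.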
Is `CHPL_LLVM=llvm` set by default now? I've noticed that I do not have the `CHPL_LLVM` environment variable set; if so, then it doesn't matter if the `LocalAtomics.chpl` module has an `extern` block.

Edit: I found that the reason `CHPL_LLVM=llvm` was set by default was because I had built LLVM once before.
chapel/util/chplenv/chpl_llvm.py
Lines 28 to 43 in 7011bde
```python
@memoize
def get():
    llvm_val = overrides.get('CHPL_LLVM')
    if not llvm_val:
        # compute a default based on if the included llvm is built
        chpl_third_party = get_chpl_third_party()
        llvm_target_dir = get_uniq_cfg_path_for('llvm')
        llvm_subdir = os.path.join(chpl_third_party, 'llvm', 'install',
                                   llvm_target_dir)
        llvm_header = os.path.join(llvm_subdir, 'include', 'llvm',
                                   'PassSupport.h')
        if os.path.exists(llvm_header):
            llvm_val = 'llvm'
        else:
            llvm_val = 'none'
    return llvm_val
```
modules/packages/LocalAtomics.chpl (outdated)

```chapel
}

inline proc localityCheck(objs...) {
```
I forget why I perform this check again, since I used pointer compression.
test/library/packages/EpochManager/sharedMemory/ReclaimedLockFreeQueue.chpl (outdated, resolved)
I'd recommend that we add a …
I would recommend that …
I think this PR is ready to be merged. We will make additions to the runtime at a later date.
Once this feedback is addressed I will need to look at the function bodies in LocalAtomics and EpochManager in more detail. So far I have focused on the API issues.
I just performed the major overhaul of …
Finished documentation for …
Note: @dgarvit We need to rebase this branch on top of upstream master so that we can ensure that our PR is compatible with the numerous potentially breaking changes that are rolling out soon. We should do so this weekend if you are available.
Force-pushed from c97ed77 to 74903e7. Commit messages included:
- …to one file; update tests
- …it tests, renamed a few
- …tion relatively gracefully; also added documentation and a new 'drain' iterator for stack
It would be helpful if #13873 were resolved, which would lighten the workload.
```chapel
tok.pin();
do {
  var oldTop = _top.read();
  n.next = oldTop;
```
Why does the deque have a `chpl_task_yield` here but not the stack? Generally it's good to only yield every n iterations.
To handle cases of oversubscription, and cases where there is contention. Ideally, we'd have some backoff if we fail, because failure means that there is a lot of contention on the `head` or `tail`. Coupled with the fact that Chapel is a cooperative tasking language, yielding is a way to allow other tasks to have a go. Although maybe it should try a few times before it yields.
Right, but there are several other loops like this (in the LockFreeStack, in the EpochManager). I would expect all of them to include chpl_task_yield.
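The "try a few times before yielding" idea can be sketched as a bounded-retry loop. This is an illustrative Python model, not the Chapel code: `try_push` stands in for a CAS-based push attempt, `yield_task` stands in for `chpl_task_yield`, and the `spin_limit` parameter is an assumed tuning knob.

```python
def push_with_backoff(try_push, yield_task, spin_limit=16):
    """Retry a CAS-based operation, yielding only once every
    `spin_limit` consecutive failures.

    Spinning a little before yielding avoids paying the scheduler
    cost on light contention, while still cooperating with other
    tasks under oversubscription or heavy contention."""
    attempts = 0
    while not try_push():
        attempts += 1
        if attempts % spin_limit == 0:
            # Let other tasks run instead of hammering the
            # contended head/tail pointer.
            yield_task()

# Simulated driver: fail 20 times, then succeed.
state = {"fails": 20, "yields": 0}

def try_push():
    if state["fails"] > 0:
        state["fails"] -= 1
        return False
    return True

push_with_backoff(try_push, lambda: state.__setitem__("yields", state["yields"] + 1))
assert state["yields"] == 1  # 20 failures with spin_limit=16 -> one yield
```

Applying the same cadence uniformly across the stack, queue, and epoch-manager loops would address the inconsistency noted above.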
```chapel
var retval : objType;
return (false, retval);
}
var newTop = oldTop.next;
```
Should this be calling `chpl_task_yield` periodically?
modules/packages/AtomicObjects.chpl (outdated)

```chapel
this.complete();
if hasABASupport {
  var ptr : c_void_ptr;
  posix_memalign(c_ptrTo(ptr), 16, c_sizeof(ABA(objType?)));
```
If you are just allocating 16 bytes, why not store it in the record itself? You should be able to do that with a `c_array` storing 2 uints, e.g. `c_array(uint, 2)`.
Otherwise, it seems that the atomic is stored in the record if !hasABASupport but it is stored off of a pointer if hasABASupport. I can't think of a reason that this would be what is desired (wouldn't you want hasABASupport to just affect the size of the atomic, not whether it is separately allocated)?
If there is a performance benefit to the aligned allocation, shouldn't we also allocate something if !hasABASupport?
If it's a multilocale compilation and we have hasGlobalSupport set, I'd expect the pointer to consist of 128-bits even when !hasABASupport, rather than relying on the pointer compression. It seems to me that the pointer compression is only necessary for hasABASupport && hasGlobalSupport.
1. The `CMPXCHG16B` instruction requires that the source operand is 16-byte aligned, or else it will result in a general protection fault (GPF). It is necessary that this is a pointer.
2. The reasoning behind using pointer compression is RDMA atomics. If `hasGlobalSupport && !hasABASupport`, then you have RDMA atomics for everything. If `hasGlobalSupport && hasABASupport`, this must be done as remote-execution atomics. If `!hasGlobalSupport && hasABASupport`, then the compression isn't used and we only use the lower 64 bits. Enabling RDMA atomics is a huge performance optimization here.
I see now that this should be documented somewhere, though.
modules/packages/EpochManager.chpl (outdated)

```chapel
forall i in 1..EBR_EPOCHS do
  limbo_list[i] = new unmanaged LimboList();
forall i in LocaleSpace do
  objsToDelete[i] = new unmanaged Vector(unmanaged object);
```
Can this duplicate code with the other init be factored out into a helper method?
Yeah, I suppose so.
modules/packages/EpochManager.chpl
Outdated
|
||
:returns: A handle to the manager | ||
*/ | ||
proc register() : owned DistTokenWrapper { // owned DistTokenWrapper { // Should be called only once |
Commented-out code should be removed.
modules/packages/EpochManager.chpl (outdated)

```chapel
for tok in allocated_list {
  var local_epoch = tok.local_epoch.read();
  if (local_epoch > 0) then
    minEpoch = min(minEpoch, local_epoch);
```
Is this right? Since the epoch numbers are cyclic?
You're right, this is wrong since we do not use `fetchAdd`, like I originally thought we did. We should be doing `fetchAdd` and then modulus division on that.
I'm not sure what you're thinking needs to be fetchAdd'd ?
I changed my mind about this, but it was referring to when the `global_epoch` gets advanced.
modules/packages/EpochManager.chpl
Outdated
} | ||
const current_global_epoch = global_epoch.read(); | ||
|
||
if minEpoch == current_global_epoch || minEpoch == max(uint) { |
If the goal is to check that each task is on current_global_epoch, I'm not sure checking the minimum epoch does it. Because the global epoch could be 1, and we might have some tasks on epoch 3 and some on epoch 1. Here 3 is "before" 1.
One way to address this would be to get both the minimum and maximum of the task epochs. Another way would be to simply compute the epoch shared by all the tasks if it exists, and some sentinel value (e.g. max(uint) or 0) if not.
You're right that this will not work. The new plan would be to `fetchAdd` on the epoch, in which case the monotonicity would ensure `getMinimumEpoch` is valid, and use modulus division to determine which limbo list to use.
Actually, instead we'll just check if given the current epoch e, whether the lowest task is on e - 1 or not; since there are only 3 possible values, and it is impossible for another task to be on e - 2.
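That check can be sketched as follows. This is an illustrative Python model under the assumptions stated in the thread (a cycle of 3 epochs numbered 1..3, with 0 meaning "unpinned"); the function name is hypothetical, not the module's API:

```python
EBR_EPOCHS = 3

def can_advance(global_epoch, task_epochs):
    """Return whether the global epoch may advance.

    With a 3-epoch cycle, a pinned task can only be on the current
    epoch e or the previous epoch e - 1 (cyclically); no task can be
    on e - 2.  So the global epoch may advance exactly when no pinned
    task is still on e - 1.  A task epoch of 0 means unpinned."""
    prev = (global_epoch - 2) % EBR_EPOCHS + 1   # cyclic e - 1, in 1..3
    return all(e == 0 or e != prev for e in task_epochs)

assert can_advance(2, [2, 2, 0])       # all pinned tasks are on e
assert not can_advance(2, [2, 1, 0])   # a task lags on e - 1
assert can_advance(1, [1, 0, 0])       # cyclic wrap: prev of 1 is 3
```

A single pass over the task list then replaces the min/max comparison, and cyclic wraparound is no longer a correctness hazard.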
modules/packages/EpochManager.chpl
Outdated
------------------------ | ||
To avoid reclamation while a task is accessing a resource, I.E. to enter | ||
critical section, a task must `pin`. Correspondingly to exit critical section, | ||
the task must `unpin`. |
Pin/unpin doesn't do anything to protect the user data structure, right? It's just a critical section for the memory reclamation.
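Right: pin/unpin guards only the reclamation machinery. A minimal Python model of that distinction (class and method names are illustrative, not the EpochManager API) is:

```python
class ToyEpochManager:
    """Toy model of pin/unpin semantics: pin() only delays *memory
    reclamation* of retired objects; it does not lock the user's data
    structure, which still needs its own lock-free protocol."""

    def __init__(self):
        self.pinned = 0
        self.limbo = []      # retired but not yet reclaimed
        self.reclaimed = []

    def pin(self):
        self.pinned += 1     # enter reclamation-safe critical section

    def unpin(self):
        self.pinned -= 1     # exit critical section

    def retire(self, obj):
        self.limbo.append(obj)

    def try_reclaim(self):
        # Reclaim only when no task is inside a critical section, so
        # no task can still hold a reference to a retired object.
        if self.pinned == 0:
            self.reclaimed.extend(self.limbo)
            self.limbo.clear()

mgr = ToyEpochManager()
mgr.pin()
mgr.retire("node")
mgr.try_reclaim()
assert mgr.reclaimed == []          # pinned: reclamation deferred
mgr.unpin()
mgr.try_reclaim()
assert mgr.reclaimed == ["node"]    # unpinned: now safe to reclaim
```

So the documentation should say the critical section protects against use-after-free of retired objects, not against concurrent mutation of the data structure itself.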
Add skipifs for tests using AtomicObjects

Follow-on to PR #13708. AtomicObjects.chpl uses extern blocks so can only work when CHPL_LLVM!=none. Additionally it uses inline assembly and so is only expected to work with gcc/clang on x86_64. Trivial test change only; not reviewed.
This PR adds several related package modules:

The EpochManager is the main part. AtomicObjects is used in implementing it but might be generally useful. LockFreeQueue and LockFreeStack are user-facing data structures using the EpochManager.

Please note that these package modules only function on x86 with compilers built to include `extern` blocks (i.e. CHPL_LLVM!=none).

Please see issue #13690 for design discussion.