-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make WeakKeyDict finalizers thread/async-safe #16204
Conversation
else if (prev && !on) { | ||
// enable -> disable | ||
jl_atomic_fetch_add(&jl_finalizers_disabled_counter, 1); | ||
// TODO: check if finalizers are running and wait for them to finish |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really necessary? Since finalizer runs can nest and can run on multiple threads, I feel like waiting for all finalizers to finish is asking for dead lock. (e.g. if this is called by a finalizer on two different threads because both of them modifies a dict).
For avoiding corruption with multithreading it is still necessary to use explicit locks. It is only the case if the finalizer runs on the same thread that you can't really protect with a lock and requires this API to disable the finalizer on the same thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In another word, I think we probably only need a thread local disable_finalizers
and the caller of this function is expected to do the necessary thread synchronization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may need to think about ways of localizing finalizers to the appropriate thread. technically though, this lock condition might only need to control access to the finalizer list, but would not need to be held while running a finalizer.
if we stop running finalizers on the same thread, then I guess the more standard implementation would be to swap this for explicit internal Dict locks. but currently, I think we would need both (to avoid having the finalizer try to grab the non-recursive lock we are already holding and deadlock).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether we need a Dict lock or not depends on what's the thread safety guarantee we provide for stdlib. Currently, the user always has to synchronize access to a dict from multiple threads and there's nothing special about whether it is running from within a finalizer or not and the only feature Base has to provide is that finalizer running on the same thread will not corrupt the data structure.
If, however, we begin to guarantee thread safety of Base data structures it would indeed be better to use that for finalizer synchronization too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true. but I think when we do that I think we either need to make finalize-disable part of acquiring a lock, or run all finalizers on a separate thread. for now, I've been tackling those concerns independently. I can understand the argument though that multi.jl might be responsible for synchronizing the Dict access to client_refs. Somehow WeakKeyDict would need to be compliant and use the same lock.
is this branch/pr still necessary after you pushed part of it to master? if not, please close the pr and delete the branch |
@yuyichao indicated the thread-safety issues and the skeleton of a solution, and explained the need to keep references to BigInt before ccall, and suggested to add a field at the end of MPZ. He also pointed at the GC un-safety of the previous solution due to the fact that finalizers can call arbitrary code. @ScottPJones suggested to disable the GC to address this specific problem.
PR re-written to synchronize only WeakKeyDict, and re-schedule finalizers rather than disabling them (updated PR based on #17590) |
|
||
Test-and-test-and-set spin locks are quickest up to about 30ish contending threads. If you have more contention than that, perhaps a lock is the wrong way to synchronize. | ||
|
||
See also SpinLock for a version that permits recursion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RecursiveSpinLock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is auto-generated, I just forgot to rerun the script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it is but I cba to find the actual source file :)
abstract AbstractLock | ||
|
||
# Test-and-test-and-set spin locks are quickest up to about 30ish | ||
# contending threads. If you have more contention than that, perhaps | ||
# a lock is the wrong way to synchronize. | ||
""" | ||
SpinLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is how this is expected to be called, why isn't it typealias'ed the other way around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure but you're adding the docstring now, and chose to use the typealias name in the docstring instead of the actual type name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment? I want to potentially have multiple implementations of SpinLock
, selecting a particular one depending on platform. But I want user code to use just SpinLock
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then the docstring should be for spinlock as written, and actually attached to the spinlock name instead of the tataslock name
|
||
function setindex!{K}(wkh::WeakKeyDict{K}, v, key) | ||
k = convert(K, key) | ||
finalizer(k, wkh.finalizer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
collapsed but still applicable:
so this may or may not add the finalizer to the input depending on whether the convert aliased?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's the point of it being a WeakKeyDict
also use this `client_refs.lock` to protect other data-structures from being interrupted by finalizers, in the multi.jl logic we may want to start indicating which mutable data-structures are safe to call from finalizers, since generally that isn't possible to make a finalizer API gc-safe, that code should observe the standard thread-safe restrictions (there's no guarantee of which thread it'll run on), plus, if the data-structures uses locks for synchronization, use the `islocked` pattern (demonstrated herein) in the `finalizer` to re-schedule the finalizer when the mutable data-structure is not available for mutation. this ensures that the lock cannot be acquired recursively, and furthermore, this pattern will continue to work if finalizers get moved to their own separate thread. close #14445 fix #16550 reverts workaround #14456 (shouldn't break #14295, due to new locks) should fix #16091 (with #17619)
also use this `client_refs.lock` to protect other data-structures from being interrupted by finalizers, in the multi.jl logic we may want to start indicating which mutable data-structures are safe to call from finalizers, since generally that isn't possible to make a finalizer API gc-safe, that code should observe the standard thread-safe restrictions (there's no guarantee of which thread it'll run on), plus, if the data-structures uses locks for synchronization, use the `islocked` pattern (demonstrated herein) in the `finalizer` to re-schedule the finalizer when the mutable data-structure is not available for mutation. this ensures that the lock cannot be acquired recursively, and furthermore, this pattern will continue to work if finalizers get moved to their own separate thread. close #14445 fix #16550 reverts workaround #14456 (shouldn't break #14295, due to new locks) should fix #16091 (with #17619) (cherry picked from commit cd8be65) ref #16204
The finalizers forAbstractRemoteRef
were corrupting the identity of those objects. By not doing that, I think this may provide a fix for some of the recent parallel bugs. Along the way, I implemented making Dict finalizer-safe (while I thought that might be related to the currrent bug).old commits: 9b5fa5d, feaf59f, 8d1970e
this makes WeakKeyDict finalizer usage gc-safe, which should fix numerous testing issues from
client_ref
access race-conditions