make WeakKeyDict finalizers thread/async-safe #16204

Merged 4 commits into master on Aug 8, 2016
Conversation

vtjnash (Member) commented May 4, 2016

The finalizers for AbstractRemoteRef were corrupting the identity of those objects. By not doing that, I think this may provide a fix for some of the recent parallel bugs. Along the way, I implemented making Dict finalizer-safe (while I thought that might be related to the current bug).
old commits: 9b5fa5d, feaf59f, 8d1970e

This makes WeakKeyDict finalizer usage GC-safe, which should fix numerous testing issues from client_ref access race conditions.

```c
else if (prev && !on) {
    // enable -> disable
    jl_atomic_fetch_add(&jl_finalizers_disabled_counter, 1);
    // TODO: check if finalizers are running and wait for them to finish
```
Contributor:

Is this really necessary? Since finalizer runs can nest and can run on multiple threads, I feel like waiting for all finalizers to finish is asking for deadlock (e.g. if this is called by a finalizer on two different threads because both of them modify a dict).

Avoiding corruption under multithreading still requires explicit locks. It is only when the finalizer runs on the same thread that you can't really protect with a lock; that is the case that requires this API to disable the finalizer on the same thread.

Contributor:

In other words, I think we probably only need a thread-local disable_finalizers, and the caller of this function is expected to do the necessary thread synchronization.

Member Author:

I think we may need to think about ways of localizing finalizers to the appropriate thread. Technically, though, this lock condition might only need to control access to the finalizer list; it would not need to be held while running a finalizer.

If we stop running finalizers on the same thread, then I guess the more standard implementation would be to swap this for explicit internal Dict locks. But currently, I think we would need both (to avoid having the finalizer try to grab the non-recursive lock we are already holding and deadlock).

Contributor:

Whether we need a Dict lock depends on the thread-safety guarantees we provide for the stdlib. Currently, the user always has to synchronize access to a Dict from multiple threads, and there's nothing special about whether it is running from within a finalizer or not; the only feature Base has to provide is that a finalizer running on the same thread will not corrupt the data structure.

If, however, we begin to guarantee thread safety of Base data structures, it would indeed be better to use that for finalizer synchronization too.

Member Author:

True, but when we do that, I think we either need to make finalizer-disabling part of acquiring a lock, or run all finalizers on a separate thread. For now, I've been tackling those concerns independently. I can understand the argument, though, that multi.jl might be responsible for synchronizing the Dict access to client_refs; somehow WeakKeyDict would need to be compliant and use the same lock.

tkelman (Contributor) commented May 9, 2016

Is this branch/PR still necessary after you pushed part of it to master? If not, please close the PR and delete the branch.

@vtjnash vtjnash changed the title dictchannel maybe fix WIP: make Dict async-/finalizer-safe May 10, 2016
yuyichao referenced this pull request Jun 22, 2016
@yuyichao indicated the thread-safety issues and the skeleton of
a solution, explained the need to keep references to BigInt
before ccall, and suggested adding a field at the end of MPZ.
He also pointed out the GC-unsafety of the previous solution,
since finalizers can call arbitrary code. @ScottPJones
suggested disabling the GC to address this specific problem.
@vtjnash vtjnash changed the title WIP: make Dict async-/finalizer-safe make WeakKeyDict finalizers thread/async-safe Jul 26, 2016
vtjnash (Member Author) commented Jul 26, 2016

PR re-written to synchronize only WeakKeyDict, and to re-schedule finalizers rather than disabling them.

(updated PR based on #17590)


Test-and-test-and-set spin locks are quickest up to about 30ish contending threads. If you have more contention than that, perhaps a lock is the wrong way to synchronize.

See also SpinLock for a version that permits recursion.
Member:

RecursiveSpinLock

Member Author:

This is auto-generated, I just forgot to rerun the script.

Member:

I know it is but I cba to find the actual source file :)

```julia
abstract AbstractLock

# Test-and-test-and-set spin locks are quickest up to about 30ish
# contending threads. If you have more contention than that, perhaps
# a lock is the wrong way to synchronize.
"""
    SpinLock()
```
Contributor:

If this is how this is expected to be called, why isn't it typealias'ed the other way around?

Member Author:

Added by @kpamnany in f3402da

Contributor:

Sure, but you're adding the docstring now, and chose to use the typealias name in the docstring instead of the actual type name?

Contributor:

I don't understand this comment? I want to potentially have multiple implementations of SpinLock, selecting a particular one depending on platform. But I want user code to use just SpinLock.

Contributor:

Then the docstring should be for `SpinLock` as written, and actually attached to the `SpinLock` name instead of the `TatasLock` name.
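For background, the lock being discussed is a test-and-test-and-set (TTAS) lock, which spins on a plain read before attempting the atomic exchange. A rough, hypothetical sketch of that technique in current Julia syntax (illustrative only, not the actual Base implementation) is:

```julia
using Base.Threads: Atomic, atomic_xchg!

# Hypothetical TTAS lock sketch; `held` is 0 when free, 1 when held.
struct TatasSketch
    held::Atomic{Int}
    TatasSketch() = new(Atomic{Int}(0))
end

function ttas_lock!(l::TatasSketch)
    while true
        # "test": spin on a cheap read first, to avoid hammering the
        # cache line with atomic writes while the lock is held
        while l.held[] != 0
            ccall(:jl_cpu_pause, Cvoid, ())
        end
        # "test-and-set": only now attempt the atomic exchange
        atomic_xchg!(l.held, 1) == 0 && return nothing
    end
end

ttas_unlock!(l::TatasSketch) = (l.held[] = 0; nothing)
```

The double "test" is what keeps contention manageable up to the "30ish contending threads" mentioned in the comment; beyond that, the atomic traffic dominates regardless.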


```julia
function setindex!{K}(wkh::WeakKeyDict{K}, v, key)
    k = convert(K, key)
    finalizer(k, wkh.finalizer)
```
Contributor:

Collapsed but still applicable:

So this may or may not add the finalizer to the input, depending on whether the convert aliased?

Member Author:

Yes, that's the point of it being a WeakKeyDict.
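The aliasing behavior under discussion can be illustrated with a hypothetical example (types and values here are illustrative, not from the test suite):

```julia
# WeakKeyDict converts each key to the key type K before registering
# the finalizer, so where the finalizer lands depends on whether
# convert(K, key) aliases the input.
wkh = WeakKeyDict{Vector{Int},String}()

k = [1, 2, 3]
wkh[k] = "value"    # convert(Vector{Int}, k) === k, so the finalizer
                    # is attached to the caller's own array

wkh[1:3] = "other"  # convert(Vector{Int}, 1:3) allocates a fresh array;
                    # the finalizer is attached to that copy, which is
                    # collectible as soon as no one else references it
```

In the second case the entry can vanish at the next GC, which is exactly the weak-key semantics the author is pointing at.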

vtjnash added 4 commits August 5, 2016 02:28
Also use this `client_refs.lock` to protect the other data structures in the multi.jl logic from being interrupted by finalizers.

We may want to start indicating which mutable data structures are safe to call from finalizers, since generally that isn't possible.

To make a finalizer API GC-safe, that code should observe the standard
thread-safety restrictions (there's no guarantee of which thread it will run on).

In addition, if the data structure uses locks for synchronization,
use the `islocked` pattern (demonstrated herein) in the `finalizer`
to re-schedule the finalizer when the mutable data structure is not
available for mutation.
This ensures that the lock cannot be acquired recursively;
furthermore, this pattern will continue to work if finalizers
get moved to their own separate thread.
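The `islocked` re-scheduling pattern described above can be sketched roughly as follows. The field names `lock` and `ht` and the helper name are hypothetical, and this uses the 2016-era `finalizer(obj, f)` argument order seen elsewhere in this PR:

```julia
# Hypothetical sketch of the re-scheduling pattern: if the dict's lock
# is held when the finalizer fires, re-register the finalizer and bail
# out instead of blocking, so the cleanup runs at a later GC point when
# the structure is available for mutation.
function weak_key_cleanup(wkh, key)
    if islocked(wkh.lock)
        # Someone (possibly this very thread) holds the lock; acquiring
        # it here could deadlock on a non-recursive lock, so retry later.
        finalizer(key, k -> weak_key_cleanup(wkh, k))
        return nothing
    end
    lock(wkh.lock)
    try
        delete!(wkh.ht, key)
    finally
        unlock(wkh.lock)
    end
    return nothing
end
```

Because the finalizer never blocks on the lock, the same code keeps working whether finalizers run inline on the mutating thread or on a dedicated finalizer thread.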

close #14445
fix #16550
reverts workaround #14456 (shouldn't break #14295, due to new locks)
should fix #16091 (with #17619)
@vtjnash vtjnash merged commit 189a604 into master Aug 8, 2016
@vtjnash vtjnash deleted the jn/dictchannel branch August 8, 2016 18:52
tkelman pushed a commit that referenced this pull request Aug 11, 2016
(cherry picked from commit cd8be65)
ref #16204
5 participants