make WeakKeyDict finalizers thread/async-safe #16204

Merged 4 commits into master on Aug 8, 2016
Conversation

vtjnash (Member) commented May 4, 2016

The finalizers for AbstractRemoteRef were corrupting the identity of those objects. By not doing that, I think this may provide a fix for some of the recent parallel bugs. Along the way, I implemented making Dict finalizer-safe (while I thought that might be related to the current bug).
old commits: 9b5fa5d, feaf59f, 8d1970e

This makes WeakKeyDict finalizer usage GC-safe, which should fix numerous testing issues from client_ref access race conditions.

```c
else if (prev && !on) {
    // enable -> disable
    jl_atomic_fetch_add(&jl_finalizers_disabled_counter, 1);
    // TODO: check if finalizers are running and wait for them to finish
```
Contributor:

Is this really necessary? Since finalizer runs can nest and can run on multiple threads, I feel like waiting for all finalizers to finish is asking for deadlock (e.g. if this is called by a finalizer on two different threads because both of them modify a dict).

Avoiding corruption under multithreading still requires explicit locks. It is only when the finalizer runs on the same thread that you can't really protect with a lock; that is the case that requires this API to disable the finalizer on the same thread.

Contributor:

In other words, I think we probably only need a thread-local disable_finalizers, and the caller of this function is expected to do the necessary thread synchronization.

Member Author:

I think we may need to think about ways of localizing finalizers to the appropriate thread. Technically, though, this lock condition might only need to control access to the finalizer list; it would not need to be held while running a finalizer.

If we stop running finalizers on the same thread, then I guess the more standard implementation would be to swap this for explicit internal Dict locks. But currently, I think we would need both (to avoid having the finalizer try to grab the non-recursive lock we are already holding and deadlock).

Contributor:

Whether we need a Dict lock depends on the thread-safety guarantees we provide for the stdlib. Currently, the user always has to synchronize access to a Dict from multiple threads, and there's nothing special about whether it is running from within a finalizer or not; the only feature Base has to provide is that a finalizer running on the same thread will not corrupt the data structure.

If, however, we begin to guarantee thread safety of Base data structures, it would indeed be better to use that for finalizer synchronization too.

Member Author:

True, but when we do that, I think we either need to make finalizer-disabling part of acquiring a lock, or run all finalizers on a separate thread. For now, I've been tackling those concerns independently. I can understand the argument, though, that multi.jl might be responsible for synchronizing the Dict access to client_refs; somehow WeakKeyDict would need to be compliant and use the same lock.

tkelman (Contributor) commented May 9, 2016

Is this branch/PR still necessary after you pushed part of it to master? If not, please close the PR and delete the branch.

@vtjnash vtjnash changed the title dictchannel maybe fix WIP: make Dict async-/finalizer-safe May 10, 2016
yuyichao referenced this pull request Jun 22, 2016
@yuyichao indicated the thread-safety issues and the skeleton of
a solution, explained the need to keep references to BigInt
before ccall, and suggested adding a field at the end of MPZ.
He also pointed out the GC-unsafety of the previous solution,
since finalizers can call arbitrary code. @ScottPJones
suggested disabling the GC to address this specific problem.
@vtjnash vtjnash changed the title WIP: make Dict async-/finalizer-safe make WeakKeyDict finalizers thread/async-safe Jul 26, 2016
vtjnash (Member Author) commented Jul 26, 2016

PR re-written to synchronize only WeakKeyDict, and to re-schedule finalizers rather than disabling them.

(updated PR based on #17590)


Test-and-test-and-set spin locks are quickest up to about 30ish contending threads. If you have more contention than that, perhaps a lock is the wrong way to synchronize.

See also SpinLock for a version that permits recursion.
Member:

RecursiveSpinLock

Member Author:

This is auto-generated, I just forgot to rerun the script.

Member:

I know it is but I cba to find the actual source file :)

```julia
abstract AbstractLock

# Test-and-test-and-set spin locks are quickest up to about 30ish
# contending threads. If you have more contention than that, perhaps
# a lock is the wrong way to synchronize.
"""
    SpinLock()
```
Contributor:

If this is how this is expected to be called, why isn't it typealias'ed the other way around?

Member Author:

Added by @kpamnany in f3402da

Contributor:

Sure, but you're adding the docstring now, and chose to use the typealias name in the docstring instead of the actual type name?

Contributor:

I don't understand this comment? I want to potentially have multiple implementations of SpinLock, selecting a particular one depending on platform. But I want user code to use just SpinLock.

Contributor:

Then the docstring should be for `SpinLock` as written, and actually attached to the `SpinLock` name instead of the `TatasLock` name.
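For background, the lock being discussed is a test-and-test-and-set (TTAS) lock, which spins on a plain read before attempting the atomic exchange. A rough, hypothetical sketch of that technique in current Julia syntax (illustrative only, not the actual Base implementation) is:

```julia
using Base.Threads: Atomic, atomic_xchg!

# Hypothetical TTAS lock sketch; `held` is 0 when free, 1 when held.
struct TatasSketch
    held::Atomic{Int}
    TatasSketch() = new(Atomic{Int}(0))
end

function ttas_lock!(l::TatasSketch)
    while true
        # "test": spin on a cheap read first, to avoid hammering the
        # cache line with atomic writes while the lock is held
        while l.held[] != 0
            ccall(:jl_cpu_pause, Cvoid, ())
        end
        # "test-and-set": only now attempt the atomic exchange
        atomic_xchg!(l.held, 1) == 0 && return nothing
    end
end

ttas_unlock!(l::TatasSketch) = (l.held[] = 0; nothing)
```

The double "test" is what keeps contention manageable up to the "30ish contending threads" mentioned in the comment; beyond that, the atomic traffic dominates regardless.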


```julia
function setindex!{K}(wkh::WeakKeyDict{K}, v, key)
    k = convert(K, key)
    finalizer(k, wkh.finalizer)
```
Contributor:

Collapsed but still applicable:

So this may or may not add the finalizer to the input, depending on whether the convert aliased?

Member Author:

Yes, that's the point of it being a WeakKeyDict.
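The aliasing behavior under discussion can be illustrated with a hypothetical example (types and values here are illustrative, not from the test suite):

```julia
# WeakKeyDict converts each key to the key type K before registering
# the finalizer, so where the finalizer lands depends on whether
# convert(K, key) aliases the input.
wkh = WeakKeyDict{Vector{Int},String}()

k = [1, 2, 3]
wkh[k] = "value"    # convert(Vector{Int}, k) === k, so the finalizer
                    # is attached to the caller's own array

wkh[1:3] = "other"  # convert(Vector{Int}, 1:3) allocates a fresh array;
                    # the finalizer is attached to that copy, which is
                    # collectible as soon as no one else references it
```

In the second case the entry can vanish at the next GC, which is exactly the weak-key semantics the author is pointing at.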

vtjnash added 4 commits August 5, 2016 02:28
Also use this `client_refs.lock` to protect the other data structures in the multi.jl logic from being interrupted by finalizers.

We may want to start indicating which mutable data structures are safe to call from finalizers, since generally that isn't possible.

To make a finalizer API GC-safe, that code should observe the standard
thread-safety restrictions (there's no guarantee of which thread it will run on).

In addition, if the data structure uses locks for synchronization,
use the `islocked` pattern (demonstrated herein) in the `finalizer`
to re-schedule the finalizer when the mutable data structure is not
available for mutation.
This ensures that the lock cannot be acquired recursively;
furthermore, this pattern will continue to work if finalizers
get moved to their own separate thread.
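The `islocked` re-scheduling pattern described above can be sketched roughly as follows. The field names `lock` and `ht` and the helper name are hypothetical, and this uses the 2016-era `finalizer(obj, f)` argument order seen elsewhere in this PR:

```julia
# Hypothetical sketch of the re-scheduling pattern: if the dict's lock
# is held when the finalizer fires, re-register the finalizer and bail
# out instead of blocking, so the cleanup runs at a later GC point when
# the structure is available for mutation.
function weak_key_cleanup(wkh, key)
    if islocked(wkh.lock)
        # Someone (possibly this very thread) holds the lock; acquiring
        # it here could deadlock on a non-recursive lock, so retry later.
        finalizer(key, k -> weak_key_cleanup(wkh, k))
        return nothing
    end
    lock(wkh.lock)
    try
        delete!(wkh.ht, key)
    finally
        unlock(wkh.lock)
    end
    return nothing
end
```

Because the finalizer never blocks on the lock, the same code keeps working whether finalizers run inline on the mutating thread or on a dedicated finalizer thread.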

close #14445
fix #16550
reverts workaround #14456 (shouldn't break #14295, due to new locks)
should fix #16091 (with #17619)
@vtjnash vtjnash merged commit 189a604 into master Aug 8, 2016
@vtjnash vtjnash deleted the jn/dictchannel branch August 8, 2016 18:52
tkelman pushed a commit that referenced this pull request Aug 11, 2016
(cherry picked from commit cd8be65)
ref #16204
5 participants