Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

threading: fixup scheduler statepoints for GC #32238

Merged
merged 5 commits into from
Jun 10, 2019
Merged

threading: fixup scheduler statepoints for GC #32238

merged 5 commits into from
Jun 10, 2019

Conversation

JeffBezanson
Copy link
Member

We basically did not have a GC safepoint in some paths that could lead to a thread blocking. This should fix the issue noticed in #32217 (comment). Commits can be squashed.

@JeffBezanson JeffBezanson added multithreading Base.Threads and related functionality GC Garbage collector bugfix This change fixes an existing bug labels Jun 4, 2019
Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuyichao is this right/sufficient?

@yuyichao
Copy link
Contributor

yuyichao commented Jun 4, 2019

Which one is the fix? Lockwise there seems to be fewer safepoints in the loop?

@vtjnash
Copy link
Member

vtjnash commented Jun 4, 2019

There's now a statepoint (well, two) in every get_task_next call, instead of just one at the top of the loop. The missing one per the issue comment is the one just before the blocking uv_run call. The other part of the fix is ensuring that we don't run the Julia code callback from inside the region marked gc-safe.

src/partr.c Outdated
@@ -481,16 +484,18 @@ JL_DLLEXPORT jl_task_t *jl_task_get_next(jl_value_t *getsticky)
}
}
// the other threads will just wait for on signal to resume
int8_t gc_state = jl_gc_safe_enter(ptls);
uv_mutex_lock(&ptls->sleep_lock);
while (jl_atomic_load(&sleep_check_state) == sleeping) {
task = get_next_task(getsticky);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call also can't happen while holding the sleep lock (because it contains gc allocations and/or safepoints) and/or we cannot attempt to wake up all the other threads to get them to help with the gc effort.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, in this case assuming get_next_task can call julia code/trigger GC then the uv_mutex_lock(&ptls->sleep_lock); must be done in GC safe region. Alternatively, all holders of the lock must not trigger any GC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to me that we ought to be able to check for available tasks without triggering GC. Of course, that's hard to guarantee from julia code.

src/partr.c Outdated
@@ -481,16 +484,18 @@ JL_DLLEXPORT jl_task_t *jl_task_get_next(jl_value_t *getsticky)
}
}
// the other threads will just wait for on signal to resume
int8_t gc_state = jl_gc_safe_enter(ptls);
uv_mutex_lock(&ptls->sleep_lock);
while (jl_atomic_load(&sleep_check_state) == sleeping) {
task = get_next_task(getsticky);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, in this case assuming get_next_task can call julia code/trigger GC then the uv_mutex_lock(&ptls->sleep_lock); must be done in GC safe region. Alternatively, all holders of the lock must not trigger any GC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug GC Garbage collector multithreading Base.Threads and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants