gh-96387: take_gil() resets drop request before exit #96869

Merged: 1 commit, Sep 19, 2022
@@ -0,0 +1,5 @@
At Python exit, sometimes a thread holding the GIL can wait forever for a
thread (usually a daemon thread) which requested to drop the GIL, whereas
the thread already exited. To fix the race condition, the thread which
requested the GIL drop now resets its request before exiting. Issue
discovered and analyzed by Mingliang ZHAO. Patch by Victor Stinner.
11 changes: 11 additions & 0 deletions Python/ceval_gil.c
@@ -369,6 +369,7 @@ take_gil(PyThreadState *tstate)
        goto _ready;
    }

    int drop_requested = 0;
    while (_Py_atomic_load_relaxed(&gil->locked)) {
        unsigned long saved_switchnum = gil->switch_number;

@@ -384,11 +385,21 @@ take_gil(PyThreadState *tstate)
        {
            if (tstate_must_exit(tstate)) {
                MUTEX_UNLOCK(gil->mutex);
                // gh-96387: If the loop requested a drop request in a previous
                // iteration, reset the request. Otherwise, drop_gil() can
                // block forever waiting for the thread which exited. Drop
                // requests made by other threads are also reset: these threads
                // may have to request again a drop request (iterate one more
                // time).
Review comment from the PR author:

This bugfix can slow down Python shutdown by a few switch intervals (5 ms each by default). I expect the worst case to be when every daemon thread needs one more iteration, so interval x number of threads: about 500 ms for 100 threads. But with luck, it's way shorter because the thread holding the GIL can get the drop request (and so let other threads exit) before the request is reset ;-)

My expectation is that... nobody will notice, since daemon threads are rare, and this race condition is unlikely ;-)

If it becomes a problem later, maybe the "GIL drop request" should store the tstate of the thread requesting it, and the request should only be reset if it matches the current tstate. I'm not sure it would speed up the shutdown. Again, I expect that it will not really affect performance in the end ;-)
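
The "store the tstate of the requester" idea can be sketched in isolation. This is only a toy illustration, not CPython code: `toy_tstate`, `toy_drop_requester`, and the three `toy_*` functions are hypothetical names invented here, and the sketch assumes a single request slot guarded by a C11 compare-and-exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState; not the CPython type. */
typedef struct toy_tstate { int id; } toy_tstate;

/* Hypothetical request slot: records who asked (NULL means no request). */
static _Atomic(toy_tstate *) toy_drop_requester = NULL;

/* Record that ts asked the GIL holder to drop the GIL. */
static void toy_set_drop_request(toy_tstate *ts)
{
    assert(ts != NULL);
    atomic_store(&toy_drop_requester, ts);
}

/* Clear the request only if ts is the recorded requester.
   Returns 1 if the request was cleared, 0 if it belongs to another thread
   (or no request is pending). */
static int toy_reset_drop_request_if_owner(toy_tstate *ts)
{
    toy_tstate *expected = ts;
    return atomic_compare_exchange_strong(&toy_drop_requester, &expected,
                                          (toy_tstate *)NULL);
}

/* Non-zero while some thread's request is pending. */
static int toy_drop_requested(void)
{
    return atomic_load(&toy_drop_requester) != NULL;
}
```

With this shape, an exiting thread would call `toy_reset_drop_request_if_owner()` and leave a request made by another thread untouched. One caveat of the single-slot assumption: a later requester overwrites an earlier one, so whether this helps shutdown time in practice is not something the sketch can show.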

                if (drop_requested) {
                    RESET_GIL_DROP_REQUEST(interp);
                }
                PyThread_exit_thread();
            }
            assert(is_tstate_valid(tstate));

            SET_GIL_DROP_REQUEST(interp);
            drop_requested = 1;
        }
    }
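
The race and the fix can be modelled with a deliberately simplified, single-threaded toy. Everything here is hypothetical (`toy_gil`, `live_waiters`, and the `toy_*` functions are not CPython names); the only assumption taken from the patch comment is that the GIL holder can block forever when a drop request is pending and the thread that made it has exited.

```c
#include <assert.h>

/* Hypothetical toy state; not the real GIL structure. */
typedef struct toy_gil {
    int drop_request;  /* stand-in for the pending "drop the GIL" request */
    int live_waiters;  /* threads that could still take the GIL */
} toy_gil;

/* A waiting thread timed out and asked the holder to drop the GIL. */
static void toy_waiter_request_drop(toy_gil *g)
{
    g->drop_request = 1;
}

/* A waiting thread exits. With apply_fix set, it first resets the request
   it made in an earlier iteration, mirroring the drop_requested flag. */
static void toy_waiter_exit(toy_gil *g, int drop_requested, int apply_fix)
{
    assert(g->live_waiters > 0);
    if (apply_fix && drop_requested) {
        g->drop_request = 0;
    }
    g->live_waiters--;
}

/* In this toy model the holder hangs when it honours a request
   that no live thread is left to act on. */
static int toy_holder_would_hang(const toy_gil *g)
{
    return g->drop_request && g->live_waiters == 0;
}
```

Starting from `{0, 1}`, calling `toy_waiter_request_drop()` and then `toy_waiter_exit(&g, 1, 0)` leaves a stale request and `toy_holder_would_hang()` returns 1. The same sequence with `apply_fix` set to 1 returns 0.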
