Skip to content

gh-96387: take_gil() resets drop request before exit #96869

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 19, 2022
Merged

gh-96387: take_gil() resets drop request before exit #96869

merged 1 commit into from
Sep 19, 2022

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Sep 16, 2022

At Python exit, sometimes a thread holding the GIL can wait forever for a thread (usually a daemon thread) which requested to drop the GIL, whereas the thread already exited. To fix the race condition, the thread which requested the GIL drop now resets its request before exiting.

take_gil() now calls RESET_GIL_DROP_REQUEST() before PyThread_exit_thread() if it called SET_GIL_DROP_REQUEST to fix a race condition with drop_gil().

Issue discovered and analyzed by Mingliang ZHAO.

At Python exit, sometimes a thread holding the GIL can wait forever
for a thread (usually a daemon thread) which requested to drop the
GIL, whereas the thread already exited. To fix the race condition,
the thread which requested the GIL drop now resets its request before
exiting.

take_gil() now calls RESET_GIL_DROP_REQUEST() before
PyThread_exit_thread() if it called SET_GIL_DROP_REQUEST to fix a
race condition with drop_gil().

Issue discovered and analyzed by Mingliang ZHAO.
@vstinner
Copy link
Member Author

@gpshead: Would you mind to review this change?

// block forever waiting for the thread which exited. Drop
// requests made by other threads are also reset: these threads
// may have to request again a drop request (iterate one more
// time).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bugfix can slowdown the Python shutdown by a few "switch interval" seconds (5 ms by default). I expect that the worst case is when all daemon threads need one more iterations, so interval x number of threads: like 500 ms for 100 threads. But with luck, it's way shorter because the thread holding the GIL can get the switch request (and so let other threads exit) before the request is reset ;-)

My expectation is that... nobody will notice, since daemon threads are rare, and this race condition is unlikely ;-)

If tomorrow it becomes a problem, maybe the "GIL drop request" should store the tstate of the thread requesting it, and the request should only be reset if it matchs the current tstate. Not sure if it will speed up the shutdown. Again, I expect that it will not really affect performance at the end ;-)

@vstinner
Copy link
Member Author

@pitrou: If you're around, it would be nice if you could have a look, since you designed the new Pyhon GIL ;-)

@python python deleted a comment from Kizzer415 Sep 19, 2022
@python python deleted a comment from Kizzer415 Sep 19, 2022
@vstinner vstinner merged commit 04f4977 into python:main Sep 19, 2022
@miss-islington
Copy link
Contributor

Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry @vstinner, I had trouble checking out the 3.11 backport branch.
Please backport using cherry_picker on command line.
cherry_picker 04f4977f508583954ad7b9cb09076ee1e57461f8 3.11

@miss-islington
Copy link
Contributor

Sorry, @vstinner, I could not cleanly backport this to 3.10 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 04f4977f508583954ad7b9cb09076ee1e57461f8 3.10

@bedevere-bot
Copy link

GH-96941 is a backport of this pull request to the 3.11 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.11 only security fixes label Sep 19, 2022
@vstinner vstinner added needs backport to 3.11 only security fixes and removed needs backport to 3.10 only security fixes labels Sep 19, 2022
@miss-islington
Copy link
Contributor

Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry @vstinner, I had trouble checking out the 3.11 backport branch.
Please backport using cherry_picker on command line.
cherry_picker 04f4977f508583954ad7b9cb09076ee1e57461f8 3.11

@vstinner vstinner removed the needs backport to 3.11 only security fixes label Sep 19, 2022
vstinner added a commit that referenced this pull request Sep 20, 2022
At Python exit, sometimes a thread holding the GIL can wait forever
for a thread (usually a daemon thread) which requested to drop the
GIL, whereas the thread already exited. To fix the race condition,
the thread which requested the GIL drop now resets its request before
exiting.

take_gil() now calls RESET_GIL_DROP_REQUEST() before
PyThread_exit_thread() if it called SET_GIL_DROP_REQUEST to fix a
race condition with drop_gil().

Issue discovered and analyzed by Mingliang ZHAO.

(cherry picked from commit 04f4977)
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 20, 2022
…96869) (pythonGH-96941)

At Python exit, sometimes a thread holding the GIL can wait forever
for a thread (usually a daemon thread) which requested to drop the
GIL, whereas the thread already exited. To fix the race condition,
the thread which requested the GIL drop now resets its request before
exiting.

take_gil() now calls RESET_GIL_DROP_REQUEST() before
PyThread_exit_thread() if it called SET_GIL_DROP_REQUEST to fix a
race condition with drop_gil().

Issue discovered and analyzed by Mingliang ZHAO.

(cherry picked from commit 04f4977)
(cherry picked from commit 6ff5471)

Co-authored-by: Victor Stinner <vstinner@python.org>
miss-islington added a commit that referenced this pull request Sep 20, 2022
…6941)

At Python exit, sometimes a thread holding the GIL can wait forever
for a thread (usually a daemon thread) which requested to drop the
GIL, whereas the thread already exited. To fix the race condition,
the thread which requested the GIL drop now resets its request before
exiting.

take_gil() now calls RESET_GIL_DROP_REQUEST() before
PyThread_exit_thread() if it called SET_GIL_DROP_REQUEST to fix a
race condition with drop_gil().

Issue discovered and analyzed by Mingliang ZHAO.

(cherry picked from commit 04f4977)
(cherry picked from commit 6ff5471)

Co-authored-by: Victor Stinner <vstinner@python.org>
@gpshead gpshead added the needs backport to 3.11 only security fixes label Sep 20, 2022
@miss-islington
Copy link
Contributor

Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry, @vstinner, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 04f4977f508583954ad7b9cb09076ee1e57461f8 3.11

@vstinner
Copy link
Member Author

The 3.11 backport is #96941

@hauntsaninja hauntsaninja removed the needs backport to 3.11 only security fixes label Dec 24, 2022
@rebx
Copy link

rebx commented Mar 10, 2024

Hi, I wanted to check if this needs to be backported to 3.9 as well? Many thanks

@itamaro
Copy link
Contributor

itamaro commented Mar 11, 2024

Hi, I wanted to check if this needs to be backported to 3.9 as well? Many thanks

3.9 is in security maintenance mode, and this doesn't look like a security-related patch, so probably no.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants