Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-118727: Don't drop the GIL in drop_gil() unless the current thread holds it #118745

Merged
merged 10 commits into from
May 23, 2024

Conversation

swtaarrs
Copy link
Member

@swtaarrs swtaarrs commented May 7, 2024

drop_gil() assumes that its caller is attached, which means that the current thread holds the GIL if and only if the GIL is enabled, and the enabled-state of the GIL won't change. This isn't true, though, because detach_thread() calls _PyEval_ReleaseLock() after detaching and _PyThreadState_DeleteCurrent() calls it after removing the current thread from consideration for stop-the-world requests (effectively detaching it).

Fix this by remembering whether or not a thread acquired the GIL when it last attached, in PyThreadState._status.holds_gil, and check this in drop_gil() instead of gil->enabled. As part of this, add an explicit thread_dying argument to drop_gil(), separating that concept from whether or not it's safe to dereference tstate.

This fixes a crash in test_multiprocessing_pool_circular_import(), so I've reenabled it.

Python/pystate.c Outdated Show resolved Hide resolved
@swtaarrs swtaarrs removed the 3.13 bugs and security fixes label May 7, 2024
@swtaarrs
Copy link
Member Author

swtaarrs commented May 8, 2024

I realized while writing up an explanation of my changes that the commit I just pushed isn't quite right. I changed drop_gil() to always take a non-NULL tstate by adding an explicit thread_dying argument. But in this case, tstate can be a dangling pointer so it's actually important to not dereference it. I'll come up with a different approach.

@swtaarrs swtaarrs marked this pull request as draft May 8, 2024 19:25
@swtaarrs swtaarrs marked this pull request as ready for review May 8, 2024 20:58
@swtaarrs
Copy link
Member Author

swtaarrs commented May 8, 2024

This should be ready now - I updated the description to reflect the changes.

@swtaarrs swtaarrs changed the title gh-118727: Don't drop the GIL during thread deletion unless the current thread holds it gh-118727: Don't drop the GIL in drop_gil() unless the current thread holds it May 8, 2024
Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this LGTM. I think it's worth having tstate->_status.holds_gil in the default build as well.

Python/pystate.c Outdated Show resolved Hide resolved
Include/cpython/pystate.h Outdated Show resolved Hide resolved
Python/ceval_gil.c Outdated Show resolved Hide resolved
@swtaarrs swtaarrs added the needs backport to 3.13 bugs and security fixes label May 9, 2024
@swtaarrs
Copy link
Member Author

swtaarrs commented May 9, 2024

(moved from the issue where I accidentally posted this)

test_importlib got stuck in the tsan build again, which this should've fixed. I'll see what I can figure out.

@ericsnowcurrently
Copy link
Member

Does applying @suppress_immortalization() make a difference. I'm not suggesting that as a solution (there's probably a deeper issue to solve), but whether or not the decorator makes a difference might be helpful to know.

@colesbury
Copy link
Contributor

@swtaarrs - I observed a hang when running pool_in_threads.py locally with this PR. (It may or may not be the same as the TSan hang).

I think there's an issue with FORCE_SWITCHING and the dynamic enabling of the GIL:

  • A thread disables (and releases) the GIL via _PyEval_DisableGIL. If _PY_GIL_DROP_REQUEST_BIT is set, it waits in the FORCE_SWITCHING section until another thread acquires the GIL
  • A second thread acquires the GIL. However, it sees that the GIL is now disabled and releases the GIL without notifying the first thread

@swtaarrs
Copy link
Member Author

I think there's an issue with FORCE_SWITCHING and the dynamic enabling of the GIL

Thanks, good catch. I should probably pull drop_gil_impl() back out for use by _PyEval_DisableGIL() then. The FORCE_SWITCHING code doesn't apply once the GIL is disabled, and nothing else in drop_gil() applies to _PyEval_DisableGIL() either. I'll also take another look at everything at the end of take_gil() that's skipped in this case.

@swtaarrs
Copy link
Member Author

The commit I just pushed should address the FORCE_SWITCHING issue by reverting my drop_gil_impl() change. I also added a line to clear _PY_GIL_DROP_REQUEST_BIT when disabling the GIL, so that thread doesn't keep hitting its eval_breaker once it resumes executing.

I'm running the tsan tests in a loop locally to see if it hangs again.

Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@colesbury
Copy link
Contributor

@ericsnowcurrently - would you like to re-review this or should I go ahead and merge it?

@ericsnowcurrently
Copy link
Member

I'll take a quick look tomorrow.

@swtaarrs
Copy link
Member Author

@ericsnowcurrently will you have time to take a look today?

@ericsnowcurrently
Copy link
Member

Sorry for the delay. I don't have any objections. If you both are confident about the solution then I'm on board. 😄

@colesbury
Copy link
Contributor

Great! As a heads up, there is a different bug that affects the same test case (#119369).

@colesbury colesbury merged commit be1dfcc into python:main May 23, 2024
36 checks passed
@miss-islington-app
Copy link

Thanks @swtaarrs for the PR, and @colesbury for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 23, 2024
…t thread holds it (pythonGH-118745)

`drop_gil()` assumes that its caller is attached, which means that the current
thread holds the GIL if and only if the GIL is enabled, and the enabled-state
of the GIL won't change. This isn't true, though, because `detach_thread()`
calls `_PyEval_ReleaseLock()` after detaching and
`_PyThreadState_DeleteCurrent()` calls it after removing the current thread
from consideration for stop-the-world requests (effectively detaching it).

Fix this by remembering whether or not a thread acquired the GIL when it last
attached, in `PyThreadState._status.holds_gil`, and check this in `drop_gil()`
instead of `gil->enabled`.

This fixes a crash in `test_multiprocessing_pool_circular_import()`, so I've
reenabled it.
(cherry picked from commit be1dfcc)

Co-authored-by: Brett Simmers <swtaarrs@users.noreply.github.com>
@bedevere-app
Copy link

bedevere-app bot commented May 23, 2024

GH-119474 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label May 23, 2024
colesbury pushed a commit that referenced this pull request May 23, 2024
…nt thread holds it (GH-118745) (#119474)

`drop_gil()` assumes that its caller is attached, which means that the current
thread holds the GIL if and only if the GIL is enabled, and the enabled-state
of the GIL won't change. This isn't true, though, because `detach_thread()`
calls `_PyEval_ReleaseLock()` after detaching and
`_PyThreadState_DeleteCurrent()` calls it after removing the current thread
from consideration for stop-the-world requests (effectively detaching it).

Fix this by remembering whether or not a thread acquired the GIL when it last
attached, in `PyThreadState._status.holds_gil`, and check this in `drop_gil()`
instead of `gil->enabled`.

This fixes a crash in `test_multiprocessing_pool_circular_import()`, so I've
reenabled it.
(cherry picked from commit be1dfcc)

Co-authored-by: Brett Simmers <swtaarrs@users.noreply.github.com>
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
…t thread holds it (python#118745)

`drop_gil()` assumes that its caller is attached, which means that the current
thread holds the GIL if and only if the GIL is enabled, and the enabled-state
of the GIL won't change. This isn't true, though, because `detach_thread()`
calls `_PyEval_ReleaseLock()` after detaching and
`_PyThreadState_DeleteCurrent()` calls it after removing the current thread
from consideration for stop-the-world requests (effectively detaching it).

Fix this by remembering whether or not a thread acquired the GIL when it last
attached, in `PyThreadState._status.holds_gil`, and check this in `drop_gil()`
instead of `gil->enabled`.

This fixes a crash in `test_multiprocessing_pool_circular_import()`, so I've
reenabled it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) skip news topic-free-threading
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants