Skip to content

Conversation

rostan-t
Copy link

@rostan-t rostan-t commented Sep 5, 2025

Description

Increase gilstate_counter before calling PyThreadState_Clear in gil_scoped_acquire to prevent finalizers to try to create and destroy a new thread state while the current thread state is being cleared.

Fixes #5827

Suggested changelog entry:

Fix crash that can occur when finalizers acquire and release the GIL.

Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
@rostan-t rostan-t force-pushed the issue5827-gilscopedacquire branch from d2863f5 to c11b22e Compare September 5, 2025 12:18
Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added this a couple days ago (PR #5822), to git rid of very frequent failures under macOS / free-threaded that nobody had the time to debug:

if env.MACOS:
if not env.sys_is_gil_enabled():
pytest.xfail(f"[TEST-GIL-SCOPED] {soabi} with GIL disabled: " + msg)
if env.PY_GIL_DISABLED:
pytest.xfail(f"[TEST-GIL-SCOPED] {soabi}: " + msg)

WDYT about removing that in this PR? — So that we see immediately if your fix helps with that, too. If not, we can put that code back in a follow-on PR.

Comment on lines +18 to +22
#if defined(PYBIND11_CPP20) && defined(__has_include) && __has_include(<barrier>)
# define PYBIND11_HAS_BARRIER 1
# include <barrier>
#endif

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move this somewhere around here?

I think PYBIND11_HAS_STD_BARRIER would be ideal (similar to the PYBIND11_HAS_STD_LAUNDER that we have already).

# endif
// Make sure that PyThreadState_Clear is not recursively called by finalizers.
// See issue #5827
++tstate->gilstate_counter;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally not sure: Is there any point in guarding this with #if defined(...) for certain Python versions and Py_GIL_DISABLED?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I focused on free-threading here but I think that the issue can also occur in destructors of thread-local variables in the regular build.

});
#endif

m.attr("has_barrier") =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other places we use a pattern that would be defined_PYBIND11_HAS_STD_BARRIER here.

(It makes it a little easier to pin-point all things connected to PYBIND11_HAS_STD_BARRIER across Python and C++ code.)

@colesbury
Copy link
Contributor

FWIW, this LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: Crash in free-threaded Python when destructors acquire and release the GIL
3 participants