Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed assertion in Python/legacy_tracing.c:431 on a free-threading build #127235

Open
devdanzin opened this issue Nov 24, 2024 · 3 comments
Open
Labels
topic-free-threading type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@devdanzin
Copy link
Contributor

devdanzin commented Nov 24, 2024

Crash report

What happened?

It's possible to cause an abort on !_PyMem_IsPtrFreed(tstate) while running with PYTHON_GIL=0 by calling the following code:

import threading

def errback(*args, **kw):
    raise ValueError('error')

for x in range(200):
    threading._start_joinable_thread(errback)
    try:
        threading.setprofile_all_threads("")
    except:
        pass

Abort message:

python: Python/legacy_tracing.c:431: is_tstate_valid: Assertion `!_PyMem_IsPtrFreed(tstate)' failed.
Aborted

Not sure this isn't an expected failure mode.
Found using fusil by @vstinner.

CPython versions tested on:

3.13, 3.14, CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.14.0a2+ experimental free-threading build (heads/main-dirty:a13e94d84bf, Nov 23 2024, 07:16:19) [GCC 11.4.0]

@devdanzin devdanzin added the type-crash A hard crash of the interpreter, possibly with a core dump label Nov 24, 2024
@ZeroIntensity
Copy link
Member

Does this occur without use of the private _start_joinable_thread?

@colesbury
Copy link
Contributor

colesbury commented Nov 25, 2024

I'm pretty sure that this is not limited to free threading. It's not safe to access other PyThreadStates without holding the runtime lock because those threads may exit. I think that we release the lock here because of reentrancy/deadlock concerns, but it's still not safe.

The reproducers involving the GIL are slightly more complicated because PyThreadState_DeleteCurrent is called with the GIL, which prevents most of these crashes. However, PyThreadState_Delete is called without the GIL and can lead to the crash.

cpython/Python/ceval.c

Lines 2402 to 2421 in 3e7ce6e

void
PyEval_SetProfileAllThreads(Py_tracefunc func, PyObject *arg)
{
PyThreadState *this_tstate = _PyThreadState_GET();
PyInterpreterState* interp = this_tstate->interp;
_PyRuntimeState *runtime = &_PyRuntime;
HEAD_LOCK(runtime);
PyThreadState* ts = PyInterpreterState_ThreadHead(interp);
HEAD_UNLOCK(runtime);
while (ts) {
if (_PyEval_SetProfile(ts, func, arg) < 0) {
PyErr_FormatUnraisable("Exception ignored in PyEval_SetProfileAllThreads");
}
HEAD_LOCK(runtime);
ts = PyThreadState_Next(ts);
HEAD_UNLOCK(runtime);
}
}

I think we have various reports of the same underlying issue. This issue is a nice reproducer for free threading, though.

For example:

@ZeroIntensity
Copy link
Member

Ok, I've pondered this issue for a day or so and I think that the simplest solution for this specific case is to just wait for the per-interpreter thread state lock (GH-127115) to get merged. HEAD_LOCK tends to be annoyingly reentrant, so by just switching to that, we should be able to fix any unsafe usage because it shouldn't be nearly as reentrancy prone. (FWIW, I'm somewhat in favor of making the runtime lock recursive with how much of a PITA it is to use, but whatever.)

The much bigger issue at hand here is that PyThreadState_Next and friends are public APIs, so they aren't able to HEAD_LOCK. Somehow, we need to lock inside of those functions rather than relying on the caller to lock the runtime. I don't think there's anything we can do about PyThreadState_Next, because even if we somehow got thread states in a thread-safe way internally, and then magically made it safe to use the thread state without worrying about concurrent modifications (which would be impossible, unless we did something silly like copying the thread state and returning that), there's nothing stopping the list of thread states from being changed in between calls, so it could possibly skip over entries or return duplicates. Our best shot is probably to just deprecate PyThreadState_Next and invent something new and safer for users.

Some other functions we need to worry about in terms of thread safety:

  • PyInterpreterState_ThreadHead: Same issue as PyThreadState_Next, pretty much.
  • PyGILState_GetThisThreadState: I'm not sure how this will work with subinterpreters, because apparently PyGILState doesn't play nicely with them. I could imagine some interesting edge cases coming up related to locking there because apparently GILState checks are disabled if subinterpreters are active.
  • PyThreadState_GetDict: Returns an evil borrowed reference. If the thread state gets switched out, the user can still hold a reference to that dictionary. Even worse, on a GILful build, trying to use that dictionary without having the thread state attached is just not thread safe at all if you try to access it from a different GIL.
  • PyThreadState_GetFrame: Not borrowed, but still has the object thread safety issue with the cross-GIL that I mentioned above. I think for both of these functions it's probably fine to just mention in the docs that you can only use the references while the thread state is attached.
  • PyThreadState_Swap: Not inherently dangerous, but could lead to misue that we should document. This returns a not-attached thread state, which is ripe for abuse by any of the other PyThreadState APIs. (Also, what happens if e.g. the interpreter starts to exit while its detached?)

For the future, I think we should probably steer clear of any support for using a detached thread state. Perhaps we could invent some new API (PyThreadStateView?) that has more clear thread safety requirements and push that for users.

TL;DR we probably need to document and deprecate some non-thread-safe thread state APIs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-free-threading type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

No branches or pull requests

3 participants