gh-126434: Use `multiprocessing.Value` for `multiprocessing.Event` to avoid deadlock when there is reentrant usage of `set` from `is_set`, e.g. when handling system signals #126437

base: main
Conversation
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the `skip news` label instead.
Mostly LGTM. BTW, the PR title needs to be changed.
Misc/NEWS.d/next/Library/2024-11-05-11-11-29.gh-issue-126434.Hw3wdI.rst
I've changed the title, @Zheaoli. What do you think? Thanks again.
LGTM
Misc/NEWS.d/next/Library/2024-11-05-11-11-29.gh-issue-126434.Hw3wdI.rst
This patch fails in the free-threading TSAN test. We need to figure out the root cause.
This test seems to fail in other PRs as well; I restarted the job to see if it fails again.
The `Event` implementation is not correct with an `RLock`, because that allows reentrancy (from the same thread) during operations that are not written to be reentrant. For example:
```python
def is_set(self):
    with self._cond:
        if self._flag.acquire(False):
            self._flag.release()
            return True
        return False
```
After the `self._flag.acquire(False)` and before the `self._flag.release()`, reentrant `is_set()` calls will incorrectly appear to be false.
`Event` is not written to be safely called from a Python signal handler.
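To make that interleaving concrete, here is a minimal deterministic simulation of the window, using `threading` stand-ins for multiprocessing's internals; `_cond` and `_flag` mirror the snippet above, but this is an illustration rather than the real classes:

```python
import threading

_cond = threading.Condition(threading.RLock())  # an RLock permits reentrancy
_flag = threading.Semaphore(0)                  # value 1 == set, 0 == clear

def is_set():
    with _cond:
        if _flag.acquire(blocking=False):
            _flag.release()
            return True
        return False

_flag.release()                      # put the "event" into the set state
with _cond:                          # simulate being inside is_set() ...
    _flag.acquire(blocking=False)    # ... just after acquire(False) succeeded
    # A signal handler re-entering is_set() at this point is let through by
    # the RLock, sees the semaphore at 0, and wrongly reports "not set":
    print(is_set())                  # prints False although the event is set
    _flag.release()
```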
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers, that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase `I have made the requested changes; please review again`.
If this is the original design behavior, should we update the docs?
This needs a test, anyway.
This needs rewriting to describe the current approach. Also, please update the PR title and description.
Thanks. I've updated the description now.
What do you think, @gpshead?
Thanks for the feedback from all of you @colesbury, @gpshead, @Zheaoli, @tomasr8 and @ZeroIntensity. I agree with the criticism that my initial suggestion of using an `RLock` was not correct.

I've written a new implementation. I believe this new approach solves the deadlock described in the original issue: a reentrant use of `set` from `is_set`.

I also added some asserts. Instead of a deadlock, an assert will now be raised. What do you think about this? Personally I'd rather have an exception than a deadlock.

@ZeroIntensity How would you write a test for such a concurrency issue? I can only think of using some sort of metaprogramming, but at the moment I don't know enough Python to do it.
…to avoid deadlock when there is reentrant usage of `set` from `is_set`, e.g. when handling system signals
force-pushed from 516c663 to e5302e6
Here, you would run the reproducer from gh-126434 in a subprocess and check the return code plus stdout/stderr. You can do this pretty easily with `subprocess`.

As a side note, don't force push; we squash everything down to one commit at the end anyway. Force pushing just makes it harder to keep track of what's going on.
What is the cost of not switching to …? The newly pushed code will detect and raise an exception when … Here is an issue describing this: …

The newly pushed code also solves #95826. The root cause here was that …

I've added tests so that these three issues are covered.

Trying to CC users of the referred issues, should they still be interested in this topic: @salvatoreingala @marcos4758 @mcclurem @mdboom

Thanks and kind regards.
We're breaking users relying on the private API (which is their fault for doing, but it's not exactly encouraging...), and this implementation doesn't seem correct, because reads and writes on a `Value` are not atomic unless its lock is held.
…e called from a signal handler in any combination.
Thanks again for your time and comments @ZeroIntensity .
Right, that is fair enough. I've explicitly taken the locks now. My understanding, though, was that writes and reads were atomic, but operations involving a read and a write, e.g. an increment, were not. Please correct me if I'm wrong. Source for this claim. Anyhow: locks are now taken explicitly.

I rewrote the implementation now. My claim is that the new implementation is completely reentrant, i.e. it is safe to call from signal handlers. It should thus fix #85772 and #95826, as well as this issue (#126434). No exceptions are raised and no deadlocks should occur.

There is a highly theoretical chance of a race condition that is described inside the code. I believe the chance is so low that it is safe to ignore. My guess is that an out-of-memory exception would occur long before the race condition could occur.

The big change is that instead of waiting on a condition, it now busy waits on a `Value`. The implementation sleeps for 10 milliseconds when busy waiting. On my machine, a MacBook Air (15-inch, M3, 2024), CPU usage when …

What do you think?

Update: I've also updated the blurb.

Thanks and kind regards.
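For readers following along, a minimal sketch of the busy-wait idea described above, assuming a shared `multiprocessing.Value` flag and the 10 ms sleep mentioned in the comment (illustrative names, not the PR's actual code):

```python
import time
import multiprocessing

_POLL_INTERVAL = 0.010  # the 10 ms sleep mentioned above

def wait_on_flag(flag, timeout=None):
    """Poll a multiprocessing.Value('i', 0) flag instead of blocking on a
    condition variable; reentrant set()/is_set() can then never deadlock."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        with flag.get_lock():        # take the Value's lock explicitly
            if flag.value:
                return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(_POLL_INTERVAL)
```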
Hello, @ivarref! A while ago, I decided to be happy and stopped using `wait()` 😁 Looking at your strategy, I noticed that it is similar to what I did: …

Before that, I tried using a wait on an event, which cost me hours of debugging and drained my patience. I know it's not elegant, but it works.
I'll look closer at this later today.
…set() call must be made to reach race condition possibility.
🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit bec9070 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.
Yeah, the failing buildbots for …
```python
script = support.findfile("is_set_set.py", subdir="multiprocessingdata")
for x in range(10):
    try:
        assert 0 == subprocess.call([sys.executable, script], timeout=60)
```
Use `test.support.assert_python_ok` or `assert_python_failure` rather than launching `sys.executable` yourself. Child interpreters may need flags passed; that takes care of it.
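A minimal sketch of what that suggestion could look like for the test above; note that in current CPython these helpers live in `test.support.script_helper`:

```python
import unittest
from test.support import findfile, requires_subprocess
from test.support.script_helper import assert_python_ok

@requires_subprocess()
class TestEventSignalHandling(unittest.TestCase):
    def test_no_race_for_is_set_set(self):
        script = findfile("is_set_set.py", subdir="multiprocessingdata")
        for _ in range(10):
            # Runs the script in a child interpreter with the right flags
            # and fails the test on a non-zero return code.
            assert_python_ok(script)
```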
```python
@support.requires_subprocess()
class TestEventSignalHandling(unittest.TestCase):
    def test_no_race_for_is_set_set(self):
        import subprocess
```
Put this import up top; the module is always importable regardless of whether or not it actually works on the platform, which is what `requires_subprocess` tells you.
Lib/multiprocessing/synchronize.py
```python
elif timeout is not None and (time.monotonic() - start_time) > timeout:
    return False
else:
    # Fixes https://github.com/python/cpython/issues/85772 by spinning and sleeping.
```
Iff (let's not assume we should go this route; see my comment on the issue) we're going to abandon use of the native OS platform APIs, which properly implement timeouts on inter-process semaphores without a busy loop via `_multiprocessing.SemLock` in `Modules/_multiprocessing/semaphore.c`, then sleeping in the loop should be done in an exponential back-off fashion, as was the case even in `threading` itself before we started using proper OS APIs there. See https://github.com/python/cpython/blob/v3.1.3/Lib/threading.py#L227 for the old exponential back-off delay example.

Doing a busy loop with sleeps as a low-level primitive in 2024 feels very wrong to me. They've always been really unfriendly to both latency, due to unnecessary delays, and power usage, from frequent unnecessary wakes.

I suggested an alternate idea in #126434.
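For context, a sketch of the kind of exponential back-off being referred to, modeled on the pre-3.2 `threading` loop linked above: start with a tiny delay, double it each iteration, and cap it so the sleep never exceeds the remaining timeout.

```python
import time

def wait_with_backoff(predicate, timeout):
    """Spin until predicate() is true, sleeping with exponential back-off."""
    endtime = time.monotonic() + timeout
    delay = 0.0005                      # 500 us initial delay, as in old threading
    while not predicate():
        remaining = endtime - time.monotonic()
        if remaining <= 0:
            return False
        # double the delay, but never sleep past the cap or the time remaining
        delay = min(delay * 2, remaining, 0.05)
        time.sleep(delay)
    return True
```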
History

Prior to CPython 3.2, `threading.Condition` was implemented with a back-off in a similar manner. 3.2 improved on that old hack by using the OS APIs for lock timeouts in 7c3e577.
Thanks @gpshead and @ZeroIntensity. I will read your comments and suggestions thoroughly tomorrow and come back to you.

Thanks and kind regards.
I marked this as Draft just to make it clear that the direction to take has not been decided yet.
If we do go with this, might it be better to put it under a new feature rather than …?
I rewrote the approach to use a dedicated thread for signal handling. The initial signal handling is still done on the main thread, but it merely pushes the signal to a dedicated thread. This is done using a `SimpleQueue`.

As asked in the issue comment above: is this a viable path going forward, using only Python to implement this and leaving the current C code as is? Are there other considerations that must be taken into account?
…bubble exception on main thread
…ignal handler (on a non-main thread)?
The current approach is now:
Example flow:

```mermaid
sequenceDiagram
    participant OS
    participant MainThread
    participant SignalHandlerThread
    OS->>MainThread: SIGINT
    MainThread->>SignalHandlerThread: send SIGINT using SimpleQueue.put
    SignalHandlerThread->>SignalHandlerThread: invoke user specified handler
    SignalHandlerThread->>SignalHandlerThread: handler calls event.set()
```
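A minimal sketch of this flow (illustrative names, not the PR's actual code), assuming the thread-level `queue.SimpleQueue`, whose `put()` is documented to be reentrant and therefore safe to call from the real signal handler:

```python
import signal
import threading
from queue import SimpleQueue

_signal_queue = SimpleQueue()

def _main_thread_handler(signum, frame):
    # Runs on the main thread: only forwards the signal number.
    _signal_queue.put(signum)

def _signal_handler_thread(user_handlers):
    # Dedicated thread: pops signals and runs the user's handler, which may
    # freely block on locks, e.g. call event.set().
    while True:
        signum = _signal_queue.get()
        handler = user_handlers.get(signum)
        if handler is not None:
            handler(signum, None)

user_handlers = {signal.SIGINT: lambda signum, frame: print("got SIGINT")}
threading.Thread(target=_signal_handler_thread,
                 args=(user_handlers,), daemon=True).start()
signal.signal(signal.SIGINT, _main_thread_handler)
```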
Discussion: what should be done in the event of a signal handler calling `sys.exit()`?

Current fix: send an exit event to the main thread and send the same signal to interrupt the main thread again, i.e. the reverse flow of the diagram above:

```mermaid
sequenceDiagram
    participant OS
    participant MainThread
    participant SignalHandlerThread
    SignalHandlerThread->>SignalHandlerThread: handler calls sys.exit(0)
    SignalHandlerThread->>MainThread: send SYSEXIT event using SimpleQueue.put
    SignalHandlerThread->>SignalHandlerThread: raise_signal(SIGINT)
    OS->>MainThread: SIGINT
    MainThread->>MainThread: checks SYSEXIT queue
    MainThread->>MainThread: calls sys.exit(0)
    MainThread->>MainThread: `finally` blocks in the main thread will be executed
```
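And a sketch of this reverse flow, continuing the illustration above (hypothetical helper names): the dedicated thread catches `SystemExit` from the user handler, parks the exit code for the main thread, and re-raises the signal so the main thread wakes up and exits through its own `finally` blocks.

```python
import signal
import sys
from queue import SimpleQueue

_exit_queue = SimpleQueue()

def _run_user_handler(handler, signum):
    # Runs on the dedicated signal thread.
    try:
        handler(signum, None)
    except SystemExit as exc:
        _exit_queue.put(exc.code)      # the SYSEXIT event for the main thread
        signal.raise_signal(signum)    # interrupt the main thread again

def _main_thread_handler(signum, frame):
    # Second delivery, now on the main thread: perform the pending exit so
    # that the main thread's finally blocks actually run.
    if not _exit_queue.empty():
        sys.exit(_exit_queue.get())
```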
This fix is the best I could think of. What do you think of this solution? Not all tests pass at the moment.

CC @ZeroIntensity @gpshead. Thanks for your time and patience.
First of all, I'm impressed by your dedication! Unfortunately, and I hate to say this, but judging by the complexity of this change, and by the fact that we need whole diagrams to explain it, I think we need a PEP. There's a lot that we need to think about here, and we just can't get enough community input from a PR.
Thanks for your kind words, @ZeroIntensity.

Re 21 files: most of that is adding a line for the signal-thread shutdown to a bunch of tests.

A PEP: I am not sure what to say about that. In my opinion this is not a huge change, and the diagram was just to illustrate how it works. I'm sure a ton of work has already been put into the signal module and into handling signals (as of now) only on the main thread. A ton of work must also have been done with respect to the different platforms (I read about how to do os.kill on Windows, for example; it was not easy). This makes me sceptical of changing any of that. To me it seems much easier to build this feature on top of the existing Python infrastructure, i.e. in a way that does not involve changing the underlying "mechanics" of C, OS stuff, etc. This makes it a moderate change IMO. And by keeping the old behavior, it is possible to opt out if needed.

Perhaps it could break something in unexpected ways. I did not think about (or know any details about) subinterpreters.

What do you think, @gpshead and @colesbury: is this "worthy" of/in need of a PEP? If you all "vote" for a PEP, that is fine by me, and I could try to draft something.

Thanks and kind regards.
FWIW, we'll likely need changes to the C code anyway. Global state for modules that affect the whole process doesn't play well with subinterpreters, so I suspect that this will have to get implemented on the runtime structure.
Thanks @ZeroIntensity.

From https://docs.python.org/3/c-api/init.html#sub-interpreter-support, I'm reading this as: the main interpreter will be the only interpreter that is responsible for signal handling. The subinterpreters should not need to access the signal module nor the global state in that module. I suppose the signal functions can assert that they are being called on the main interpreter (or perhaps they already do so at the C level?).

That said: I know next to nothing about subinterpreters. Is there some other way that the global state can be corrupted? Or: how does global module state corrupt subinterpreters? (Is there existing global module state? I suppose there is?)

Thanks and kind regards.
Ah, by that it means that the main interpreter is responsible for dealing with signals sent to the Python process, so …
I don't think so, but that's probably a bug with subinterpreters themselves. I'll look into that separately.

Anyway, the experimental, buggy subinterpreters aside, I'm not sure that the implementation even works, because CI is failing across the board. I'd recommend getting tests green before discussing the philosophical side of it.
Please see issue #126434 for details about the bug and code to reproduce the deadlock.

Kind regards.