multiprocessing.Event.set() can deadlock when called from a signal handler or trace function #126434
Comments
Bug confirmed on main branch
Hi @Zheaoli, thanks for the quick response. I've pushed a pull request now. What do you think? Thanks and kind regards.
Update: I've added a blurb entry.
I commented on the PR, but I don't think a reentrant lock is the right fix. The general problem is that Python can execute signal handling code asynchronously at many places that don't expect to support reentrancy. RLocks don't solve this problem because the things they protect are often not safe to call reentrantly. In other words, calling them reentrantly breaks invariants that a (non-reentrant) lock would otherwise protect.
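A tiny self-contained illustration of that point (a hypothetical `Pair` class, not CPython code): an `RLock` lets a reentrant call proceed instead of deadlocking, but the reentrant caller then observes state mid-update, with the invariant broken.

```python
import threading

class Pair:
    """Invariant: a == b whenever the lock is not held."""
    def __init__(self):
        self._lock = threading.RLock()  # reentrant: a nested acquire succeeds
        self.a = 0
        self.b = 0

    def bump(self, midpoint_hook=None):
        # midpoint_hook stands in for a signal handler firing
        # asynchronously between the two updates.
        with self._lock:
            self.a += 1
            if midpoint_hook:
                midpoint_hook()
            self.b += 1

pair = Pair()
seen = []
# The "handler" re-enters the locked region. The RLock lets it through
# (no deadlock), but it runs while the invariant a == b is broken.
pair.bump(midpoint_hook=lambda: seen.append((pair.a, pair.b)))
print(seen)  # [(1, 0)] — the nested caller saw a == 1, b == 0
```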
…to avoid deadlock when there is reentrant usage of `set` from `is_set`, e.g. when handling system signals
…it()-ing. Raise an exception if that is the case. Fix race condition in multiprocessing.Event.wait() as described in python#95826
…e called from a signal handler in any combination.
…set() call must be made to reach race condition possibility.
If people want workarounds for existing code: #85772 (comment) is a good starting point, as much as I do not like any recommendation of spawning a thread to do the work. Something I believe could be done within multiprocessing itself:

I'd rather we weren't in the business of spawning threads from multiprocessing at all, but I believe this will work and will work transparently. It is a new feature; we'd not backport it as a bug fix. But it also lines up nicely with 3.14's existing change of the default multiprocessing start method away from the threading-unsafe "fork". We could even adopt this conditionally based on the multiprocessing start method.
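As a sketch of what such a thread-based workaround could look like at the application level today (names and structure are my own, not the proposed multiprocessing change): the signal handler only enqueues a request, and a dedicated helper thread performs the actual `.set()` call, so the handler never touches the Event's non-reentrant lock.

```python
import multiprocessing
import queue
import signal
import threading

# Hypothetical helper: requests to set an Event are serialized through a
# queue and executed on a plain thread, never inside a signal handler.
_requests = queue.Queue()

def _setter_worker():
    while True:
        _requests.get().set()  # safe: runs on an ordinary thread

threading.Thread(target=_setter_worker, daemon=True).start()

event = multiprocessing.Event()

def handler(signum, frame):
    # The handler does no locking of its own; it just hands the Event
    # over to the worker thread.
    _requests.put(event)

signal.signal(signal.SIGINT, handler)
```

This avoids the reentrant `.set()` entirely, at the cost of multiprocessing-adjacent code owning an extra thread, which is exactly the trade-off being debated here.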
I think the idea of doing this on a dedicated thread is a good one. I think we should do something similar with Python signal handling: instead of handling signals on the main thread, Python should handle signals on a dedicated thread, perhaps as a configurable option.
Those thoughts are in part why I was being abstract about where things are handled. Conversations about running the signal handler in another thread have been slowly recurring over the years. Good to know that other languages actually do this. A challenge if we adopt this approach is that people expect, as an API, that exceptions escaping a signal handler are injected into the main thread. That seems solvable even if handler execution happens on another thread: keep the existing pending-signal check logic and turn it into a "pending exception from a signal handler to reraise" check. In this world, which thread signals are handled in and which thread sees exceptions escaping a signal handler could be independent concepts, potentially configurable (though I'd prefer to avoid letting people do unusual things with such process-global settings, for consistency of expectations).
My comment may not have been clear: I meant that other languages run finalizers in a dedicated thread, although
This would reintroduce the deadlock and corruption issues that using a separate thread avoids. Python code isn't robust to exceptions being raised at arbitrary places. For example, it can corrupt internal state.
I don't think that would quite work. Imagine the following:
Thus, you would need to fire and forget (not wait for the result). That would potentially cause other problems. How about adding the following argument to
Java has a method called addShutdownHook. Its input parameter is a not-yet-started thread, which will be started and executed when the JVM is about to exit, for example upon encountering SIGINT. Thus SIGINT is not processed inside/on top of already running code. Java's Thread.sleep throws an InterruptedException if the thread is, well, interrupted, which as far as I know happens on SIGINT. Thus I suppose that Python's equivalent could behave similarly.

A dedicated signal thread that can only process one signal at a time seems like the right approach to me. But what is the possible breakage here, given that, as @gpshead writes, people "expect as an API that exceptions escaping a signal handler are injected into the main thread"?

How about this for the default behaviour in 3.14: signal handling runs on a dedicated thread, and SIGINT etc. will cause thread(s) to shut down if no signal handler is installed. Is the current default behaviour of Python, with no signal handlers installed, to interrupt the main thread upon receiving SIGINT and let the other non-daemon threads finish as they please? Perhaps there is not that much to worry about: if the existing code does not do signal handling, or raises exceptions from the signal handler (which I suppose approximately amounts to the same thing), is it fine to shut down the program (or main thread only?) in such a case? The only difference is that the shutdown initiation comes from a different thread.

I don't know much about Python's shutdown, signals and threads, so I'm uncertain how helpful my comments are, and how much work it would be to implement the suggestions described here. Thanks and kind regards even so :-)
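The "dedicated signal thread that processes one signal at a time" idea can be pictured roughly like this (an application-level sketch with invented names; the real proposal would live inside the interpreter):

```python
import queue
import signal
import threading

_pending = queue.Queue()  # delivered signals, serialized in arrival order

def _dispatcher():
    # Runs user handlers one at a time, off the main thread, so they may
    # freely take ordinary (non-reentrant) locks.
    while True:
        signum, handler = _pending.get()
        handler(signum, None)

threading.Thread(target=_dispatcher, daemon=True, name='signal-dispatch').start()

def install(signum, handler):
    # The handler registered with the interpreter only enqueues; the
    # user-supplied handler executes on the dedicated thread.
    signal.signal(signum, lambda s, _f: _pending.put((s, handler)))
```

Exceptions escaping `handler` would still need a story for reaching the main thread, which is the open question in this discussion.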
Hi again @colesbury and @gpshead I've done a new take on this. It effectively runs signal handlers on a dedicated thread*. The new signal.py code is highlighted here.
* It uses

Here is an example program that uses the new signal handler:

```python
import os
import signal
import threading
import time
import multiprocessing


def sigint_self():
    time.sleep(1)
    print(f'{threading.current_thread().name}: Stopping PID {os.getpid()} with SIGINT')
    os.kill(os.getpid(), signal.SIGINT)


def sigkill_self():
    time.sleep(5)
    print(f'{threading.current_thread().name}: Killing PID {os.getpid()} with SIGKILL')
    os.kill(os.getpid(), signal.SIGKILL)


def run_signal_handler_dedicated_thread():
    event = multiprocessing.Event()

    def sigint_handler(_signo, _stack_frame):
        try:
            # x = 1 / 0
            # ^^ If uncommented, the uncaught exception will be bubbled to the main thread.
            print(f'{threading.current_thread().name}: sigint_handler is setting event')
            event.set()  # This would deadlock using the old signal handler code
        finally:
            print(f'{threading.current_thread().name}: sigint_handler is done')

    def sigterm_handler(_signo, _stack_frame):
        print(f'{threading.current_thread().name}: sigterm_handler is running')

    signal.signal(signal.SIGTERM, sigterm_handler)
    signal.signal(signal.SIGINT, sigint_handler)
    threading.Thread(target=sigint_self, daemon=True).start()
    threading.Thread(target=sigkill_self, daemon=True).start()  # Used for debugging only.
    print(f'{threading.current_thread().name}: Waiting on event. PID = {os.getpid()}')
    event.wait()
    print(f'{threading.current_thread().name}: Waiting is done')


if __name__ == '__main__':
    try:
        run_signal_handler_dedicated_thread()
    finally:
        print(f'{threading.current_thread().name}: Exiting')
```

The committed code could certainly be cleaned up. No tests currently exist; I'd be happy to add them. But first I'd like to ask: is this a viable path forwards, using only Python to implement this and leaving the old C code as is**? Are there other considerations that must be made?

** I don't think I'm competent enough wrt. C and signals to write new C code.

Thanks and kind regards.
…bubble exception on main thread
…ignal handler (on a non-main thread)?
Bug report
Bug description:
`multiprocessing.Event.set()` will acquire a lock when setting the internal flag. `multiprocessing.Event.is_set()` will acquire the same lock when checking the flag. Thus, if a signal handler calls `.set()` while `.is_set()` is running in the same process, there will be a deadlock. `multiprocessing.Event` uses a regular non-reentrant lock. This should be changed to a reentrant lock. Please see the pull request.

Thanks for all the work on the Python programming language. I appreciate all your efforts highly.

Kind regards.
Example program below that (sometimes) deadlocks. On my machine I typically need to run it less than 10 times before a deadlock occurs. Also included in the code block is a sample stacktrace.
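The original example program and stacktrace were not captured in this page, but the mechanism can be demonstrated without live signals. The snippet below pokes at the Event's private `_cond._lock` (a CPython implementation detail that may change between versions) to show that the underlying lock is non-reentrant, which is exactly why a handler's `.set()` nested inside `.is_set()` hangs:

```python
import multiprocessing

e = multiprocessing.Event()
# Both .set() and .is_set() acquire the same lock internally: the lock
# behind e._cond. Accessing it directly is an implementation detail.
lock = e._cond._lock

first = lock.acquire(timeout=0.1)   # stands in for .is_set() holding the lock
second = lock.acquire(timeout=0.1)  # stands in for the handler's .set()
lock.release()

print(first, second)  # True False: the nested acquire cannot succeed,
                      # so in real code the handler blocks forever.
```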
CPython versions tested on:
3.11, 3.12, CPython main branch
Operating systems tested on:
Linux, macOS
Linked PRs
…to avoid deadlock when there is reentrant usage of `set` from `is_set`, e.g. when handling system signals #126437