Remove custom signal handling in Triggerer #23274
There is a bug in CPython (fixed in March 2022 but not yet released) that makes asyncio handle SIGTERM improperly by using async-signal-unsafe functions: the triggerer can receive SIGPIPE while handling SIGTERM/SIGINT and deadlock itself. Until the fix is released, we should rely on the standard handling of these signals rather than adding our own signal handlers. It seems that even though our signal handler just ran exit(0), it caused a race condition that led to the hang. More details: * https://bugs.python.org/issue39622 * python/cpython#83803 Fixes: apache#19260
CC: @andrewgodwin @ashb @dstandish: I was able to reproduce the Ctrl-C problem, and it is gone after I removed the custom signal handling in the Triggerer, so the hypothesis that the asyncio bug from https://bugs.python.org/issue39622 (python/cpython#83803) is the cause seems even more plausible. Please take a look and see whether my hypothesis from #23271 (comment) looks sound; maybe we can fix this permanently for production as well. I believe our custom handling of SIGINT and SIGTERM in the Triggerer (which simply runs sys.exit(0)) is not really needed, since the default handling of both signals eventually terminates the process. I left the SIGQUIT handling in place for diagnostics (and QUIT is rarely used for anything else anyway).
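To make the proposal above concrete, here is a minimal sketch of the resulting registration logic. It is not the Airflow code: `dump_stacks` and this `register_signals` are hypothetical stand-ins, and the sketch assumes a POSIX platform where `signal.SIGQUIT` exists. The point is only that SIGINT and SIGTERM keep whatever handling Python gives them by default, while SIGQUIT keeps a diagnostic handler.

```python
import signal
import sys
import traceback


def dump_stacks(signum, frame):
    """Hypothetical diagnostic handler: print every thread's stack to stderr."""
    for thread_id, stack in sys._current_frames().items():
        print(f"Thread {thread_id}:", file=sys.stderr)
        traceback.print_stack(stack, file=sys.stderr)


def register_signals():
    """Install only the diagnostic handler.

    SIGINT and SIGTERM are deliberately left alone, so Python's default
    behaviour applies: SIGINT raises KeyboardInterrupt in the main thread
    and SIGTERM terminates the process.
    """
    signal.signal(signal.SIGQUIT, dump_stacks)
```

Calling `register_signals()` once from the main thread at startup is enough; sending SIGQUIT (for example with `kill -QUIT <pid>`) then prints the stacks without stopping the process.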
I also found that the standalone hanging problem was already reported in #19260.
The PR is likely OK to be merged with just a subset of tests for the default Python and Database versions, without running the full test matrix, because it does not modify the core of Airflow. If the committers decide that the full test matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.
There is a bug in CPython (fixed in March 2022 but not yet released) that makes asyncio handle SIGTERM improperly by using async-signal-unsafe functions: the triggerer can receive SIGPIPE while handling SIGTERM/SIGINT and deadlock itself. Until the fix is released, we should rely on the standard handling of these signals rather than adding our own signal handlers. It seems that even though our signal handler just ran exit(0), it caused a race condition that led to the hang. More details: * https://bugs.python.org/issue39622 * python/cpython#83803 Fixes: #19260 (cherry picked from commit 6bdbed6)
@potiuk It seems that after this change we no longer set the TriggererJob state to success; the jobs stay in the running state. There is also a register_signals method in the triggerer job, but it is never called. See airflow/airflow/jobs/triggerer_job.py, lines 67 to 70 in 717a758.
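One plausible mechanism for the stuck state, shown as a generic sketch rather than Airflow's code: with the default SIGTERM disposition the interpreter is killed outright, so no Python-level cleanup runs, whereas a handler that calls `sys.exit(0)` raises `SystemExit` and lets `finally` blocks execute. The child script, the `run_child` helper, and the printed `state=...` lines are all hypothetical stand-ins for the job-state update, and the sketch assumes a POSIX platform.

```python
import signal
import subprocess
import sys
import textwrap

# Hypothetical child process: prints "state=success" only if cleanup runs.
CHILD = textwrap.dedent(
    """
    import signal, sys, time

    if sys.argv[1] == "custom":
        # Custom handler: turn SIGTERM into a normal Python exit.
        signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))

    try:
        print("state=running", flush=True)
        while True:
            time.sleep(0.1)
    finally:
        # Stand-in for marking the job as finished.
        print("state=success", flush=True)
    """
)


def run_child(mode):
    """Start the child, send it SIGTERM, and return (remaining output, exit code)."""
    with subprocess.Popen(
        [sys.executable, "-c", CHILD, mode],
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        # Wait until the child is inside its try block before signalling it.
        assert proc.stdout.readline().strip() == "state=running"
        proc.send_signal(signal.SIGTERM)
        remaining = proc.stdout.read()  # returns at EOF, i.e. when the child exits
        returncode = proc.wait(timeout=10)
    return remaining.strip(), returncode
```

Under these assumptions, `run_child("custom")` reports the final state and a zero exit code, while `run_child("default")` produces no further output and a negative return code, which matches jobs that never leave the running state.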
This reverts commit 6bdbed6.
Thanks @tanelk, reverting it for 2.3.3 then; I have reopened the original issue.