-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIRFLOW-61] Fix corner case with joining processes/queues #1473
Conversation
+1, LGTM Please elaborate on your commit why this is required. |
For the record @plypaul and I spoke on whether we want to keep the current joining logic in the future, which has then inconvenience on making the overall scheduler cycle as slow as the slowest DAG to process, or to change it in favor of an approach that would ship with whatever is ready to go on a predetermined schedule. We'd like to insure that the sane DAGs aren't held up by the insane ones. |
@mistercrunch good thought. The scheduler could in theory just let its worker processes stay alive forever, constantly parsing DAGs and passing tasks back to the main process for executing. That would be a relatively minor change I think. (and just to clarify for any future viewers of this thread -- this PR just corrects an edge case in the existing behavior) |
If a process places items in a queue and the process is joined before the queue is emptied, it can lead to a deadlock under some circumstances. Closes AIRFLOW-61. See for example: https://docs.python.org/3/library/multiprocessing.html#all-start-methods ("Joining processes that use queues") http://stackoverflow.com/questions/31665328/python-3-multiprocessing-queue-deadlock-when-calling-join-before-the-queue-is-em http://stackoverflow.com/questions/31708646/process-join-and-queue-dont-work-with-large-numbers http://stackoverflow.com/questions/19071529/python-multiprocessing-125-list-never-finishes
* master: AIRFLOW-92 Avoid unneeded upstream_failed session closes apache#1485 AIRFLOW-52 Warn about overwriting tasks in a DAG Add logic to lock DB and avoid race condition Handle queued tasks from multiple jobs/executors [AIRFLOW-80] Move example_twitter dag to contrib/example_dags as it requires hive [AIRFLOW-75] Fix bug in S3 config file parsing Use getfqdn to make sure urls are fully qualified [AIRFLOW-52] Fix bottlenecks when working with many tasks Add bulk_dump abstract method to DbApiHook (apache#1471) Fix corner case with joining processes/queues (apache#1473) [AIRFLOW-53] Adding DagBag stats report to CLI's list_dags (apache#1468)
Hey everyone. I'm trying to get airflow working in production, but for some reason the scheduler stops working after a few hours. I've added some more detailed logging and have tracked the issue back to the |
@schnie provide sample dag that exhibits this. Then it will become much easier to track this down. Next to that please create a Jira issue for it and provide as much info as you can |
Dear Airflow Maintainers,
Please accept this PR that addresses the following issues:
https://issues.apache.org/jira/browse/AIRFLOW-61