[AIRFLOW-61] Fix corner case with joining processes/queues #1473

Merged (1 commit) on May 6, 2016

jlowin
Member

@jlowin jlowin commented May 6, 2016

Dear Airflow Maintainers,

Please accept this PR that addresses the following issues:
https://issues.apache.org/jira/browse/AIRFLOW-61

@bolkedebruin
Contributor

+1, LGTM

Please elaborate in your commit message on why this is required.

@jlowin jlowin merged commit 415b363 into apache:master May 6, 2016
@jlowin jlowin deleted the queue-join branch May 6, 2016 16:17
@mistercrunch
Member

For the record, @plypaul and I spoke about whether we want to keep the current joining logic in the future, which has the inconvenience of making the overall scheduler cycle as slow as the slowest DAG to process, or to change it in favor of an approach that would ship whatever is ready to go on a predetermined schedule.

We'd like to ensure that the sane DAGs aren't held up by the insane ones.
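
To illustrate the "ship whatever is ready on a predetermined schedule" idea, here is a minimal sketch. It is not Airflow code: the function name `collect_until_deadline` and the `budget_seconds` parameter are hypothetical, and the queue is assumed to behave like `multiprocessing.Queue` or `queue.Queue`, whose `get(timeout=...)` raises `queue.Empty`.

```python
import queue
import time


def collect_until_deadline(results, budget_seconds):
    """Return whatever arrives on `results` within `budget_seconds`.

    Hypothetical helper: slow producers are simply not waited for, so
    their results would be picked up in a later cycle.
    """
    deadline = time.monotonic() + budget_seconds
    ready = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return ready
        try:
            # Block only for the time left in this cycle's budget.
            ready.append(results.get(timeout=remaining))
        except queue.Empty:
            # Budget exhausted with nothing more to collect.
            return ready
```

With this shape, the length of a cycle is bounded by the budget rather than by the slowest DAG, at the cost of always waiting out the full budget even when every producer finishes early.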

@jlowin
Member Author

jlowin commented May 6, 2016

@mistercrunch good thought. The scheduler could in theory just let its worker processes stay alive forever, constantly parsing DAGs and passing tasks back to the main process for execution. That would be a relatively minor change, I think.

(and just to clarify for any future viewers of this thread -- this PR just corrects an edge case in the existing behavior)
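
The long-lived worker idea could look roughly like the sketch below. This is an assumption-laden illustration, not the scheduler's implementation: `worker_loop`, `dag_ids`, `parse`, and `pause` are hypothetical names, `results` is any queue with `put()`, and `stop` is any event object with `is_set()` and `wait()` (both `multiprocessing.Event` and `threading.Event` qualify).

```python
def worker_loop(dag_ids, parse, results, stop, pause=0.0):
    """Keep re-parsing `dag_ids` and pushing results until `stop` is set.

    Hypothetical sketch: `parse` stands in for whatever turns a DAG
    identifier into runnable tasks.
    """
    while not stop.is_set():
        for dag_id in dag_ids:
            if stop.is_set():
                break
            # Hand each parsed result back to the main process.
            results.put((dag_id, parse(dag_id)))
        # Optional pause between passes; returns early if stop is set.
        stop.wait(pause)
```

The main process would start this function as the target of each worker, read from `results` at its own pace, and set `stop` on shutdown instead of joining the workers at the end of every cycle.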

@jlowin jlowin restored the queue-join branch May 6, 2016 17:02
@jlowin jlowin deleted the queue-join branch May 9, 2016 23:57
yiqingj pushed a commit to yiqingj/airflow that referenced this pull request May 27, 2016
* master:
  AIRFLOW-92 Avoid unneeded upstream_failed session closes apache#1485
  AIRFLOW-52 Warn about overwriting tasks in a DAG
  Add logic to lock DB and avoid race condition
  Handle queued tasks from multiple jobs/executors
  [AIRFLOW-80] Move example_twitter dag to contrib/example_dags as it requires hive
  [AIRFLOW-75] Fix bug in S3 config file parsing
  Use getfqdn to make sure urls are fully qualified
  [AIRFLOW-52] Fix bottlenecks when working with many tasks
  Add bulk_dump abstract method to DbApiHook (apache#1471)
  Fix corner case with joining processes/queues (apache#1473)
  [AIRFLOW-53] Adding DagBag stats report to CLI's list_dags (apache#1468)
@schnie
Contributor

schnie commented Jul 11, 2016

Hey everyone. I'm trying to get Airflow working in production, but for some reason the scheduler stops working after a few hours. I've added some more detailed logging and have tracked the issue back to the `while any(j.is_alive() for j in jobs):` loop introduced in this PR. After running fine for hours, the main process gets stuck in this loop. Running `top` in my Airflow scheduler container shows that the child process does spawn, while the main process loops endlessly at about 100% CPU. From what I can tell, the `_do_dags` target function never actually runs when this happens. Any ideas?
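
For readers unfamiliar with the pattern, the sketch below shows the general shape of a "drain the queue while the jobs are alive" loop. It is not the code from this PR and does not diagnose the hang reported above; `drain_while_alive` and `poll_timeout` are hypothetical names. The Python `multiprocessing` documentation notes that joining a process which has put items on a queue can deadlock unless the queue is drained first, which is the usual reason for looping like this instead of calling `join()` directly. Using a blocking `get` with a timeout, rather than a non-blocking poll, keeps the loop from spinning at full CPU.

```python
import queue


def drain_while_alive(jobs, results, poll_timeout=0.1):
    """Collect items from `results` until every job has exited.

    `jobs` are objects with is_alive() (e.g. multiprocessing.Process);
    `results` is a queue whose get(timeout=...) raises queue.Empty
    (multiprocessing.Queue and queue.Queue both do).
    """
    collected = []
    while any(j.is_alive() for j in jobs):
        try:
            # Blocking with a timeout avoids a busy loop.
            collected.append(results.get(timeout=poll_timeout))
        except queue.Empty:
            pass
    # Pick up anything that arrived just before the last job exited.
    while True:
        try:
            collected.append(results.get(timeout=poll_timeout))
        except queue.Empty:
            break
    return collected
```

Note that this loop still waits forever if a job never exits, so a hung child would stall it in the same way; combining it with an overall deadline is one way to bound the wait.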

@bolkedebruin
Contributor

@schnie please provide a sample DAG that exhibits this; that will make it much easier to track down. In addition, please create a Jira issue for it and provide as much info as you can.
