[AIRFLOW-3516] Support to create k8 worker pods in batches #4434

Merged 1 commit into apache:master on Jan 16, 2019

Conversation

ramandumcs (Contributor):

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, autoclass documentation generation must be added.
    • All public functions and classes in the PR contain docstrings that explain what they do.

Code Quality

  • Passes flake8

codecov-io commented Jan 4, 2019

Codecov Report

Merging #4434 into master will increase coverage by 3.82%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4434      +/-   ##
==========================================
+ Coverage   74.76%   78.58%   +3.82%     
==========================================
  Files         429      204     -225     
  Lines       29649    16482   -13167     
==========================================
- Hits        22167    12953    -9214     
+ Misses       7482     3529    -3953
Impacted Files Coverage Δ
airflow/operators/hive_stats_operator.py 0% <0%> (-100%) ⬇️
airflow/operators/presto_to_mysql.py 0% <0%> (-100%) ⬇️
airflow/www/api/experimental/endpoints.py 71.12% <0%> (-19.02%) ⬇️
airflow/executors/__init__.py 63.46% <0%> (-17.31%) ⬇️
airflow/api/common/experimental/delete_dag.py 84% <0%> (-4%) ⬇️
airflow/utils/sqlalchemy.py 78.57% <0%> (-3.25%) ⬇️
airflow/utils/db.py 32.28% <0%> (-2.2%) ⬇️
airflow/www_rbac/views.py 73.5% <0%> (-1.86%) ⬇️
airflow/plugins_manager.py 92.13% <0%> (-1.2%) ⬇️
airflow/www/views.py 69.82% <0%> (-0.36%) ⬇️
... and 245 more

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dba13d9...167e035.

README.md Outdated
@@ -109,6 +109,7 @@ Currently **officially** using Airflow:
1. [90 Seconds](https://90seconds.tv/) [[@aaronmak](https://github.com/aaronmak)]
1. [99](https://99taxis.com) [[@fbenevides](https://github.com/fbenevides), [@gustavoamigo](https://github.com/gustavoamigo) & [@mmmaia](https://github.com/mmmaia)]
1. [AdBOOST](https://www.adboost.sk) [[AdBOOST](https://github.com/AdBOOST)]
1. [Adobe](https://www.adobe.com/)
OmerJog (Contributor):

This isn't related to this PR.
Please open a new PR for this.

Also, once you remove it, please squash your commits.

ramandumcs (Contributor, Author):

Thanks @OmerJog. Updated the PR.

feng-tao (Member):

PTAL @dimberman

dimberman (Contributor):

@feng-tao apologies for the delay; I'm getting over a cold. Will check it out now.

dimberman (Contributor):

@ramandumcs What is the use-case where a user would want to launch in batches rather than greedily launch tasks as they're received?

ramandumcs (Contributor, Author):

Thanks @dimberman for looking into this PR.

In the current implementation the K8s Executor submits/creates one worker pod per scheduler loop. The scheduler creates that single pod inside the self.executor.heartbeat() call in jobs.py; heartbeat() calls self.sync(), which submits/creates a single worker pod. The task_queue might hold hundreds of tasks to be run, but only one task gets submitted per scheduler loop, which increases task scheduling latency.

Each scheduling loop takes a minimum of 1 second, so the scheduling latency of the last task in a queue of 1000 tasks will be at least 1000 seconds. (We have also observed the scheduling loop sometimes taking ~2 to 3 seconds, which further increases task latency.)

We have a use case that runs thousands of concurrent tasks; we started using Airflow with the K8s Executor and observed and investigated this behaviour. So we are proposing a fix that gives some control over the number of worker pods submitted per loop. The ideal scenario might have been to submit all queued tasks each loop, but that can impact the scheduler's processing of DAG files, since these are synchronous calls. So we made the batch size configurable.

Please let us know if this makes sense, and please share your thoughts/comments.
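
To make the proposal concrete, here is a minimal sketch of the batched sync loop described above, assuming a configurable batch size; the names KubernetesExecutorSketch, kube_scheduler.run_next and batch_size are illustrative, not necessarily the identifiers in the merged change:

    from queue import Empty

    class KubernetesExecutorSketch:
        # Illustrative stand-in for the executor's batching logic.
        def __init__(self, task_queue, kube_scheduler, batch_size=1):
            self.task_queue = task_queue          # tasks awaiting a worker pod
            self.kube_scheduler = kube_scheduler  # wraps the Kubernetes API calls
            self.batch_size = batch_size          # max pods to create per sync() call

        def sync(self):
            # Old behaviour: at most one pod per scheduler heartbeat.
            # Batched behaviour: drain up to batch_size tasks each heartbeat.
            for _ in range(self.batch_size):
                try:
                    task = self.task_queue.get_nowait()
                except Empty:
                    break  # nothing left to launch in this loop
                self.kube_scheduler.run_next(task)  # creates one worker pod

With batch_size=1 this preserves the old one-pod-per-heartbeat behaviour, so the default stays backwards compatible.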

dimberman (Contributor):

@ramandumcs ohh ok that makes total sense. This LGTM. @kaxil is it too late to put this in 1.10.2?

dimberman (Contributor):

@feng-tao good to go on my end!

feng-tao (Member):

Thanks @dimberman and @ramandumcs. A couple of questions for my understanding:

  1. Do we need to add the config in default_test.cfg? (A config-reading sketch follows this list.)
  2. Is there a way to test the behavior?
  3. My understanding of the change: previously we launched one pod per heartbeat call, and now we can launch multiple pods (though not via multithreading) within a single heartbeat call?
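
As an illustration of question 1, the executor could read the new option from the Airflow config; the section and key names here (kubernetes, worker_pods_creation_batch_size) are assumptions for the sketch, not confirmed by this thread:

    from airflow.configuration import conf

    # Assumed key name; a fallback of 1 would keep the old
    # one-pod-per-heartbeat behaviour when the option is unset.
    batch_size = conf.getint('kubernetes', 'worker_pods_creation_batch_size', fallback=1)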

feng-tao merged commit a7e369f into apache:master on Jan 16, 2019
ramandumcs (Contributor, Author):

Yup, now we are launching multiple pods per heartbeat call. I will see if we can add test cases for this particular behavior; a sketch of one such test follows.
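
A hypothetical test along those lines, reusing the KubernetesExecutorSketch from the earlier sketch (the merged tests may differ):

    from queue import Queue
    from unittest import mock

    def test_sync_launches_a_batch_of_pods():
        task_queue = Queue()
        for i in range(5):
            task_queue.put('task-%d' % i)

        kube_scheduler = mock.Mock()
        executor = KubernetesExecutorSketch(task_queue, kube_scheduler, batch_size=3)

        executor.sync()  # one heartbeat

        # Exactly batch_size pods are requested; the rest stay queued.
        assert kube_scheduler.run_next.call_count == 3
        assert task_queue.qsize() == 2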
