[AIRFLOW-1945] Autoscale celery workers for airflow added #3989
Codecov Report
@@ Coverage Diff @@
## master #3989 +/- ##
==========================================
- Coverage 75.92% 72.87% -3.05%
==========================================
Files 199 199
Lines 15954 17003 +1049
==========================================
+ Hits 12113 12391 +278
- Misses 3841 4612 +771
Please follow the contribution guidelines.
# based on number of queued tasks. pick these numbers based on resources on
# worker box and the nature of the task. If autoscale option is available worker_concurrency
# will be ignored
worker_autoscale = 12,16
We probably shouldn't set a default value for this and worker_concurrency. Please comment this one out.
A link to the section of the celery docs about this in the comment would help too.
I agree on not using default values, but we have no provision for shell functions here that would let us get the number of cores and so on at execution time. What can be done in these scenarios?
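One way to size the value at deploy time rather than hard-coding it is a small script that reads the core count. This is a minimal sketch, not part of the PR: `suggest_autoscale` and its `min_fraction` parameter are hypothetical, and the output uses the `max,min` ordering that Celery documents for its `--autoscale` option.

```python
import os


def suggest_autoscale(cpu_count=None, min_fraction=0.75):
    """Build a 'max,min' autoscale string from the CPU count.

    Hypothetical helper: the 0.75 floor is an arbitrary assumption,
    not a recommendation from Airflow or Celery.
    """
    # Fall back to the local core count (os.cpu_count() may return None).
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    max_procs = max(1, cores)
    # Keep at least one process alive even on very small machines.
    min_procs = max(1, int(max_procs * min_fraction))
    return f"{max_procs},{min_procs}"


if __name__ == "__main__":
    print(suggest_autoscale())
```

The printed string could then be written into the config by whatever tooling provisions the worker box, so the shipped default can stay commented out.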
@ashb this value has been commented out. Please review
@phani8996 what's the harm in letting my machine run with the full capacity of workers all the time? If I allow it to grow to a max, it means my machine has the capacity to handle that many workers anyway.
We can run it at full capacity, but what advantage do we get from a bunch of idle workers? Instead, this feature spawns workers on demand, so you get what is required and workers are no longer underutilised.
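The on-demand behaviour described here can be illustrated with a simplified sketch. This is not Celery's actual autoscaler; `target_pool_size` is a hypothetical function that only shows the idea of following the queue depth while staying between a lower and an upper bound.

```python
def target_pool_size(queued_tasks, min_procs, max_procs):
    """Return how many worker processes to run for the current demand.

    Simplified illustration only: clamp the number of queued tasks
    to the [min_procs, max_procs] range.
    """
    if min_procs > max_procs:
        raise ValueError("min_procs must not exceed max_procs")
    return max(min_procs, min(max_procs, queued_tasks))


# Idle queue: stay at the lower bound instead of holding every process.
print(target_pool_size(0, 12, 16))
# Heavy load: grow, but never past the upper bound.
print(target_pool_size(100, 12, 16))
```

With a fixed concurrency the pool would hold the upper bound at all times; with bounds like these it only grows when there is work waiting.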
# "airflow worker" command. Pick these numbers based on resources on
# worker box and the nature of the task. If autoscale option is available worker_concurrency
# will be ignored
#worker_autoscale = 12,16
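The precedence stated in that comment (autoscale wins over `worker_concurrency` when it is set) could be expressed roughly as follows. This is a sketch under assumptions, not the code from this PR: `resolve_pool_options` is a hypothetical name, the settings are passed as a plain dict of strings, and the fallback of 16 is assumed for illustration. The value is read in the `max,min` order that Celery documents for `--autoscale`, so the first number is the upper bound.

```python
def resolve_pool_options(settings, default_concurrency=16):
    """Pick either autoscale or fixed concurrency from [celery] settings.

    Hypothetical helper: `settings` maps option names to raw strings.
    """
    autoscale = (settings.get("worker_autoscale") or "").strip()
    if autoscale:
        # Expect exactly two integers; anything else raises ValueError.
        max_procs, min_procs = (int(part) for part in autoscale.split(","))
        # worker_concurrency is deliberately ignored in this branch.
        return {"autoscale": f"{max_procs},{min_procs}"}
    concurrency = int(settings.get("worker_concurrency", default_concurrency))
    return {"concurrency": concurrency}


print(resolve_pool_options({"worker_autoscale": "16,12", "worker_concurrency": "8"}))
print(resolve_pool_options({"worker_concurrency": "8"}))
```

Leaving `worker_autoscale` commented out, as requested in the review, corresponds to the second call: the option is absent and the fixed concurrency applies.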
@Fokko Link has been added. Please check.
@phani8996 a space after # would be better, i.e. # worker_autoscale = 12,16
LGTM
@phani8996 please rebase your commits into a single commit. Also, the commit message could be like this
@msumit commits have been rebased and the commit message has been updated. Please check.
Mistakenly closed the PR.
FYI, it seems this functionality has been removed in Celery v4.x :(
D'oh. Well, that's frustrating. I guess we should remove that in our next point release, with a note in the updating guide along the lines of "We've removed this option as Celery didn't respect it." Having an extra config directive in our files won't cause any problems or errors on our side, so it can be in 1.10.16. @ddelange Fancy creating such a PR? (Against master please, and I will deal with back-porting it to the release.)
I actually really like the potential of Celery autoscale (e.g. putting one massive worker on an autoscaling AWS EC2 instance to scale Airflow temporarily for a sudden heavy load). Right now, having this option isn't breaking anything: Celery just ignores it internally and permanently stays at the minimum concurrency specified. So from my side all good for now. Just wanted to let you know (and anyone else who may stumble upon this PR via git blame) ^^
Sounds good. I'm okay leaving it, so long as Celery still documents it (even if it doesn't "work" right now).
I see that the autoscale code is still present in Celery master
https://github.com/celery/celery/blob/master/celery/worker/autoscale.py
On Fri, Sep 27, 2019 at 3:54 PM Ash Berlin-Taylor wrote:
Sounds good. I'm okay leaving it so long as Celery still document it (even
if it doesn't "work" right now) then
Dear Airflow Maintainers,
This adds a provision to autoscale Celery workers, instead of running the same number of workers irrespective of the number of running tasks.
Please accept this PR that addresses the following issues:
https://issues.apache.org/jira/browse/AIRFLOW-1945
Testing Done:
Manually tested by passing arguments in cli