Graceful termination does not work with apache chart #13591
What does work: I verified, at least with 2.0, that warm shutdown works with the following change for the worker deployment:
    command: ["airflow"]
    args: ["celery", "worker"]
More things that don't work: Before arriving at the above, I tried this (didn't work):
    command: ["/usr/bin/dumb-init", "--", "airflow"]
    args: ["celery", "worker"]
This also didn't work:
    command: ["/usr/bin/dumb-init"]
    args: ["airflow", "celery", "worker"]
Produced more of these errors:
|
@potiuk I think you are the main architect of the Dockerfile. Do you know what's going on here? I don't really understand this area very well (dumb-init / tini / gosu, and what happens when combined with entrypoints and args), though I'd like to! |
astronomer's helm chart uses gosu and tini, it seems. fwiw, in my previous company we used astronomer EE and the termination did work |
This issue from celery might be relevant: celery/billiard#273 Some people are saying it's Linux distro related: celery/billiard#273 (comment) .. |
I will take a look later this week. It also depends on which command is used to run the airflow components. You are talking about the current master version of the 'chart', yeah? No modification to the entrypoint or command?
dumb-init and tini are equivalent, and they are indeed there to forward signals to the running processes. This is really useful when you have a bash script as the entrypoint: if you have bash as the direct entrypoint, it will not forward signals to its children. There are two solutions to solve it: A) dumb-init or tini as entrypoint
The default entrypoint in the prod image is dumb-init, so it should propagate the signals properly. But as @xinbinhuang mentioned, the celery worker has a number of config options, and when you send a SIGTERM to a celery worker it will stop spawning new processes and wait for all the running tasks to terminate. So by definition the worker might take quite some time to exit. The termination grace period controls how long celery gets to wait for all processes to terminate before it is killed with 'kill -9' and exits 'non gracefully'.
Also there is another gotcha: if you send a SECOND SIGTERM to such a celery worker while it is waiting for tasks, it will terminate all the processes with 'kill -9' and exit immediately. So if you expect the worker to terminate immediately, you might actually have observed the wrong behaviour, where someone sent more than one SIGTERM to those workers (I've seen such setups), but this is a rather bad idea IMHO. |
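To make the two-SIGTERM gotcha concrete, here is a minimal Python sketch of the behaviour described above: the first SIGTERM starts a warm shutdown, a further SIGTERM escalates to a cold one. This is a toy model only, not Celery's actual implementation; the class name ToyWorker and its state names are hypothetical.

```python
import signal


class ToyWorker:
    """Toy model of the shutdown states described above (not Celery's code)."""

    def __init__(self):
        self.state = "running"

    def on_sigterm(self, signum=None, frame=None):
        if self.state == "running":
            # First SIGTERM: warm shutdown. Stop accepting new tasks and
            # let the tasks that are already running finish.
            self.state = "warm"
        else:
            # A further SIGTERM while draining: cold shutdown.
            # Running tasks are abandoned.
            self.state = "cold"

    def install(self):
        # Route real SIGTERMs to the handler (call from the main thread).
        signal.signal(signal.SIGTERM, self.on_sigterm)
```

In this model, anything that delivers SIGTERM twice (for example, a wrapper and the orchestrator both signalling the same process) turns an intended warm shutdown into a cold one.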
correct, no mods to entrypoint. You can see which things I tried in the helm config above -- different values of args or command.
No, I do not want the worker to terminate immediately. I want it to do what it is supposed to, namely warm shutdown -- i.e. stop taking tasks, and run until either all tasks are done or the grace period has elapsed |
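For reference, the expected sequence (SIGTERM, wait up to the grace period, then force-kill) can be sketched in Python as below. This only models what an orchestrator such as Kubernetes does with the termination grace period; the helper name stop_with_grace and its return values are hypothetical and not part of Airflow, Celery, or the chart.

```python
import subprocess


def stop_with_grace(proc, grace_seconds):
    """Send SIGTERM, wait up to grace_seconds, then force-kill.

    Returns "graceful" if the process exited within the grace period,
    otherwise "killed".
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX, i.e. the 'kill -9' case
        proc.wait()  # reap the process so it does not linger as a zombie
        return "killed"
```

A worker doing a proper warm shutdown should end in the "graceful" branch whenever its running tasks finish within the grace period. The bug reported here is that the worker exits right away instead of draining its tasks.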
and to clarify @potiuk yes it is latest master |
this sounds interesting celery/billiard#273 (comment) |
also this one does: celery/billiard#273 (comment) |
I think we can consider this resolved by #16153. Graceful termination still does not work out of the box with the released 1.0.0 chart, but with that PR you can use the command / args combination that works, namely this:
|
In apache/airflow, the helm chart has the worker default terminationGracePeriodSeconds: 600. After deploying with 1.10.14, I observed that the worker was terminated immediately. This reproduced consistently.
I also tested with 2.0.0, and again no luck.
Anyone have any hints of something to look into?
Here are some logs from a worker that shutdown ungracefully, running 1.10.14:
And again with 2.0.0:
With 2.0.0 there's no error, but termination is still immediate and the grace period is not respected.
I tried various combinations of args and saw the same behavior every time:
["bash", "-c", "airflow worker"]
["bash", "-c", "exec airflow worker"]
["worker"]