Feature: fix kubernetes executor open slots #55797
Conversation
Good find! 💪
Doing it inline would spread the logic for updating task state across multiple methods, whereas now it all happens in one place. After some thought, I decided that introducing a new variable makes the code easier to follow and less complex than moving the logic around, even though it adds a variable. The inline approach would also require renaming the method, though that is a minor issue with a modern IDE. If you disagree, I will do it inline, even though I think it would introduce more complexity.
* fixed kubernetes executor open_slots issue
* fixed tests
* formatted file

---------

Co-authored-by: Natanel Rudyuklakir <natanelrudyuklakir@gmail.com>
This PR fixes an issue we noticed, where we had negative open slots for our executors in our metrics.

As can be seen here, the metric is calculated as the parallelism minus the length of `self.running`. In the K8S Executor, `self.running` is updated in only a few places where we adopt tasks: `self.adopt_launched_tasks` and `self._adopt_completed_tasks`. The first adds tasks to `running` when they are started, as they were just set to running; the latter adopts tasks that are already completed, so that the K8S watchers in `AirflowKubernetesScheduler` can delete their pods. This can lead to a negative number of open slots for the K8S Executor, and it also means that we might drop tasks we could set to running, just because completed tasks occupied their slots (this is the check done by the `SchedulerJobRunner`, see images below). Here is where the `SchedulerJobRunner` uses the `open_slots` property of the executor.

This PR resolves the issue by adopting completed tasks into a different set, called `completed`, which fixes the negative open slots metric and the cases where we run fewer tasks than we actually could.
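For illustration, here is a minimal sketch of the behaviour described above. It is not the actual executor code: the class name, the `_before_fix`/`_after_fix` method suffixes, and the string task keys are purely illustrative, while `parallelism`, `running`, and the new `completed` set follow the description in this PR.

```python
# Minimal sketch of the behaviour described above -- not the actual Airflow source.

class KubernetesExecutorSketch:
    def __init__(self, parallelism: int):
        self.parallelism = parallelism
        self.running: set[str] = set()    # task keys currently occupying slots
        self.completed: set[str] = set()  # new set introduced by this PR (the fix)

    @property
    def open_slots(self) -> int:
        # Metric described above: parallelism minus the tasks held in `running`.
        return self.parallelism - len(self.running)

    def adopt_launched_tasks(self, task_keys: set[str]) -> None:
        # Started tasks legitimately consume slots.
        self.running.update(task_keys)

    def adopt_completed_tasks_before_fix(self, task_keys: set[str]) -> None:
        # Old behaviour: completed pods were also added to `running`,
        # so open_slots could go negative and schedulable tasks were dropped.
        self.running.update(task_keys)

    def adopt_completed_tasks_after_fix(self, task_keys: set[str]) -> None:
        # Fixed behaviour: track completed pods separately so the watcher can
        # still clean them up without them consuming open slots.
        self.completed.update(task_keys)
```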