Skip to content

Conversation

@Nataneljpwd
Copy link
Contributor

This PR fixes an issue we noticed, where we had Negative open slots for our executors in our metrics.

As it can be seen in here, the metric is calculated from the parallelism minus the length of self.running.
image

In the K8S Executor, it is updated in only a few places, where we create tasks in the methods self.adopt_launched_tasks and self._adopt_completed_tasks.
Where the first adds tasks to running when they are started, as they were just set to running, and the latter, adopts tasks which are completed, so that the K8S Watchers in AirflowKubernetesScheduler can delete the pods, this can cause an issue, where we can have negative amount of open slots for the K8S Executor, and it also means that we might drop tasks which we could set to running, just because completed tasks occupied their spot (as it is the check done by the SchedulerJobRunner, see Images bellow).

Here is where the SchedulerJobRunner uses the open_slots property of the executor.
image

image

This PR resolves the issue by addopting completed tasks to a different set, called completed, which resolves the negative open slots metric and the issue where in certain cases we run less tasks than we actually can.

@boring-cyborg boring-cyborg bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Sep 17, 2025
@HsiuChuanHsu
Copy link
Contributor

Good find! 💪

@Nataneljpwd
Copy link
Contributor Author

@o-nikolas

Why not just change state here instead of creating a new set and looping over them elsewhere?

As then we are spreading logic of updating the tasks state across multiple methods, while now it is all done in the same place, after some thought, I decided that introducing a new variable will make the code easier to follow and less complex, even though there is a new variable, than if we start moving logic around, doing it inline would also require the method name to change, though it is a minor issue with a modern ide.

If you do not agree with it, I will make it there inline, even though I think it will introduce more complexity.

@Nataneljpwd Nataneljpwd force-pushed the feature/fix-kubernetes-executor-open-slots branch from 8913cdf to 88cec6a Compare September 29, 2025 17:26
@Nataneljpwd Nataneljpwd force-pushed the feature/fix-kubernetes-executor-open-slots branch from e5a5839 to 9540f90 Compare October 27, 2025 18:49
@eladkal eladkal merged commit 67e4de3 into apache:main Oct 28, 2025
93 checks passed
Lzzz666 pushed a commit to Lzzz666/airflow that referenced this pull request Oct 30, 2025
* fixed kubernetes executor open_slots issue

* fixed tests

* formatted file

---------

Co-authored-by: Natanel Rudyuklakir <natanelrudyuklakir@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants