Optimize K8s API usage for watching events #59080
jscheffl
left a comment
Thanks for this, it looks like an impressive improvement!
One mini nit, and before merging I would leave the PR open for a few days so that another pair of eyes can review it. LGTM in my view.
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/utils/pod_manager.py
@jedcunningham It would be cool to have your opinion, and to have this in the chart 1.19 release as well.
…s/utils/pod_manager.py Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>
I heard from @AutomationDev85 about some problems with the async event polling, so we might need to triage tomorrow to double-check that this is not adding more problems than benefits. Please do not merge before this is clarified tomorrow (which might be 10:00 CET on Tuesday the 9th).
jscheffl
left a comment
Looks good, and even a bit better like this. I would still wait 1-2 days (as there is no pressure to merge), hoping for feedback from others prior to merging. Some more eyes might be good.
This looks like a fantastic improvement.
Description
This PR optimizes how the `KubernetesPodOperator` interacts with the Kubernetes API when retrieving events.

Previously, the operator did not pass the `resourceVersion` parameter when listing events for a pod. This forced Kubernetes to perform a quorum read for every request, which is an expensive operation. Combined with frequent polling for new events, this created significant load on the Kubernetes API and etcd, especially when many pods were started in parallel.

Best practice is for clients to store the `resourceVersion` from each response and provide it in subsequent requests. This allows Kubernetes to serve the event list far more efficiently. As stated in the Kubernetes documentation:

Reference: https://kubernetes.io/docs/reference/using-api/api-concepts/#semantics-for-get-and-list
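
As a rough illustration of that best practice (not the code from this PR), the sketch below remembers the `resourceVersion` of each list response and sends it with the next request. The class name `PodEventLister` is hypothetical, and `core_v1` is assumed to behave like `kubernetes.client.CoreV1Api`; it is injected so the sketch does not depend on a live cluster.

```python
from typing import Any, Optional


class PodEventLister:
    """Hypothetical helper: lists events for one pod and reuses the last resourceVersion."""

    def __init__(self, core_v1: Any, namespace: str, pod_name: str) -> None:
        # Assumption: core_v1 exposes list_namespaced_event like kubernetes.client.CoreV1Api.
        self._core_v1 = core_v1
        self._namespace = namespace
        self._field_selector = f"involvedObject.name={pod_name}"
        self._resource_version: Optional[str] = None

    def list_events(self) -> list:
        kwargs = {
            "namespace": self._namespace,
            "field_selector": self._field_selector,
        }
        # The first call omits resource_version; later calls send the last known
        # value so the API server is not forced to do a quorum read.
        if self._resource_version is not None:
            kwargs["resource_version"] = self._resource_version
        event_list = self._core_v1.list_namespaced_event(**kwargs)
        # Remember the collection's resourceVersion for the next request.
        self._resource_version = event_list.metadata.resource_version
        return list(event_list.items)
```

Note that a list request with a `resourceVersion` still returns the full set of matching events, so a caller that only wants new events has to filter out the ones it has already seen.
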
With this change, the operator performs one initial event listing without a `resourceVersion`, and all subsequent requests include the last known `resourceVersion`.

Additionally, this PR introduces usage of the Kubernetes watch API in deferred (asynchronous) mode. Instead of polling every few seconds, the operator can now watch for new events. This provides two major benefits:
We implemented this change after observing a high number of HTTP `429` (rate-limited) responses from our cluster's API server. One contributing factor was the large volume of `GET` requests for event listings, which placed heavy load on etcd. After deploying a patched version of the operator with these improvements, the number of `429` responses dropped from several thousand per minute to nearly zero.

Changes

- Use `resourceVersion` when retrieving events from K8s API
- Use `watch` API to watch events when running in deferred (asynchronous) mode
- Add `watch` verb for `events` in pod launcher role in Helm chart (required so that Airflow triggerer has permission to watch events)
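
To illustrate the watch-based approach, here is a minimal sketch that consumes a single watch stream instead of polling. It is not the implementation from this PR: it uses the synchronous style for brevity, whereas the PR applies the watch in deferred (asynchronous) mode. The function name `watch_pod_events` and the `on_event` callback are hypothetical; `watcher` is assumed to behave like `kubernetes.watch.Watch` and `core_v1` like `kubernetes.client.CoreV1Api`.

```python
from typing import Any, Callable, Optional


def watch_pod_events(
    watcher: Any,
    core_v1: Any,
    namespace: str,
    pod_name: str,
    resource_version: Optional[str],
    on_event: Callable[[str, Any], None],
    timeout_seconds: int = 30,
) -> Optional[str]:
    """Hypothetical helper: streams events for one pod, returns the last resourceVersion seen."""
    kwargs = {
        "namespace": namespace,
        "field_selector": f"involvedObject.name={pod_name}",
        "timeout_seconds": timeout_seconds,
    }
    # Start the watch from the last known resourceVersion, if there is one.
    if resource_version is not None:
        kwargs["resource_version"] = resource_version
    # Assumption: stream() yields dicts with "type" and "object" keys,
    # as kubernetes.watch.Watch.stream does.
    for item in watcher.stream(core_v1.list_namespaced_event, **kwargs):
        obj = item["object"]
        on_event(item["type"], obj)
        # Track progress so a later watch can resume from this point.
        resource_version = obj.metadata.resource_version
    return resource_version
```

The returned value can be passed back in as `resource_version` on the next call, so a watch that times out or is interrupted can resume from where it stopped.
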