Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timeout when watching pod events in k8s executor #39551

Merged
merged 4 commits into from
May 15, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -98,9 +98,7 @@ def run(self) -> None:
kube_client, self.resource_version, self.scheduler_job_id, self.kube_config
)
except ReadTimeoutError:
self.log.warning(
"There was a timeout error accessing the Kube API. Retrying request.", exc_info=True
)
self.log.info("Kubernetes watch timed out waiting for events. Restarting watch.")
time.sleep(1)
except Exception:
self.log.exception("Unknown error in KubernetesJobWatcher. Failing")
Expand Down Expand Up @@ -141,7 +139,7 @@ def _run(
) -> str | None:
self.log.info("Event: and now my watch begins starting at resource_version: %s", resource_version)

kwargs = {"label_selector": f"airflow-worker={scheduler_job_id}"}
kwargs: dict[str, Any] = {"label_selector": f"airflow-worker={scheduler_job_id}"}
if resource_version:
kwargs["resource_version"] = resource_version
if kube_config.kube_client_request_args:
Expand All @@ -150,6 +148,14 @@ def _run(

last_resource_version: str | None = None

# For info about k8s timeout settings see
# https://github.com/kubernetes-client/python/blob/v29.0.0/examples/watch/timeout-settings.md
# and https://github.com/kubernetes-client/python/blob/v29.0.0/kubernetes/client/api_client.py#L336-L339
client_timeout = 30
server_conn_timeout = 3600
kwargs["_request_timeout"] = client_timeout
kwargs["timeout_seconds"] = server_conn_timeout
Comment on lines +156 to +157
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These settings were configurable using the AIRFLOW__KUBERNETES_EXECUTOR__KUBE_CLIENT_REQUEST_ARGS env variable, but now they are overridden if I'm not mistaken. Is this expected?


for event in self._pod_events(kube_client=kube_client, query_kwargs=kwargs):
task = event["object"]
self.log.debug("Event: %s had an event of type %s", task.metadata.name, event["type"])
Expand Down