Skip to content

EksPodOperator with in_cluster=False does not work when running in deferrable mode #46964

@rawwar

Description

@rawwar

Apache Airflow version

main (development)

If "Other Airflow 2 version" selected, which one?

No response

What happened?

EksPodOperator will fail to complete when in_cluster=False and deferrable=True. I've verified the same code to succeed in deferrable=False. I've included the dag code in the issue.

What you think should happen instead?

No response

How to reproduce

Dag:

from datetime import datetime, timedelta
from airflow import DAG
from airflow.providers.amazon.aws.operators.eks import EksPodOperator

# Define default_args
default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2024, 2, 13),
    "retries": 0,
    "retry_delay": timedelta(minutes=5),
}

# Define DAG
with DAG(
    "kpo_hello_world_deferrable",
    default_args=default_args,
    schedule_interval=None,  # Manual trigger
    catchup=False,
) as dag:

    hello_pod = EksPodOperator(
        task_id="hello_world_pod",
        aws_conn_id="aws_default",
        in_cluster=False,
        cluster_name="test-ekspodop",
        region="us-east-2",
        pod_name="hello-world-pod",
        image="python:3.8",
        cmds=["python", "-c", 'import time;time.sleep(150)'],
        get_logs=True,
        deferrable=True,
        logging_interval=10
    )

    hello_pod

Above task fails with below stack trace:

[2025-02-19, 06:56:21 UTC] {taskinstance.py:3313} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 818, in trigger_reentry
    self.pod = self.hook.get_pod(pod_name, pod_namespace)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/hooks/kubernetes.py", line 455, in get_pod
    return self.core_v1_client.read_namespaced_pod(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kubernetes/client/api/core_v1_api.py", line 23693, in read_namespaced_pod
    return self.read_namespaced_pod_with_http_info(name, namespace, **kwargs)  # noqa: E501
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kubernetes/client/api/core_v1_api.py", line 23780, in read_namespaced_pod_with_http_info
    return self.api_client.call_api(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
                    ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kubernetes/client/rest.py", line 244, in GET
    return self.request("GET", url,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/kubernetes/client/rest.py", line 238, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '473bc5ed-3e63-41d5-9184-082f2a36705f', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '393e92c5-5adb-4741-b8c5-b423919d51f2', 'X-Kubernetes-Pf-Prioritylevel-Uid': '5c23ef0b-6725-43f5-9505-5ad426bd5e7f', 'Date': 'Wed, 19 Feb 2025 06:56:18 GMT', 'Content-Length': '390'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"hello-world-pod-451nntkb\" is forbidden: User \"system:serviceaccount:planetic-space-8435:planetic-space-8435-worker-serviceaccount\" cannot get resource \"pods\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"name":"hello-world-pod-451nntkb","kind":"pods"},"code":403}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/airflow/models/taskinstance.py", line 768, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/models/taskinstance.py", line 734, in _execute_callable
    return ExecutionCallableRunner(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/utils/operator_helpers.py", line 252, in run
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/models/baseoperator.py", line 1816, in resume_execution
    return execute_callable(context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 875, in trigger_reentry
    self._clean(event, context)
  File "/usr/local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 885, in _clean
    self.pod = self.pod_manager.await_pod_completion(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 649, in await_pod_completion
    remote_pod = self.read_pod(pod)
                 ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 336, in wrapped_f
    return copy(f, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 475, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 376, in iter
    result = action(retry_state)
             ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 418, in exc_check
    raise retry_exc.reraise()
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 185, in reraise
    raise self.last_attempt.result()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 478, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 764, in read_pod
    return self._client.read_namespaced_pod(pod.metadata.name, pod.metadata.namespace)
                                            ^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'metadata'

Operating System

Linux

Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes==10.1.0
apache-airflow-providers-amazon==9.2.0

Deployment

Astronomer

Deployment details

No response

Anything else?

Full logs of the task run

dag_id=kpo_hello_world_deferrable_run_id=manual__2025-02-19T06_49_27.777282+00_00_task_id=hello_world_pod_attempt=2.log

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions