Description
Apache Airflow Provider(s)
cncf-kubernetes
Versions of Apache Airflow Providers
8.4.1
Apache Airflow version
Airflow 2
Operating System
linux
Deployment
Astronomer
Deployment details
No response
What happened
The KubernetesPodOperator does not correctly clean up pods when they are created with labels that have a None value. This results in orphaned pods remaining in the Kubernetes namespace after the task has finished/failed, even when on_finish_action is set to delete_pod.
What you think should happen instead
Root Cause Analysis:
The issue stems from an inconsistency between how labels are handled during pod creation versus pod deletion.
Pod Creation: When a pod is created, the _get_ti_pod_labels method iterates through the labels and uses str(label) to process the value. Note that in Python str(None) evaluates to the string "None", not the empty string "", so this conversion by itself would not produce an empty value. Either way, the pod is created with a valid Kubernetes label like my-label="".
Pod Deletion: When the task finishes, the cleanup method attempts to find the pod to delete it. It calls _build_find_pod_label_selector to construct a query for the Kubernetes API. This method, however, does not apply the same str() conversion. It uses the raw None object from the operator's self.labels dictionary.
This inconsistency leads to a malformed or incorrect label selector, causing the Kubernetes API to return no matching pods. Since the operator cannot find the pod it created, it cannot delete it, leaving the pod orphaned.
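To illustrate the mismatch in isolation, here is a minimal sketch that does not import Airflow. The helper names (build_selector_raw, normalize_labels, build_selector_normalized) are hypothetical and not part of the provider. The sketch assumes the selector is built by formatting each pair as key=value and that the pod carries an empty label value. It is one possible way to make both paths agree, not necessarily the fix the provider should adopt.

```python
from __future__ import annotations


def build_selector_raw(labels: dict[str, str | None]) -> str:
    """Format label values as-is; a None value is rendered as the text 'None'."""
    return ",".join(f"{key}={value}" for key, value in sorted(labels.items()))


def normalize_labels(labels: dict[str, str | None]) -> dict[str, str]:
    """Hypothetical normalization: map None to "" so creation and lookup agree."""
    return {key: "" if value is None else str(value) for key, value in labels.items()}


def build_selector_normalized(labels: dict[str, str | None]) -> str:
    """Build the selector from normalized labels instead of the raw dictionary."""
    return build_selector_raw(normalize_labels(labels))


labels = {"custom-label-with-none": None, "team": "data"}

# Raw path: the selector asks for the literal value "None".
print(build_selector_raw(labels))

# Normalized path: the selector asks for an empty value instead.
print(build_selector_normalized(labels))
```

If the pod really has custom-label-with-none="", only the second selector can match it, which is consistent with the operator finding no pod to delete.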
How to reproduce
Example DAG to Reproduce:
from __future__ import annotations

import pendulum

from airflow.models.dag import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="kpo_none_label_bug_report",
    start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),
    catchup=False,
    schedule=None,
    tags=["k8s", "bug"],
) as dag:
    kpo_task = KubernetesPodOperator(
        task_id="kpo_with_none_label",
        namespace="default",
        image="faulty:latest30",
        cmds=["sh", "-c"],
        arguments=["echo 'Starting...'; sleep 60; echo 'Done sleeping'"],
        # This label with a `None` value triggers the bug
        labels={"custom-label-with-none": None},
        name="kpo-none-label-test",
        on_finish_action="delete_pod",
        # Ensure a new pod is created each time to reliably test deletion
        reattach_on_restart=False,
        config_file="/files/.kube/config.yml",
    )
Expected Behavior: After the task kpo_with_none_label completes or fails with ImagePullBackOff, the corresponding pod (kpo-none-label-test-*) should be deleted from the Kubernetes namespace.
Actual Behavior: The pod is not deleted and remains in the namespace with an ImagePullBackOff or Completed status.
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct