Labels
area:providers, kind:bug, needs-triage, provider:cncf-kubernetes
Description
Apache Airflow Provider(s)
cncf-kubernetes
Versions of Apache Airflow Providers
apache-airflow-providers-cncf-kubernetes==10.5.0
Apache Airflow version
v2.11.0
Operating System
Amazon Linux 2
Deployment
Other 3rd-party Helm chart
Deployment details
EKS
What happened
The cncf-kubernetes provider logs FailedScheduling pod events at ERROR level even though Kubernetes classifies them as Warning events. This is confusing for users, because an ERROR-level FailedScheduling message implies the task failed due to this error, when in fact Kubernetes will happily keep retrying scheduling until the pod TTL expires.
[2025-08-26, 17:14:45 PDT] {pod.py:1027} ERROR - Pod Event: FailedScheduling - 0/14 nodes are available: waiting for ephemeral volume controller to create the persistentvolumeclaim "test-part1-vyt3ovcx-bigstorage". preemption: 0/14 nodes are available: 14 Preemption is not helpful for scheduling.
[2025-08-26, 17:14:45 PDT] {pod.py:1027} ERROR - Pod Event: FailedScheduling - Failed to schedule pod, incompatible with nodepool "high-
[2025-08-26, 17:14:45 PDT] {pod.py:1027} ERROR - Pod Event: FailedScheduling - 0/14 nodes are available: 5 node(s) had untolerated taint {rfoo/component: bar}, 9 node(s) didn't match Pod's node affinity/selector. preemption: 0/14 nodes are available: 14 Preemption is not helpful for scheduling.
[2025-08-26, 17:14:45 PDT] {pod.py:1027} ERROR - Pod Event: FailedScheduling - 0/14 nodes are available: waiting for ephemeral volume controller to create the persistentvolumeclaim "test-part1-vyt3ovcx-bigstorage". preemption: 0/14 nodes are available: 14 Preemption is not helpful for scheduling.
[2025-08-26, 17:14:45 PDT] {pod.py:1027} ERROR - Pod Event: FailedScheduling - Failed to schedule pod, incompatible with nodepool "high-availability", daemonset overhead={"cpu":"180m","memory":"120Mi","pods":"5"}, did not tolerate roman.ipac.caltech.edu/component=cm:NoSchedule; incompatible with nodepool "default", daemonset overhead={"cpu":"180m","memory":"120Mi","pods":"5"}, no instance type satisfied resources {"cpu":"8180m","memory":"65656Mi","pods":"6"} and requirements karpenter.k8s.aws/instance-category In [m], karpenter.k8s.aws/instance-generation In [6], karpenter.sh/capacity-type In [on-demand], karpenter.sh/nodepool In [default], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux], topology.kubernetes.io/zone In [us-east-1a us-east-1b] (no instance type has enough resources); incompatible with nodepool "al2023", daemonset overhead={"cpu":"180m","memory":"120Mi","pods":"5"}, did not tolerate roman.ipac.caltech.edu/os=al2023:NoSchedule
[2025-08-26, 17:14:45 PDT] {pod.py:1027} ERROR - Pod Event: FailedScheduling - 0/14 nodes are available: 5 node(s) had untolerated taint {rfoo/component: bar}, 9 node(s) didn't match Pod's node affinity/selector. preemption: 0/14 nodes are available: 14 Preemption is not helpful for scheduling.
What you think should happen instead
The log level should be WARNING, matching the Warning type that Kubernetes assigns to FailedScheduling events.
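As a minimal sketch of the proposed behavior (not the provider's actual implementation), the log level could be derived from the Kubernetes event type instead of being hard-coded to ERROR. The helper name and mapping below are hypothetical:

import logging

# Kubernetes core/v1 events carry a "type" field of "Normal" or "Warning";
# FailedScheduling events are emitted with type "Warning".
_EVENT_TYPE_TO_LEVEL = {
    "Normal": logging.INFO,
    "Warning": logging.WARNING,
}


def log_pod_event(log: logging.Logger, event) -> None:
    """Log a pod event at a level derived from its Kubernetes event type."""
    level = _EVENT_TYPE_TO_LEVEL.get(event.type, logging.ERROR)
    log.log(level, "Pod Event: %s - %s", event.reason, event.message)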
How to reproduce
Set up a k8s cluster with a node that has a taint (see the sketch after the snippet below). Create a KubernetesPodOperator task without a toleration for that taint and with log_events_on_failure=True:
from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(...) as dag:
    # Pod that cannot tolerate the taint, so it never gets scheduled;
    # log_events_on_failure=True makes the provider log the pod events.
    k = KubernetesPodOperator(
        task_id="dry_run_demo",
        image="debian",
        cmds=["bash", "-cx"],
        arguments=["echo", "10"],
        log_events_on_failure=True,
    )
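For the taint setup step, one option is to patch a node through the Kubernetes Python client; the node name and the taint key/value below are placeholders:

from kubernetes import client, config

# Load the local kubeconfig and taint one node so pods without a matching
# toleration cannot be scheduled on it. "my-node", "rfoo/component" and "bar"
# are placeholder values.
config.load_kube_config()
v1 = client.CoreV1Api()
v1.patch_node(
    "my-node",
    {"spec": {"taints": [{"key": "rfoo/component", "value": "bar", "effect": "NoSchedule"}]}},
)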
Anything else
This was partially addressed in #36077, but that change did not cover the FailedScheduling event type.
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct