Log level for KubernetesPodOperator's Pod Event FailedScheduling should be warning, not error #54964

@ketozhang

Description

Apache Airflow Provider(s)

cncf-kubernetes

Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes==10.5.0

Apache Airflow version

v2.11.0

Operating System

Amazon Linux 2

Deployment

Other 3rd-party Helm chart

Deployment details

EKS

What happened

The cncf-kubernetes provider logs FailedScheduling pod events at ERROR level even though Kubernetes classifies them as Warning events. This confuses users: an ERROR-level FailedScheduling entry implies the task failed because of it, when in fact Kubernetes will happily keep retrying scheduling until the pod's TTL.

[2025-08-26, 17:14:45 PDT] {pod.py:1027} ERROR - Pod Event: FailedScheduling - 0/14 nodes are available: waiting for ephemeral volume controller to create the persistentvolumeclaim "test-part1-vyt3ovcx-bigstorage". preemption: 0/14 nodes are available: 14 Preemption is not helpful for scheduling.
[2025-08-26, 17:14:45 PDT] {pod.py:1027} ERROR - Pod Event: FailedScheduling - Failed to schedule pod, incompatible with nodepool "high-availability", daemonset overhead={"cpu":"180m","memory":"120Mi","pods":"5"}, did not tolerate roman.ipac.caltech.edu/component=cm:NoSchedule; incompatible with nodepool "default", daemonset overhead={"cpu":"180m","memory":"120Mi","pods":"5"}, no instance type satisfied resources {"cpu":"8180m","memory":"65656Mi","pods":"6"} and requirements karpenter.k8s.aws/instance-category In [m], karpenter.k8s.aws/instance-generation In [6], karpenter.sh/capacity-type In [on-demand], karpenter.sh/nodepool In [default], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux], topology.kubernetes.io/zone In [us-east-1a us-east-1b] (no instance type has enough resources); incompatible with nodepool "al2023", daemonset overhead={"cpu":"180m","memory":"120Mi","pods":"5"}, did not tolerate roman.ipac.caltech.edu/os=al2023:NoSchedule
[2025-08-26, 17:14:45 PDT] {pod.py:1027} ERROR - Pod Event: FailedScheduling - 0/14 nodes are available: 5 node(s) had untolerated taint {rfoo/component: bar}, 9 node(s) didn't match Pod's node affinity/selector. preemption: 0/14 nodes are available: 14 Preemption is not helpful for scheduling.
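
For context, the ERROR lines above come from the operator's event-logging loop in pod.py. A rough sketch of that loop, assuming the 10.5.0 source still distinguishes only Normal events from everything else (names approximate, not a verbatim copy):

for event in self.pod_manager.read_pod_events(pod).items:
    if event.type == "Normal":
        self.log.info("Pod Event: %s - %s", event.reason, event.message)
    else:
        # Kubernetes assigns FailedScheduling events the type "Warning",
        # so they fall into this branch and are logged as ERROR.
        self.log.error("Pod Event: %s - %s", event.reason, event.message)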

What you think should happen instead

The log level should be WARNING, matching the event's Warning type in Kubernetes.
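
A minimal sketch of the proposed change, assuming the loop looks roughly as sketched above: map the Kubernetes event type to the matching log level instead of treating every non-Normal event as an error.

for event in self.pod_manager.read_pod_events(pod).items:
    if event.type == "Normal":
        self.log.info("Pod Event: %s - %s", event.reason, event.message)
    elif event.type == "Warning":
        # FailedScheduling (and other Warning-type events) now log at
        # WARNING, mirroring their severity in Kubernetes.
        self.log.warning("Pod Event: %s - %s", event.reason, event.message)
    else:
        self.log.error("Pod Event: %s - %s", event.reason, event.message)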

How to reproduce

Set up a k8s cluster in which no node can schedule the pod (e.g., the only available node carries a taint). Create a KPO task without a toleration for that taint and with log_events_on_failure=True:

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(...) as dag:
    k = KubernetesPodOperator(
        task_id="dry_run_demo",
        image="debian",
        cmds=["bash", "-cx"],
        arguments=["echo 10"],  # single string, so `bash -c` runs the whole command
        log_events_on_failure=True,
    )
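
While the pod sits unschedulable, the task log should fill with Pod Event: FailedScheduling lines at ERROR level, like the excerpt above.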

Anything else

This was partially addressed in #36077, but that change did not cover the FailedScheduling event type.

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct
