
Issue with task_queued_timeout Causing Silent Task Failures in Airflow 2.10.5 #51301

@KarthikeyanDevendhiran

Description


Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.10.5

What happened?

After upgrading to Airflow 2.10.5, we received the following deprecation warning:

/home/airflow/.local/lib/python3.10/site-packages/airflow/cli/commands/scheduler_command.py:41 DeprecationWarning: The '[kubernetes_executor] worker_pods_pending_timeout' config option is deprecated. Please update your config to use '[scheduler] task_queued_timeout' instead.
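For context, the change we made amounts to moving the timeout from the deprecated [kubernetes_executor] option to the [scheduler] section of airflow.cfg, roughly like this (values are ours):

    # airflow.cfg -- before (deprecated option)
    [kubernetes_executor]
    worker_pods_pending_timeout = 300

    # airflow.cfg -- after (what we set following the warning)
    [scheduler]
    task_queued_timeout = 300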

  • We updated our configuration by setting task_queued_timeout = 300 under the [scheduler] section. A few days later, we noticed that several tasks in our production DAGs were failing after remaining in the queued state for 5 minutes. However, no failure alerts were triggered, even though the DAGs are configured to retry 3 times and send alerts via Slack.

  • We assumed that since the tasks never started execution, the alerting mechanism might not have been triggered. However, our understanding is that task_queued_timeout should mark the task as failed and trigger alerts.

  • To mitigate the issue, we increased the timeout to 900 seconds (15 minutes). This worked for about three weeks, but we are now seeing the same issue again: tasks are failing after being queued for over 15 minutes, without any alerts. This is happening despite 100 nodes of capacity being available in our Databricks instance pool.

Either failures caused by task_queued_timeout are not wired into the alerting mechanism, or tasks are not being scheduled promptly despite available resources, which would point to a scheduler or executor issue.
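For reference, our alerting is wired through a per-task on_failure_callback in default_args, along the lines of the sketch below. This is simplified: slack_alert, the webhook URL, and example_databricks_job are placeholders for our actual notifier and DAGs, but it shows why we expect a callback (and therefore a Slack alert) for any task failure, including one caused by task_queued_timeout.

    from datetime import datetime, timedelta

    import requests

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."  # placeholder for our real webhook


    def slack_alert(context):
        """Failure callback: post the failed task's identity to Slack."""
        ti = context["task_instance"]
        requests.post(
            SLACK_WEBHOOK_URL,
            json={"text": f"Task failed: {ti.dag_id}.{ti.task_id} (try {ti.try_number})"},
            timeout=10,
        )


    default_args = {
        "retries": 3,                        # three retries before a hard failure
        "retry_delay": timedelta(minutes=5),
        "on_failure_callback": slack_alert,  # expected to fire for any task failure
    }

    with DAG(
        dag_id="example_databricks_job",     # placeholder DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args=default_args,
    ) as dag:
        PythonOperator(task_id="run_job", python_callable=lambda: None)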

What you think should happen instead?

  • Tasks that fail due to task_queued_timeout should trigger failure alerts, just like any other task failure.
  • Tasks should not remain in the queued state for extended periods if sufficient executor capacity is available.

How to reproduce

  • Upgrade to Airflow 2.10.5.
  • Set task_queued_timeout = 300 in airflow.cfg under [scheduler] (see the verification sketch after this list).
  • Deploy a DAG with tasks that may experience scheduling delays.
  • Observe tasks failing after 5 minutes in the queued state without triggering alerts.
  • Increase timeout to 900 seconds and observe recurrence under similar conditions.
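To rule out the configuration simply not being picked up, the effective value can be checked from inside the scheduler container (a minimal sketch using the standard airflow.configuration accessor):

    from airflow.configuration import conf

    # Should print the timeout actually in effect (we expected 300, later 900)
    print(conf.getfloat("scheduler", "task_queued_timeout"))

In our case the failures line up with the configured value (5 and later 15 minutes), so the option itself appears to be applied.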

Operating System

linux/amd64

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details


We maintain our Helm charts in a Git repo and deploy them to Kubernetes using ArgoCD. All of our tasks run on Databricks compute.
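Since the deployment goes through the official chart, the timeout override is applied via the chart's config passthrough in values.yaml, roughly as follows (assuming the chart's standard config-to-airflow.cfg mapping):

    # values.yaml (rendered into airflow.cfg by the chart)
    config:
      scheduler:
        task_queued_timeout: 900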

Anything else?

  • Does not appear to be a callback code error, as it works for other tasks/DAGs.
  • Logs for the failed task show:
    *** Could not read served logs: Invalid URL 'http://:8793/log/dag_id=...': No host supplied
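The empty host in that URL suggests the task instance never had a hostname recorded, i.e. it never actually started on a worker before being failed. One way to confirm the pattern is to look for failed task instances that have no start_date in the metadata DB (a sketch using Airflow's ORM; the dag_id is a placeholder):

    from airflow.models import TaskInstance
    from airflow.utils.session import create_session
    from airflow.utils.state import TaskInstanceState

    # Failed task instances with no start_date most likely went straight
    # from queued to failed (e.g. via the task_queued_timeout check).
    with create_session() as session:
        stuck = (
            session.query(TaskInstance)
            .filter(
                TaskInstance.dag_id == "example_databricks_job",  # placeholder
                TaskInstance.state == TaskInstanceState.FAILED,
                TaskInstance.start_date.is_(None),
            )
            .all()
        )
        for ti in stuck:
            print(ti.dag_id, ti.task_id, ti.run_id, ti.queued_dttm)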

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels

area:Scheduler (including HA scheduler), area:core, kind:bug
