@AutomationDev85
Contributor

Overview

We use Airflow to scale different kinds of tasks, some of which run for a long time (> 5 hours). We use the KubernetesPodOperator to start a Pod and follow the task's log output, and Keda autoscaling to scale the cluster nodes up and down. During scaling, the HTTP connection used to consume the logs is interrupted, because the API backend scales up and down as well. An exception then appears in the log file, which confuses users, who assume it has something to do with their task.

As the pod manager already tries to reconnect automatically, the idea is to log the HTTPError only if it occurred more than 2 times in 60 seconds, so that the user's log is not cluttered with exceptions that are expected and already handled.

Details of change:

  • Track the number of HTTP errors.
  • Only add the exception text to the log file if more than 2 HTTPErrors occurred in the last 60 seconds.
  • As a result, a normal reconnect is not visible in the log file.
  • The warning about possibly duplicated log lines, caused by the short interruption while reading the log, is still added to the log file.
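
The thresholding described above could be sketched roughly as follows. This is only an illustration of the idea, not the code in this PR: the class name `ErrorRateTracker`, its parameters, and the injectable `clock` are assumptions made for the example.

```python
import time
from collections import deque
from typing import Callable, Deque


class ErrorRateTracker:
    """Hypothetical helper: counts errors inside a sliding time window."""

    def __init__(
        self,
        max_errors: int = 2,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the window can be tested without sleeping
        self._timestamps: Deque[float] = deque()

    def record(self) -> bool:
        """Record one error and report whether the threshold is exceeded.

        Returns True only if more than `max_errors` errors (including this
        one) occurred within the last `window_seconds`.
        """
        now = self._clock()
        self._timestamps.append(now)
        # Drop errors that are older than the window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_errors
```

A caller would invoke `record()` in the handler for the HTTP error and write the exception text to the log only when it returns `True`; otherwise it would just reconnect and emit the duplication warning.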

Contributor

@jscheffl jscheffl left a comment

Looks good to me, although the threshold logic is a bit hard to read. I am wondering whether the threshold check could be moved into a small utility that can be shared for similar cases. But not blocking...

Member

@potiuk potiuk left a comment

Good for me as well.

@potiuk potiuk merged commit 1cb057f into apache:main Aug 22, 2025
86 checks passed
mangal-vairalkar pushed a commit to mangal-vairalkar/airflow that referenced this pull request Aug 30, 2025
Co-authored-by: AutomationDev85 <AutomationDev85>

Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues
