Description
Apache Airflow version
3.1.0
If "Other Airflow 2/3 version" selected, which one?
No response
What happened?
Supervision of a task instance failed after a single failed heartbeat attempt, even though max_failed_heartbeats is set to 3. The failure was caused by an exception raised inside the _handle_heartbeat_failures function itself.
On the first failed heartbeat attempt, _handle_heartbeat_failures logs a message by calling log.warning() with an exception keyword argument. That argument is expected to be a string, but the source code passes the exception object instead. This results in the TypeError below, which causes task supervision to fail.
TypeError: can only concatenate str (not "RemoteProtocolError") to str
I have attached the stack trace from the worker logs.
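To illustrate the failure mode, here is a minimal sketch that reproduces the same class of error outside Airflow. The render function and the RemoteProtocolError class defined here are hypothetical stand-ins, not Airflow or httpx code; the only assumption is that somewhere in the logging path the exception value is concatenated with a string.

```python
class RemoteProtocolError(Exception):
    """Hypothetical stand-in for the exception type named in the traceback."""


def render(event: str, exception: str) -> str:
    # Hypothetical renderer: assumes `exception` is already formatted text
    # and joins it to the event with plain string concatenation.
    return event + "\n" + exception


exc = RemoteProtocolError("Server disconnected without sending a response.")

# Passing the exception object triggers the TypeError from the report.
try:
    render("Failed to send heartbeat", exc)
except TypeError as err:
    error_message = str(err)

# Passing a string works as intended.
rendered = render("Failed to send heartbeat", str(exc))
```

Because the TypeError is raised while handling the first heartbeat failure, the retry counter never gets a chance to reach max_failed_heartbeats.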
What you think should happen instead?
I believe a string should be passed here instead of the exception object, as shown below:
exception=str(exc)
Alternatively, I think we could pass exc_info=True instead of the exception parameter. This is what was done until 3.0.4. I am not sure if there was a specific reason for changing this.
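As a rough illustration of the exc_info=True alternative, the sketch below uses the standard library logging module as a stand-in rather than Airflow's own logger, so the exact behavior in Airflow may differ. The send_heartbeat function and the logger name are hypothetical. The point is that the logger picks up the active exception itself, so no exception object has to be passed as a keyword argument.

```python
import io
import logging

# Capture log output in memory so the result can be inspected.
stream = io.StringIO()
log = logging.getLogger("heartbeat_sketch")  # hypothetical logger name
log.setLevel(logging.WARNING)
log.propagate = False
log.addHandler(logging.StreamHandler(stream))


def send_heartbeat() -> None:
    # Hypothetical stand-in for the failing request to the API server.
    raise ConnectionError("Server disconnected")


try:
    send_heartbeat()
except ConnectionError:
    # exc_info=True tells the logger to read the exception currently being
    # handled and append its traceback to the log record.
    log.warning("Failed to send heartbeat", exc_info=True)

output = stream.getvalue()
```

With this approach the warning is logged along with the traceback, and the caller can carry on counting failed heartbeats.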
How to reproduce
Run a task instance and kill the API server once the task instance begins running.
You will notice that the task supervision fails upon the first failed heartbeat.
Operating System
Debian GNU/Linux
Versions of Apache Airflow Providers
No response
Deployment
Astronomer
Deployment details
No response
Anything else?
This also affects Airflow 3.0.5 and 3.0.6. Prior to that, exc_info=True was used.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct