-
Notifications
You must be signed in to change notification settings - Fork 16.3k
[v3-1-test] Use exc_info for task instance heartbeat failure exception logging (#57172)
#57179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ion logging (#57172) closes: #57167 PR #52562 changed `_handle_heartbeat_failures()` to accept an exception parameter and log it as a structured field. Now due to this change, during the first failed heartbeat attempt, the _handle_heartbeat_failures function logs a message by calling log.warning(), which accepts an exception parameter that expects a string type object. However, in the source code, [an exception type object is passed](https://github.com/apache/airflow/blob/54bd5d8cd9f6f477cc83445737614dec81c4323c/task-sdk/src/airflow/sdk/execution_time/supervisor.py#L1126) instead of a string type object. This results in a TypeError (like below) which causes task supervision to fail. The error looked like this: ```python 2025-10-23T17:58:22.900129Z [error ] Task execute_workload[aac34f36-54e1-46e4-ba47-15dba8ba7149] raised unexpected: TypeError('can only concatenate str (not "ConnectError") to str') [celery.app.trace] loc=trace.py:267 Traceback (most recent call last): File "/usr/python/lib/python3.10/site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions yield File "/usr/python/lib/python3.10/site-packages/httpx/_transports/default.py", line 250, in handle_request resp = self._pool.handle_request(req) File "/usr/python/lib/python3.10/site-packages/httpcore/_sync/connection_pool.py", line 256, in handle_request raise exc from None File "/usr/python/lib/python3.10/site-packages/httpcore/_sync/connection_pool.py", line 236, in handle_request response = connection.handle_request( File "/usr/python/lib/python3.10/site-packages/httpcore/_sync/connection.py", line 101, in handle_request raise exc File "/usr/python/lib/python3.10/site-packages/httpcore/_sync/connection.py", line 78, in handle_request stream = self._connect(request) File "/usr/python/lib/python3.10/site-packages/httpcore/_sync/connection.py", line 124, in _connect stream = self._network_backend.connect_tcp(**kwargs) File "/usr/python/lib/python3.10/site-packages/httpcore/_backends/sync.py", line 207, in connect_tcp with map_exceptions(exc_map): File "/usr/python/lib/python3.10/contextlib.py", line 153, in __exit__ self.gen.throw(typ, value, traceback) File "/usr/python/lib/python3.10/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions raise to_exc(exc) from exc httpcore.ConnectError: [Errno 111] Connection refused ``` The change in #52562 was mainly made due to ruff upgrade reasons, so I am going back to using the standard Python logging pattern: pass exception to `exc_info` parameter. After changes, error looks like this: ```python 2025-10-23T18:24:39.303939Z [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client] loc=before.py:42 2025-10-23T18:24:40.307404Z [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client] loc=before.py:42 2025-10-23T18:24:41.417523Z [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client] loc=before.py:42 2025-10-23T18:24:43.801145Z [warning ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client] loc=before.py:42 2025-10-23T18:24:46.690339Z [warning ] Failed to send heartbeat. Will be retried [supervisor] failed_heartbeats=3 loc=supervisor.py:1135 max_retries=3 ti_id=UUID('019a1250-48a0-756c-ab0b-9687289ef580') Traceback (most recent call last): File "/usr/python/lib/python3.10/site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions yield File "/usr/python/lib/python3.10/site-packages/httpx/_transports/default.py", line 250, in handle_request resp = self._pool.handle_request(req) File "/usr/python/lib/python3.10/site-packages/httpcore/_sync/connection_pool.py", line 256, in handle_request raise exc from None File "/usr/python/lib/python3.10/site-packages/httpcore/_sync/connection_pool.py", line 236, in handle_request response = connection.handle_request( File "/usr/python/lib/python3.10/site-packages/httpcore/_sync/connection.py", line 101, in handle_request raise exc File "/usr/python/lib/python3.10/site-packages/httpcore/_sync/connection.py", line 78, in handle_request stream = self._connect(request) File "/usr/python/lib/python3.10/site-packages/httpcore/_sync/connection.py", line 124, in _connect stream = self._network_backend.connect_tcp(**kwargs) File "/usr/python/lib/python3.10/site-packages/httpcore/_backends/sync.py", line 207, in connect_tcp with map_exceptions(exc_map): File "/usr/python/lib/python3.10/contextlib.py", line 153, in __exit__ self.gen.throw(typ, value, traceback) File "/usr/python/lib/python3.10/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions raise to_exc(exc) from exc httpcore.ConnectError: [Errno 111] Connection refused The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/opt/airflow/task-sdk/src/airflow/sdk/execution_time/supervisor.py", line 1105, in _send_heartbeat_if_needed self.client.task_instances.heartbeat(self.id, pid=self._process.pid) File "/opt/airflow/task-sdk/src/airflow/sdk/api/client.py", line 259, in heartbeat self.client.put(f"task-instances/{id}/heartbeat", content=body.model_dump_json()) File "/usr/python/lib/python3.10/site-packages/httpx/_client.py", line 1181, in put return self.request( File "/usr/python/lib/python3.10/site-packages/tenacity/__init__.py", line 338, in wrapped_f return copy(f, *args, **kw) File "/usr/python/lib/python3.10/site-packages/tenacity/__init__.py", line 477, in __call__ do = self.iter(retry_state=retry_state) File "/usr/python/lib/python3.10/site-packages/tenacity/__init__.py", line 378, in iter result = action(retry_state) File "/usr/python/lib/python3.10/site-packages/tenacity/__init__.py", line 420, in exc_check raise retry_exc.reraise() File "/usr/python/lib/python3.10/site-packages/tenacity/__init__.py", line 187, in reraise raise self.last_attempt.result() File "/usr/python/lib/python3.10/concurrent/futures/_base.py", line 451, in result return self.__get_result() File "/usr/python/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result raise self._exception File "/usr/python/lib/python3.10/site-packages/tenacity/__init__.py", line 480, in __call__ result = fn(*args, **kwargs) File "/opt/airflow/task-sdk/src/airflow/sdk/api/client.py", line 894, in request return super().request(*args, **kwargs) File "/usr/python/lib/python3.10/site-packages/httpx/_client.py", line 825, in request return self.send(request, auth=auth, follow_redirects=follow_redirects) File "/usr/python/lib/python3.10/site-packages/httpx/_client.py", line 914, in send response = self._send_handling_auth( File "/usr/python/lib/python3.10/site-packages/httpx/_client.py", line 942, in _send_handling_auth response = self._send_handling_redirects( File "/usr/python/lib/python3.10/site-packages/httpx/_client.py", line 979, in _send_handling_redirects response = self._send_single_request(request) File "/usr/python/lib/python3.10/site-packages/httpx/_client.py", line 1014, in _send_single_request response = transport.handle_request(request) File "/usr/python/lib/python3.10/site-packages/httpx/_transports/default.py", line 249, in handle_request with map_httpcore_exceptions(): File "/usr/python/lib/python3.10/contextlib.py", line 153, in __exit__ self.gen.throw(typ, value, traceback) File "/usr/python/lib/python3.10/site-packages/httpx/_transports/default.py", line 118, in map_httpcore_exceptions raise mapped_exc(message) from exc httpx.ConnectError: [Errno 111] Connection refused ``` (cherry picked from commit 970d7da) Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
amoghrajesh
approved these changes
Oct 24, 2025
This was referenced Oct 31, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
closes: #57167
PR #52562 changed
_handle_heartbeat_failures()to accept an exception parameter and log it as a structured field. Now due to this change, during the first failed heartbeat attempt, the _handle_heartbeat_failures function logs a message by calling log.warning(), which accepts an exception parameter that expects a string type object. However, in the source code, an exception type object is passed instead of a string type object. This results in a TypeError (like below) which causes task supervision to fail.The error looked like this:
The change in #52562 was mainly made due to ruff upgrade reasons, so I am going back to using the standard Python logging pattern: pass exception to
exc_infoparameter.After changes, error looks like this:
(cherry picked from commit 970d7da)
Co-authored-by: Amogh Desai amoghrajesh1999@gmail.com