Add TimeoutError to be a retryable error in databricks provider #43128

Closed
2 tasks done
rawwar opened this issue Oct 17, 2024 · 0 comments · Fixed by #43137

rawwar (Collaborator) commented Oct 17, 2024

Description

@lucafurrer reported that their Databricks jobs fail with asyncio.exceptions.TimeoutError; their stack trace is included below. I believe this should be treated as a retryable error. A more generic solution would be to let users supply their own function that decides whether a given error should be retried. I've opened a separate issue for that: #43127

[2024-10-15, 20:07:05 CEST] {baseoperator.py:1598} ERROR - Trigger failed:
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/triggerer_job_runner.py", line 529, in cleanup_finished_triggers
    result = details["task"].result()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/triggerer_job_runner.py", line 601, in run_trigger
    async for event in trigger.run():
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/databricks/triggers/databricks.py", line 86, in run
    run_state = await self.hook.a_get_run_state(self.run_id)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/databricks/hooks/databricks.py", line 417, in a_get_run_state
    response = await self._a_do_api_call(GET_RUN_ENDPOINT, json)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/databricks/hooks/databricks_base.py", line 651, in _a_do_api_call
    async for attempt in self._a_get_retry_object():
  File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/_asyncio.py", line 71, in __anext__
    do = self.iter(retry_state=self._retry_state)
  File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/databricks/hooks/databricks_base.py", line 653, in _a_do_api_call
    async with request_func(
  File "/home/airflow/.local/lib/python3.8/site-packages/aiohttp/client.py", line 1194, in __aenter__
    self._resp = await self._coro
  File "/home/airflow/.local/lib/python3.8/site-packages/aiohttp/client.py", line 605, in _request
    await resp.start(conn)
  File "/home/airflow/.local/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 981, in start
    self._continue = None
  File "/home/airflow/.local/lib/python3.8/site-packages/aiohttp/helpers.py", line 735, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError
[2024-10-15, 20:07:05 CEST] {taskinstance.py:2731} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 444, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 414, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 1599, in resume_execution
    raise TaskDeferralError(next_kwargs.get("error", "Unknown"))
airflow.exceptions.TaskDeferralError: Trigger failure

Use case/motivation

Original request from another related issue: #43080 (comment)

Related issues

#43080

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

rawwar added the kind:feature and needs-triage labels Oct 17, 2024
rawwar changed the title from "Add TimeoutError to be a retryable" to "Add TimeoutError to be a retryable error in databricks provider" Oct 17, 2024
potiuk added the good first issue label and removed the needs-triage label Oct 17, 2024
rawwar self-assigned this Oct 18, 2024