Add retry logic for KubernetesCreateResourceOperator and KubernetesJobOperator #39201
Why don't you use Airflow's built-in retry parameter? |
I use the same approach that we use to retry Pod creation: https://github.com/apache/airflow/blob/main/airflow/providers/cncf/kubernetes/utils/pod_manager.py#L347C1-L356C1 |
I was thinking about the BaseOperator `retries` argument:
PythonOperator(
    task_id="aa",
    retries=3,
    python_callable=toto,
) |
Related: #15137 |
That parameter is an option for users: if a user wants to retry a specific task, they can set it. Here, if I understand correctly, @MaksYermak wants the retry to happen without the user being aware of it or needing to do anything. |
Could you add tests to cover these retries?
Sure, I have added unit tests. |
Hi @raphaelauv @dirrao @vincbeck! |
This PR adds retry logic to KubernetesCreateResourceOperator and KubernetesJobOperator.
The logic is needed to work around the 'No agent available' error, which appears from time to time when users try to create a Resource or Job. The issue originates inside Kubernetes and currently has no fix. As a temporary workaround, we retry the Job or Resource creation request each time this error appears.
The same issue has been reported for cert-manager: cert-manager/cert-manager#6457
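
For readers who want to see the shape of the workaround described above, here is a minimal, standard-library-only sketch of retrying a creation call only when the transient error appears. It is not the code from this PR: `TransientApiError`, `should_retry`, `create_with_retry`, the attempt count, the backoff values, and the idea of matching on the message text are all assumptions made for illustration. The actual change may rely on a retry library and on the Kubernetes client's own exception type.

```python
import time


class TransientApiError(Exception):
    """Hypothetical stand-in for the exception raised by the Kubernetes client."""


def should_retry(exc: Exception) -> bool:
    # Assumption: the transient failure can be recognised by its message text.
    return "No agent available" in str(exc)


def create_with_retry(create, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call create() and retry only on the transient error, with exponential backoff.

    `create` is any zero-argument callable that performs the creation request.
    `sleep` is injectable so the backoff can be observed without real waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return create()
        except Exception as exc:
            # Re-raise errors that are not transient, and the final failed attempt.
            if not should_retry(exc) or attempt == max_attempts:
                raise
            # Wait base_delay, then 2 * base_delay, and so on before retrying.
            sleep(base_delay * 2 ** (attempt - 1))
```

Unlike the task-level `retries` argument, which re-runs the whole task and must be set by the user, a wrapper like this retries only the single creation request and leaves every other error untouched.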