Add resume_glue_job_on_retry to GlueJobOperator #59392
Looks good - but likely @vincbeck @o-nikolas @ferruzzi @ramitkataria should take a look |
providers/amazon/src/airflow/providers/amazon/aws/operators/glue.py
|
Good day @henry3260, Happy New Year, and thank you so much for taking the initiative to add this feature. It will be helpful. I would like to clarify a specific scenario regarding a Glue job named … Assuming one DAG, there are three GlueJobOperator tasks calling the same Glue job name … Assuming … Thanks, and let me know if you need further clarification. |
|
@henry3260 Could you please address the open issues? |
Sorry for the late update. I'll address them shortly. |
shahar1 left a comment
LGTM! I'll merge if and when the CI is green.
While the CI is running, please try to avoid making additional changes so I can merge it right after it (hopefully) ends successfully.
|
Hi @wilsonhooi86! Yes, it will find the same previous_glue_job_id again and will not create a new Glue job, because when GlueJobOperator retries, it only looks up its own glue_job_run_id for the specific task_id. |
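To illustrate the answer above, here is a toy model of why three tasks that call the same Glue job name do not pick up each other's run IDs. It is only a sketch: `ToyXCom`, the key layout, and all DAG, task, and run names are hypothetical stand-ins, not Airflow's actual XCom API or the code in this PR.

```python
from typing import Dict, Optional, Tuple

# Assumption: stored values are scoped by (dag_id, run_id, task_id),
# so each task only ever sees what it pushed itself.
XComKey = Tuple[str, str, str]


class ToyXCom:
    """In-memory stand-in for a per-task key/value store."""

    def __init__(self) -> None:
        self._store: Dict[XComKey, str] = {}

    def push(self, dag_id: str, run_id: str, task_id: str, value: str) -> None:
        self._store[(dag_id, run_id, task_id)] = value

    def pull(self, dag_id: str, run_id: str, task_id: str) -> Optional[str]:
        return self._store.get((dag_id, run_id, task_id))


xcom = ToyXCom()
dag_id, dag_run = "example_dag", "manual__1"

# Three tasks start runs of the same (hypothetical) Glue job name.
xcom.push(dag_id, dag_run, "glue_task_a", "jr_aaa")
xcom.push(dag_id, dag_run, "glue_task_b", "jr_bbb")
xcom.push(dag_id, dag_run, "glue_task_c", "jr_ccc")

# When glue_task_b retries, it looks up only its own key.
previous_for_b = xcom.pull(dag_id, dag_run, "glue_task_b")

# A task that has not pushed anything yet finds no previous run.
previous_for_d = xcom.pull(dag_id, dag_run, "glue_task_d")
```

Because the lookup key includes the task ID, a retry of `glue_task_b` sees `jr_bbb` and never the runs started by the other two tasks.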
Thanks <3 |
closes: #59075
Description
Add resume_glue_job_on_retry parameter to GlueJobOperator to prevent duplicate AWS Glue job runs during task retries.
Problem
When a GlueJobOperator task is retried after failure, the operator would always create a new AWS Glue job run, leading to:
Solution
Introduce resume_glue_job_on_retry parameter that enables idempotent retry behavior:
job_run_id instead of creating a new one
Changes Made
GlueJobOperator (glue.py):
resume_glue_job_on_retry: bool = False parameter to __init__
execute() method to check previous job state from XCom when enabled
get_job_run()) to verify job state before deciding to create new run
Unit Tests (test_glue.py):
test_check_previous_job_id_run_reuse_in_progress: Verifies previous job_run_id is reused when job is RUNNING
test_check_previous_job_id_run_new_on_finished: Verifies new job is created when previous job is SUCCEEDED
Backward Compatibility
Fully backward compatible - parameter defaults to False, maintaining existing behavior by default.
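The retry behavior described above can be sketched roughly as follows. This is not the code from glue.py: `choose_job_run_id`, the callables passed to it, the fake run IDs, and the set of states treated as "in progress" are all assumptions made for illustration. The description only confirms that a RUNNING job is reused and a SUCCEEDED job leads to a new run.

```python
from typing import Callable, List, Optional

# Assumption: Glue run states treated as "still in progress" in this sketch.
IN_PROGRESS_STATES = {"STARTING", "RUNNING", "STOPPING", "WAITING"}


def choose_job_run_id(
    resume_on_retry: bool,
    previous_run_id: Optional[str],
    get_run_state: Callable[[str], str],
    start_new_run: Callable[[], str],
) -> str:
    """Return the previous run ID if it is still active, else start a new run."""
    if resume_on_retry and previous_run_id is not None:
        state = get_run_state(previous_run_id)
        if state in IN_PROGRESS_STATES:
            # Reuse the run that is already in flight instead of duplicating it.
            return previous_run_id
    # Flag disabled, no previous run, or previous run finished: start fresh.
    return start_new_run()


# Hypothetical stand-ins for the Glue API calls.
fake_states = {"jr_running": "RUNNING", "jr_done": "SUCCEEDED"}
started: List[str] = []


def fake_start() -> str:
    run_id = f"jr_new_{len(started)}"
    started.append(run_id)
    return run_id


# Previous run still RUNNING: it is reused.
reused = choose_job_run_id(True, "jr_running", fake_states.__getitem__, fake_start)
# Previous run SUCCEEDED: a new run is started.
fresh = choose_job_run_id(True, "jr_done", fake_states.__getitem__, fake_start)
# Flag left at its default (False): always a new run, as before this change.
legacy = choose_job_run_id(False, "jr_running", fake_states.__getitem__, fake_start)
```

With the flag off, the function never consults the previous run, which mirrors the backward-compatible default.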