Description
Apache Airflow Provider(s)
Versions of Apache Airflow Providers
No response
Apache Airflow version
main
Operating System
ubuntu
Deployment
Astronomer
Deployment details
No response
What happened
When implementing the Ephemeral Dataproc Cluster pattern:
Create Cluster -> Run Jobs -> Delete Cluster (TriggerRule.ALL_DONE)
There is a conflict between the default behavior of DataprocCreateClusterOperator and the downstream DataprocDeleteClusterOperator.
- `DataprocCreateClusterOperator` has `delete_on_error=True` by default. If the cluster creation fails and ends up in an `ERROR` state, the operator automatically deletes the cluster.
- The downstream `DataprocDeleteClusterOperator` triggers (due to `TriggerRule.ALL_DONE`).
- It attempts to delete the cluster, which no longer exists.
- The `DataprocDeleteClusterOperator` fails with a `NotFound` (404) error from the Google Cloud API.
This causes the cleanup task to be marked as failed, which creates noise and can potentially mask the actual upstream failure in monitoring views.
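The sequence above can be illustrated without Airflow or Google Cloud. The following is only a minimal simulation under stated assumptions: `FakeDataproc`, `create_task`, `delete_task`, and `run_pipeline` are hypothetical names made up for this sketch, and the local `NotFound` class merely stands in for `google.api_core.exceptions.NotFound`. It is not the provider's implementation.

```python
class NotFound(Exception):
    """Stand-in for google.api_core.exceptions.NotFound (HTTP 404)."""


class FakeDataproc:
    """Hypothetical in-memory stand-in for the Dataproc API."""

    def __init__(self):
        self.clusters = {}

    def create_cluster(self, name, fail=False):
        # A failed provisioning leaves the cluster in the ERROR state.
        self.clusters[name] = "ERROR" if fail else "RUNNING"
        return self.clusters[name]

    def delete_cluster(self, name):
        if name not in self.clusters:
            raise NotFound(f"cluster {name!r} not found")
        del self.clusters[name]


def create_task(api, name, fail, delete_on_error=True):
    """Mimics the create step: on ERROR, optionally delete, then fail."""
    state = api.create_cluster(name, fail=fail)
    if state == "ERROR":
        if delete_on_error:
            api.delete_cluster(name)
        raise RuntimeError(f"cluster {name!r} entered ERROR state")


def delete_task(api, name):
    """Mimics the cleanup step as described: no NotFound handling."""
    api.delete_cluster(name)


def run_pipeline(api, name, fail, delete_on_error=True):
    """Runs create, then delete regardless of the create outcome
    (ALL_DONE semantics), and returns the outcome of each task."""
    outcomes = {}
    for task_id, task in (
        ("create", lambda: create_task(api, name, fail, delete_on_error)),
        ("delete", lambda: delete_task(api, name)),
    ):
        try:
            task()
            outcomes[task_id] = "success"
        except Exception as exc:
            outcomes[task_id] = f"failed: {type(exc).__name__}"
    return outcomes
```

In this simulation, `run_pipeline(FakeDataproc(), "c", fail=True)` reports both tasks as failed, while passing `delete_on_error=False` leaves the cleanup task with something to delete, so only the create task fails.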
What you think should happen instead
DataprocDeleteClusterOperator should ideally be idempotent. If the cluster is already deleted (returns 404 NotFound), the operator should consider the task successful (or skipped) rather than failed.
Currently, the deferrable mode implementation checks for existence:
    try:
        hook.get_cluster(...)
    except NotFound:
        self.log.info("Cluster deleted.")
        return

However, the standard synchronous execute path does not seem to catch NotFound exceptions during the delete operation.
How to reproduce
- Create a DAG with `DataprocCreateClusterOperator` -> `DataprocDeleteClusterOperator` (with `trigger_rule=TriggerRule.ALL_DONE`).
- Force the cluster creation to enter an ERROR state (e.g., by providing invalid configuration that passes validation but fails provisioning).
- `DataprocCreateClusterOperator` will delete the cluster and fail.
- `DataprocDeleteClusterOperator` will run, attempt to delete the missing cluster, and fail with `NotFound`.
Anything else
Proposed behaviour:
- Update `DataprocDeleteClusterOperator` to catch `NotFound` exceptions during the delete operation and log a message instead of raising an error.
- Alternatively, update documentation to explicitly recommend setting `delete_on_error=False` in `DataprocCreateClusterOperator` when an explicit delete task is used.
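The first option could look roughly like the sketch below. It is only an illustration of the idea, not a patch against the provider: `FakeHook` and `delete_cluster_idempotent` are hypothetical names, the hook's `delete_cluster` signature is simplified, and the local `NotFound` class stands in for `google.api_core.exceptions.NotFound` so the example runs without any Google dependencies.

```python
import logging

log = logging.getLogger(__name__)


class NotFound(Exception):
    """Stand-in for google.api_core.exceptions.NotFound (HTTP 404)."""


class FakeHook:
    """Hypothetical stand-in for the Dataproc hook; not the provider's API."""

    def __init__(self, clusters):
        self.clusters = set(clusters)

    def delete_cluster(self, cluster_name):
        if cluster_name not in self.clusters:
            raise NotFound(f"cluster {cluster_name!r} not found")
        self.clusters.remove(cluster_name)


def delete_cluster_idempotent(hook, cluster_name):
    """Delete a cluster, treating 'already gone' as success.

    Returns True if a delete was issued, False if the cluster was absent.
    """
    try:
        hook.delete_cluster(cluster_name)
    except NotFound:
        # The desired end state (no cluster) already holds, so do not fail.
        log.info(
            "Cluster %s not found; assuming it was already deleted.",
            cluster_name,
        )
        return False
    log.info("Cluster %s deleted.", cluster_name)
    return True
```

Calling `delete_cluster_idempotent` twice on the same cluster name succeeds both times: the first call deletes the cluster, and the second logs a message and returns without raising. Whether the real operator should report success or a skipped state in that case is left open, as in the description above.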
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct