RedshiftCreateClusterOperator could leave clusters running after failure #61333
Merged
potiuk merged 1 commit into apache:main on Feb 15, 2026
occur after successful creation (e.g. waiter failures due to missing DescribeClusters permissions). This change adds best-effort cleanup when post-create steps fail by attempting to delete the cluster that was successfully created. Cleanup errors are logged but do not mask the original exception. Cleanup is enabled by default. Tests cover successful cleanup on waiter failure and ensure cleanup failures do not override the original error.
eladkal reviewed on Feb 2, 2026
providers/amazon/src/airflow/providers/amazon/aws/operators/redshift_cluster.py
o-nikolas approved these changes on Feb 10, 2026
providers/amazon/src/airflow/providers/amazon/aws/operators/redshift_cluster.py
Nice!
choo121600 pushed a commit to choo121600/airflow that referenced this pull request on Feb 22, 2026
…res (apache#61333) occur after successful creation (e.g. waiter failures due to missing DescribeClusters permissions). This change adds best-effort cleanup when post-create steps fail by attempting to delete the cluster that was successfully created. Cleanup errors are logged but do not mask the original exception. Cleanup is enabled by default. Tests cover successful cleanup on waiter failure and ensure cleanup failures do not override the original error.
Co-authored-by: Sameer Mesiah <smesiah971@gmail.com>
Description
Added best-effort cleanup for Redshift cluster creation so that clusters are deleted when failures occur after a cluster has been successfully created. Cleanup is controlled by a flag and is enabled by default.
Previously, Redshift cluster creation could succeed via create_cluster, but the operator could then fail during post-creation steps when wait_for_completion=True and the IAM role lacked redshift:DescribeClusters permissions. In these cases, the Airflow task failed while the Redshift cluster continued provisioning or remained active in AWS, resulting in leaked infrastructure.
Cleanup has now been implemented for RedshiftCreateClusterOperator. If a WaiterError is raised after cluster creation has been initiated, the operator attempts a best-effort deletion of the cluster. Cleanup failures are logged but do not mask or replace the original exception.
Rationale
Redshift cluster creation can succeed while post-creation steps fail. This commonly occurs with partially scoped IAM roles, for example ones allowing redshift:CreateCluster but denying redshift:DescribeClusters, which is required by the availability waiter.
In these scenarios, the Airflow task fails while the cluster continues provisioning or running in AWS, leading to leaked infrastructure and ongoing cost. This change ensures that when a cluster has been started by the operator, failures during post-creation steps trigger a best-effort cleanup without altering error semantics or impacting unrelated resources.
It is also plausible for a cluster to reach an available state before cleanup is attempted. Cluster creation proceeds asynchronously in AWS and may complete independently of the waiter outcome or permission failures. In such cases, the cluster is immediately deletable, and attempting cleanup can successfully reclaim resources that would otherwise be left running.
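The cleanup flow described above can be sketched roughly as follows. This is a minimal illustration and not the operator's actual code: the helper name create_cluster_with_cleanup, the local WaiterError stand-in (in place of botocore.exceptions.WaiterError), the mocked client, and the choice of SkipFinalClusterSnapshot=True are all assumptions made so that the example runs without boto3 or AWS access.

```python
import logging
from unittest.mock import MagicMock

log = logging.getLogger(__name__)


class WaiterError(Exception):
    """Stand-in for botocore.exceptions.WaiterError so the sketch runs without boto3."""


def create_cluster_with_cleanup(client, cluster_identifier, params, delete_cluster_on_failure=True):
    """Hypothetical helper: create a cluster, wait for it, delete it if the wait fails."""
    client.create_cluster(ClusterIdentifier=cluster_identifier, **params)
    try:
        # The availability waiter needs DescribeClusters, so it fails if that permission is missing.
        client.get_waiter("cluster_available").wait(ClusterIdentifier=cluster_identifier)
    except WaiterError:
        if delete_cluster_on_failure:
            try:
                # Assumption: no final snapshot is wanted for a cluster that never became usable.
                client.delete_cluster(
                    ClusterIdentifier=cluster_identifier,
                    SkipFinalClusterSnapshot=True,
                )
            except Exception:
                # Best-effort: log the cleanup failure, then still surface the original error.
                log.exception("Cleanup of cluster %s failed", cluster_identifier)
        raise  # re-raises the original WaiterError


# Usage with a mocked client whose waiter fails after a successful create call.
client = MagicMock()
client.get_waiter.return_value.wait.side_effect = WaiterError("waiter failed")
try:
    create_cluster_with_cleanup(client, "demo-cluster", {"NodeType": "ra3.xlplus"})
except WaiterError as err:
    surfaced = err
```

The bare raise sits outside the inner try block on purpose: whether the deletion succeeds or fails, the caller still sees the original WaiterError. Passing delete_cluster_on_failure=False in this sketch skips the deletion attempt entirely.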
Tests
Tests cover cleanup when a WaiterError occurs during the wait phase after successful cluster creation, and ensure that cleanup failures do not override the original error.
Documentation
The docstring for RedshiftCreateClusterOperator has been updated to document the new flag delete_cluster_on_failure and its default behavior.
Backwards Compatibility
A new flag called delete_cluster_on_failure has been added to RedshiftCreateClusterOperator with a default value of True. Best-effort cleanup will now be attempted if a post-creation failure (including WaiterError) occurs after the cluster has been successfully created.
Closes: #61324