RedshiftCreateClusterOperator could leave clusters running after failure by SameerMesiah97 · Pull Request #61333 · apache/airflow

SameerMesiah97 · 2026-02-01T18:51:01Z

Description

Added best-effort cleanup to Redshift cluster creation so that clusters are deleted when failures occur after a cluster has been successfully created. Cleanup is controlled by a flag and is enabled by default.

Previously, Redshift cluster creation could succeed via create_cluster, but the operator could then fail during post-creation steps, for example when wait_for_completion=True and the IAM role lacked the redshift:DescribeClusters permission. In these cases, the Airflow task failed while the Redshift cluster continued provisioning or remained active in AWS, resulting in leaked infrastructure.

Cleanup is now implemented for RedshiftCreateClusterOperator. If a WaiterError is raised after cluster creation has been initiated, the operator attempts a best-effort deletion of the cluster. Cleanup failures are logged but do not mask or replace the original exception.
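The control flow described above can be sketched in plain Python against a boto3-style Redshift client. This is a minimal sketch and not the operator's actual code. The function name `create_cluster_with_cleanup` is hypothetical. `WaiterError` is a local stand-in for `botocore.exceptions.WaiterError`, so the snippet runs without AWS libraries. Passing `SkipFinalClusterSnapshot=True` on deletion is an assumption about how cleanup should treat snapshots. The client methods used (`create_cluster`, `get_waiter("cluster_available")`, `delete_cluster`) are the standard boto3 Redshift client calls.

```python
import logging
from typing import Any

log = logging.getLogger(__name__)


class WaiterError(Exception):
    """Stand-in for botocore.exceptions.WaiterError so this sketch runs without botocore."""


def create_cluster_with_cleanup(
    client: Any,
    cluster_identifier: str,
    create_kwargs: dict[str, Any],
    wait_for_completion: bool = True,
    delete_cluster_on_failure: bool = True,
) -> None:
    # Step 1: request creation. If this call itself fails, no cluster exists,
    # so there is nothing to clean up and the error simply propagates.
    client.create_cluster(ClusterIdentifier=cluster_identifier, **create_kwargs)
    if not wait_for_completion:
        return
    try:
        # Step 2: poll until the cluster is available. The waiter calls
        # DescribeClusters, so a role without that permission fails here.
        client.get_waiter("cluster_available").wait(ClusterIdentifier=cluster_identifier)
    except WaiterError:
        if delete_cluster_on_failure:
            try:
                # Best-effort deletion of the cluster this call just created.
                client.delete_cluster(
                    ClusterIdentifier=cluster_identifier,
                    SkipFinalClusterSnapshot=True,
                )
            except Exception:
                # Log and swallow, so a cleanup error never replaces the waiter error.
                log.exception("Best-effort deletion of cluster %s failed", cluster_identifier)
        # Bare raise re-raises the original WaiterError (Python 3 restores it
        # after the inner handler finishes).
        raise
```

With real AWS access, `client` would be `boto3.client("redshift")` and `WaiterError` would be imported from `botocore.exceptions` instead of defined locally. The inner `try`/`except` is what keeps error semantics unchanged. Without it, a failed `delete_cluster` would propagate and hide the original waiter failure behind an unrelated exception.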

Rationale

Redshift cluster creation can succeed while post-creation steps fail. This commonly occurs with partially scoped IAM roles, for example a role that allows redshift:CreateCluster but denies redshift:DescribeClusters, which the availability waiter requires.
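A minimal sketch makes this failure mode concrete. The policy is written as a Python dict so that it stays in the same language as the other examples. The statement layout, the `Resource: "*"` scope, and the `is_allowed` helper are illustrative assumptions, not taken from the PR. `is_allowed` models only the basic IAM rule for exact action names: an explicit Deny wins, otherwise an explicit Allow is needed, otherwise the request is implicitly denied. It does not model wildcards, resource ARNs, conditions, or other policy types.

```python
import json

# Hypothetical, deliberately under-scoped policy: creation is allowed, but the
# DescribeClusters call that the availability waiter relies on is denied.
PARTIAL_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["redshift:CreateCluster"], "Resource": "*"},
        {"Effect": "Deny", "Action": ["redshift:DescribeClusters"], "Resource": "*"},
    ],
}


def is_allowed(policy: dict, action: str) -> bool:
    """Simplified IAM check for exact action names only."""
    matching = [s for s in policy["Statement"] if action in s["Action"]]
    # An explicit Deny always overrides any Allow.
    if any(s["Effect"] == "Deny" for s in matching):
        return False
    # Otherwise an explicit Allow is required; no match means implicit deny.
    return any(s["Effect"] == "Allow" for s in matching)


if __name__ == "__main__":
    print(json.dumps(PARTIAL_POLICY, indent=2))
    for action in ("redshift:CreateCluster", "redshift:DescribeClusters", "redshift:DeleteCluster"):
        print(action, "->", "allowed" if is_allowed(PARTIAL_POLICY, action) else "denied")
```

Under this example policy, redshift:DeleteCluster is not granted either, so the best-effort deletion would itself be refused. In that situation the cleanup error has to be logged without replacing the original waiter failure.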

In these scenarios, the Airflow task fails while the cluster continues provisioning or running in AWS, leading to leaked infrastructure and ongoing cost. This change ensures that when a cluster has been started by the operator, failures during post-creation steps trigger a best-effort cleanup without altering error semantics or impacting unrelated resources.

It is also plausible for a cluster to reach an available state before cleanup is attempted. Cluster creation proceeds asynchronously in AWS and may complete independently of the waiter outcome or permission failures. In such cases, the cluster is immediately deletable, and attempting cleanup can successfully reclaim resources that would otherwise be left running.

Tests

Added a unit test verifying that cluster deletion is attempted when a WaiterError occurs during the wait phase after successful cluster creation.
Added a unit test ensuring that failures during cleanup do not mask or override the original exception raised by the waiter.

Documentation

The docstring for RedshiftCreateClusterOperator has been updated to document the new flag delete_cluster_on_failure and its default behavior.

Backwards Compatibility

A new flag called delete_cluster_on_failure has been added to RedshiftCreateClusterOperator with a default value of True. Best-effort cleanup will now be attempted if a post-creation failure (including WaiterError) occurs after the cluster has been successfully created.

Closes: #61324


providers/amazon/src/airflow/providers/amazon/aws/operators/redshift_cluster.py

potiuk · 2026-02-15T18:47:46Z

Nice!

…res (apache#61333) occur after successful creation (e.g. waiter failures due to missing DescribeClusters permissions). This change adds best-effort cleanup when post-create steps fail by attempting to delete the cluster that was successfully created. Cleanup errors are logged but do not mask the original exception. This mode is enabled by default. Tests cover successful cleanup on waiter failure and ensure cleanup failures do not override the original error. Co-authored-by: Sameer Mesiah <smesiah971@gmail.com>

SameerMesiah97 requested a review from o-nikolas as a code owner February 1, 2026 18:51

boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Feb 1, 2026

eladkal reviewed Feb 2, 2026

providers/amazon/src/airflow/providers/amazon/aws/operators/redshift_cluster.py

eladkal requested a review from vincbeck February 5, 2026 18:39

o-nikolas approved these changes Feb 10, 2026

providers/amazon/src/airflow/providers/amazon/aws/operators/redshift_cluster.py

potiuk merged commit 39b914b into apache:main Feb 15, 2026
90 checks passed

potiuk linked an issue Feb 16, 2026 that may be closed by this pull request

RedshiftCreateClusterOperator leaks Redshift cluster on failure with partial IAM permissions #61974

Closed

2 tasks

SameerMesiah97 mentioned this pull request Feb 21, 2026

GKECreateClusterOperator may leak clusters on PermissionDenied during operation polling #62301

Open

2 tasks

potiuk merged 1 commit into apache:main from SameerMesiah97:61324-RedshiftCreateClusterOperator-Cleanup

4 participants
