Description
Discussed in #61930
Originally posted by SameerMesiah97 February 1, 2026
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
apache-airflow-providers-amazon>=9.21.0rc1
Apache Airflow version
main
Operating System
Debian GNU/Linux 12 (bookworm)
Deployment
Other
Deployment details
No response
What happened
When using RedshiftCreateClusterOperator, a Redshift cluster may be successfully created even when the AWS execution role has partial Redshift permissions, for example lacking redshift:DescribeClusters.
In this scenario, the operator successfully calls create_cluster and the Redshift cluster begins provisioning in AWS. However, subsequent steps—such as waiting for the cluster to become available when wait_for_completion=True—fail due to insufficient permissions.
The Airflow task then fails, but the Redshift cluster continues provisioning or remains active in AWS, resulting in leaked infrastructure and ongoing cost.
This can occur, for example, when the execution role allows redshift:CreateCluster but explicitly denies redshift:DescribeClusters, which is required by the waiter used to monitor cluster availability.
What you think should happen instead
If the operator fails after successfully initiating cluster creation (for example due to missing DescribeClusters or other follow-up permissions), it should make a best-effort attempt to clean up the partially created resource by deleting the cluster.
Cleanup should be attempted opportunistically (i.e. only if the cluster identifier is known and the necessary permissions are available), and failure to clean up should not mask or replace the original exception.
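The proposed behaviour can be sketched generically. The helper below is a hypothetical illustration (none of these names exist in the provider): `create`, `wait`, and `delete` stand in for the create_cluster call, the availability waiter, and delete_cluster, and the point is that a failed cleanup is logged but never masks the original exception:

```python
import logging

logger = logging.getLogger(__name__)


def create_with_best_effort_cleanup(create, wait, delete):
    """Create a resource, wait for it, and try to delete it if waiting fails.

    `create`, `wait`, and `delete` are injected callables standing in for
    create_cluster, the cluster-available waiter, and delete_cluster.
    A failed cleanup must never replace the original error.
    """
    identifier = create()
    try:
        wait(identifier)
    except Exception as wait_error:
        try:
            delete(identifier)
            logger.info("Cleaned up partially created resource %s", identifier)
        except Exception:
            # Best effort only: log the cleanup failure and fall through,
            # so the original waiter error is what the task reports.
            logger.exception("Cleanup of %s failed", identifier)
        raise wait_error
    return identifier
```

With this shape, a missing redshift:DeleteCluster permission during cleanup still surfaces the original redshift:DescribeClusters failure to the user.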
How to reproduce
- Create an IAM role that allows redshift:CreateCluster but denies redshift:DescribeClusters.
- Configure an AWS connection in Airflow using this role. (The connection ID aws_test_conn is used for this reproduction.)
- Ensure a valid Redshift cluster subnet group exists. (For example: example-subnet-group.)
- Use the following DAG:
```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.redshift_cluster import (
    RedshiftCreateClusterOperator,
)

with DAG(
    dag_id="redshift_partial_auth_cluster_leak_repro",
    start_date=datetime(2025, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    create_cluster = RedshiftCreateClusterOperator(
        task_id="create_redshift_cluster",
        aws_conn_id="aws_test_conn",
        cluster_identifier="leaky-redshift-cluster",
        node_type="ra3.large",
        master_username="example",
        master_user_password="example",
        cluster_type="single-node",
        cluster_subnet_group_name="example-subnet-group",
        wait_for_completion=True,  # triggers DescribeClusters via waiter
    )
```
- Trigger the DAG.
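A minimal IAM policy for the role described in the first reproduction step might look like the following sketch. The Sid values and `"Resource": "*"` scoping are placeholders, and depending on the account setup, cluster creation may require additional permissions not shown here:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCreate",
      "Effect": "Allow",
      "Action": ["redshift:CreateCluster"],
      "Resource": "*"
    },
    {
      "Sid": "DenyDescribe",
      "Effect": "Deny",
      "Action": ["redshift:DescribeClusters"],
      "Resource": "*"
    }
  ]
}
```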
Observed Behaviour
The task fails due to missing redshift:DescribeClusters permissions, but the Redshift cluster is successfully created and remains active in AWS. The cluster is not cleaned up automatically and continues incurring cost.
Anything else
Redshift clusters begin incurring cost immediately once creation starts, even if the cluster never reaches an available state. When post-creation failures occur, leaked clusters can therefore result in unexpected and ongoing cost.
This issue follows a broader pattern across AWS operators where resources are created successfully but not cleaned up when subsequent steps fail. Apache Airflow has been introducing best-effort cleanup behavior to address this class of problems consistently across providers.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct