Description
Discussed in #61930
Originally posted by SameerMesiah97 February 1, 2026
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
apache-airflow-providers-amazon>=9.21.0rc1
Apache Airflow version
main
Operating System
Debian GNU/Linux 12 (bookworm)
Deployment
Other
Deployment details
No response
What happened
When using RedshiftCreateClusterOperator, a Redshift cluster may be successfully created even when the AWS execution role has partial Redshift permissions, for example lacking redshift:DescribeClusters.
In this scenario, the operator successfully calls create_cluster and the Redshift cluster begins provisioning in AWS. However, subsequent steps—such as waiting for the cluster to become available when wait_for_completion=True—fail due to insufficient permissions.
The Airflow task then fails, but the Redshift cluster continues provisioning or remains active in AWS, resulting in leaked infrastructure and ongoing cost.
This can occur, for example, when the execution role allows redshift:CreateCluster but explicitly denies redshift:DescribeClusters, which is required by the waiter used to monitor cluster availability.
What you think should happen instead
If the operator fails after successfully initiating cluster creation (for example due to missing DescribeClusters or other follow-up permissions), it should make a best-effort attempt to clean up the partially created resource by deleting the cluster.
Cleanup should be attempted opportunistically (i.e. only if the cluster identifier is known and the necessary permissions are available), and failure to clean up should not mask or replace the original exception.
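The proposed behaviour can be sketched generically. The helper below is a hypothetical illustration (none of these names exist in the provider): `create`, `wait`, and `delete` stand in for the create_cluster call, the availability waiter, and delete_cluster, and the point is that a failed cleanup is logged but never masks the original exception:

```python
import logging

logger = logging.getLogger(__name__)


def create_with_best_effort_cleanup(create, wait, delete):
    """Create a resource, wait for it, and try to delete it if waiting fails.

    `create`, `wait`, and `delete` are injected callables standing in for
    create_cluster, the cluster-available waiter, and delete_cluster.
    A failed cleanup must never replace the original error.
    """
    identifier = create()
    try:
        wait(identifier)
    except Exception as wait_error:
        try:
            delete(identifier)
            logger.info("Cleaned up partially created resource %s", identifier)
        except Exception:
            # Best effort only: log the cleanup failure and fall through,
            # so the original waiter error is what the task reports.
            logger.exception("Cleanup of %s failed", identifier)
        raise wait_error
    return identifier
```

With this shape, a missing redshift:DeleteCluster permission during cleanup still surfaces the original redshift:DescribeClusters failure to the user.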
How to reproduce
- Create an IAM role that allows redshift:CreateCluster but denies redshift:DescribeClusters.
- Configure an AWS connection in Airflow using this role. (The connection ID aws_test_conn is used for this reproduction.)
- Ensure a valid Redshift cluster subnet group exists. (For example: example-subnet-group.)
- Use the following DAG:
```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.redshift_cluster import (
    RedshiftCreateClusterOperator,
)

with DAG(
    dag_id="redshift_partial_auth_cluster_leak_repro",
    start_date=datetime(2025, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    create_cluster = RedshiftCreateClusterOperator(
        task_id="create_redshift_cluster",
        aws_conn_id="aws_test_conn",
        cluster_identifier="leaky-redshift-cluster",
        node_type="ra3.large",
        master_username="example",
        master_user_password="example",
        cluster_type="single-node",
        cluster_subnet_group_name="example-subnet-group",
        wait_for_completion=True,  # triggers DescribeClusters via waiter
    )
```
- Trigger the DAG.
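A minimal IAM policy for the role described in the first reproduction step might look like the following sketch. The Sid values and `"Resource": "*"` scoping are placeholders, and depending on the account setup, cluster creation may require additional permissions not shown here:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCreate",
      "Effect": "Allow",
      "Action": ["redshift:CreateCluster"],
      "Resource": "*"
    },
    {
      "Sid": "DenyDescribe",
      "Effect": "Deny",
      "Action": ["redshift:DescribeClusters"],
      "Resource": "*"
    }
  ]
}
```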
Observed Behaviour
The task fails due to missing redshift:DescribeClusters permissions, but the Redshift cluster is successfully created and remains active in AWS. The cluster is not cleaned up automatically and continues incurring cost.
Anything else
Redshift clusters begin incurring cost immediately once creation starts, even if the cluster never reaches an available state. When post-creation failures occur, leaked clusters can therefore result in unexpected and ongoing cost.
This issue follows a broader pattern across AWS operators where resources are created successfully but not cleaned up when subsequent steps fail. Apache Airflow has been introducing best-effort cleanup behavior to address this class of problems consistently across providers.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct