Description
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==9.20.0
Apache Airflow version
main
Operating System
Debian GNU/Linux 12 (bookworm)
Deployment
Other
Deployment details
No response
What happened
When using EC2CreateInstanceOperator, an EC2 instance may be successfully created even when the task execution role has partial EC2 permissions, for example lacking ec2:DescribeInstances.
In this scenario, the operator successfully calls RunInstances and creates the EC2 instance. However, subsequent calls (such as describing or waiting for the instance when wait_for_completion=True) fail due to insufficient permissions. The task then fails, but the EC2 instance continues to exist and remains running in AWS, resulting in leaked infrastructure.
What you think should happen instead
If the operator fails after successfully creating an EC2 instance (for example due to missing DescribeInstances or other follow-up permissions), it should make a best-effort attempt to clean up the partially created resource by terminating the instance.
Cleanup should be attempted opportunistically (i.e. only if the instance ID is known and the necessary permissions are available), and failure to clean up should not mask or replace the original exception.
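A minimal sketch of the proposed behavior, written against a boto3-style EC2 client rather than the operator's actual code. The helper name `create_instances_with_cleanup` and its parameters are hypothetical; only `run_instances`, `get_waiter("instance_running")`, and `terminate_instances` are existing boto3 EC2 client calls.

```python
import logging

log = logging.getLogger(__name__)


def create_instances_with_cleanup(ec2_client, run_kwargs, wait_for_completion=True):
    """Hypothetical helper: create instances and terminate them if a follow-up step fails."""
    response = ec2_client.run_instances(**run_kwargs)
    instance_ids = [instance["InstanceId"] for instance in response["Instances"]]

    try:
        if wait_for_completion:
            # The waiter polls DescribeInstances, so a role without that
            # permission fails here, after the instances already exist.
            ec2_client.get_waiter("instance_running").wait(InstanceIds=instance_ids)
    except Exception:
        # Best-effort cleanup: only possible because the instance IDs are known.
        try:
            ec2_client.terminate_instances(InstanceIds=instance_ids)
        except Exception:
            # A cleanup failure is logged but must not replace the original error.
            log.exception("Cleanup failed; instances %s may still be running", instance_ids)
        raise  # re-raises the original follow-up failure

    return instance_ids
```

The bare `raise` sits outside the inner `try`, so the exception that reaches the caller is always the original one, whether or not termination succeeded.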
How to reproduce
- Create an IAM role that allows ec2:RunInstances but denies ec2:DescribeInstances.
- Configure an AWS connection in Airflow using this role.
- Use the following DAG:
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.ec2 import EC2CreateInstanceOperator

with DAG(
    dag_id="ec2_partial_auth_leak_repro",
    start_date=datetime(2025, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    create_instance = EC2CreateInstanceOperator(
        task_id="create_instance",
        aws_conn_id="aws_test_conn",
        image_id="ami-xxxxxxxxxxxxxxxxx",
        min_count=1,
        max_count=1,
        config={
            "SubnetId": "subnet-xxxxxxxxxxxxxxxxx",  # public subnet
            "SecurityGroupIds": ["sg-xxxxxxxxxxxxxxxxx"],
            "InstanceType": "t3.micro",
        },
        wait_for_completion=True,  # triggers DescribeInstances via waiter
    )
- Trigger the DAG.
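For the first step, a policy along these lines could produce the partial permissions described. This is an illustrative sketch, not the exact policy used when the issue was observed: the variable name `partial_ec2_policy` is hypothetical, and a real setup may need additional statements depending on the launch configuration.

```python
import json

# Assumed policy document: allows instance creation but explicitly denies
# the DescribeInstances call that the "instance_running" waiter relies on.
partial_ec2_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "ec2:RunInstances", "Resource": "*"},
        {"Effect": "Deny", "Action": "ec2:DescribeInstances", "Resource": "*"},
    ],
}

# Print the JSON form, which can be pasted into the IAM policy editor.
print(json.dumps(partial_ec2_policy, indent=2))
```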
Expected Result
The task fails because the role lacks the ec2:DescribeInstances permission, but the EC2 instance remains running in AWS and is not terminated automatically.
Anything else
This behavior can be surprising and potentially costly, as infrastructure is created even though the Airflow task fails. Other Airflow operators that manage external resources typically attempt best-effort cleanup on failure to avoid leaking infrastructure.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct