Describe the bug
While testing our latest release candidate on AWS, the deployment got stuck at the S3 bucket creation for the terraform-state.
While inspecting the CloudTrail logs, I noticed that AWS was aborting the CreateBucket requests due to a conflicting conditional operation currently in progress:
{
  "eventSource": "s3.amazonaws.com",
  "eventName": "CreateBucket",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "***",
  "errorCode": "OperationAborted",
  "errorMessage": "A conflicting conditional operation is currently in progress against this resource. Please try again.",
  "requestParameters": {
    "bucketName": "***-dev-terraform-state",
    "Host": "***-dev-terraform-state.s3.amazonaws.com",
    "x-amz-acl": "private"
  }
}
I am still not entirely sure about the cause, but I think it is related to the order of operations performed by the AWS Terraform provider when creating the bucket and applying the encryption and public access block changes. Unfortunately, I didn't have trace logs enabled at the time, so I couldn't thoroughly inspect the concurrent requests made by the AWS provider, but these are the ones that happened during apply:
PutBucketTagging (part of aws_s3_bucket)
CreateBucket (part of aws_s3_bucket)
PutBucketVersioning (part of aws_s3_bucket)
PutBucketPublicAccessBlock
PutBucketEncryption
We already enforce some dependency between these resources in our TF code:
graph TB
A[aws_kms_key]
B[aws_s3_bucket]
C[aws_s3_bucket_server_side_encryption_configuration]
D[aws_s3_bucket_public_access_block]
A --> B
B --> C
B --> D
However, nothing seems to prevent aws_s3_bucket_server_side_encryption_configuration from running concurrently with, or in the wrong order relative to, aws_s3_bucket_public_access_block.
In my opinion, we have two possible solutions:
As this seems to be fixed upstream, test upgrading the AWS provider to the version containing the fix, v3.67.0 (ideal).
Add an explicit depends_on to the resources above to force the API requests to happen sequentially.
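A minimal sketch of the second option, assuming the four resources from the graph above. The resource labels ("state"), the variable name, and the attribute values are hypothetical and not taken from the Nebari code; whether this ordering actually avoids the OperationAborted error has not been verified.

```hcl
# Hypothetical input; the real module may name or source this differently.
variable "state_bucket_name" {
  type        = string
  description = "Name of the Terraform state bucket."
}

resource "aws_kms_key" "state" {
  description = "Key used to encrypt the Terraform state bucket"
}

resource "aws_s3_bucket" "state" {
  bucket = var.state_bucket_name
}

resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id

  rule {
    apply_server_side_encryption_by_default {
      kms_master_key_id = aws_kms_key.state.arn
      sse_algorithm     = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "state" {
  bucket = aws_s3_bucket.state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true

  # Explicit ordering: only apply the public access block after the
  # encryption configuration has been applied, so the two bucket-level
  # API calls are not issued concurrently.
  depends_on = [aws_s3_bucket_server_side_encryption_configuration.state]
}
```

With this in place, Terraform would serialize the two bucket-level resources instead of scheduling them in parallel once aws_s3_bucket exists.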
Expected behavior
The deployment succeeds without getting stuck and without requiring the user to redeploy for the sequence to self-heal.
OS and architecture in which you are running Nebari
Linux
How to Reproduce the problem?
This can be a bit tricky to reproduce, as it depends on the order in which the resources are applied as well as the timing of the requests sent by the AWS provider. But in theory, just creating a deployment from scratch on AWS should be enough to reproduce it.
Command output
No response
Versions and dependencies used.
nebari
2024.7.1rc3
Compute environment
AWS
Integrations
No response
Anything else?
An in-depth discussion of similar behavior can be found in hashicorp/terraform-provider-aws#7628.
hashicorp/terraform-provider-aws#14078 is also an interesting one.