Error putting S3 lifecycle: NoSuchBucket: The specified bucket does not exist #7803

Closed
vcardenas opened this issue Mar 4, 2019 · 5 comments · Fixed by #7930
Labels: bug (addresses a defect in current functionality), service/s3 (issues and PRs that pertain to the s3 service)

vcardenas commented Mar 4, 2019

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Terraform v0.11.7

Affected Resource(s)

  • aws_s3_bucket

Terraform Configuration Files

The relevant portion is the lifecycle_rule block. Its exact contents do not matter; the bucket only needs to define a lifecycle rule.

terraform = {
  required_version = ">=0.11.2" //tried with up to 0.11.8
}

provider "aws" {
  version = "~> 2.0"    // previous versions have been tried with same results
  max_retries = 10      // naive attempt to make it work
  /*** complete config with your env ***/
}

resource "aws_s3_bucket" "mybucket" {
    bucket = "vcardenas-bucket"
    force_destroy = true
    versioning {
        enabled = true
    }
    lifecycle_rule {
        id = "tests-expire"
        enabled = true
        prefix = "myprefix/"
        expiration {
            days = 3
        }
        noncurrent_version_expiration {
            days = 3
        }
    }
}

Expected Behavior

You have an S3 bucket after terraform apply.

Actual Behavior

It fails with the following error message:
Error putting S3 lifecycle: NoSuchBucket: The specified bucket does not exist

Steps to Reproduce

  1. terraform apply
    The issue is intermittent. To reproduce it, run init, apply, and destroy in a loop.
    In the ca-central-1 region, about 40% of applies failed over a 50-iteration loop using this bash script:
#!/bin/bash

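# 10 iterations shown here; widen the range for a longer run (the 40% figure above came from 50 iterations)
for run in {1..10} ; do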
  terraform init
  terraform apply -auto-approve
  terraform destroy -auto-approve
  rm -rf .terraform/
done

References

I believe this has the same root cause as #372, which seems to have been solved by #GH-891 while addressing #877. That fix does not cover this lifecycle case in particular.

Even with sleeps of up to one minute between iterations, the results are the same.
I have not tried waiting longer than 1 minute; I don't think the solution should be "wait X before creating your bucket".

I am going to look at the source code to see if I can figure out more about this issue.

@bflad bflad added the service/s3 Issues and PRs that pertain to the s3 service. label Mar 4, 2019
@vcardenas
Author

This is definitely an S3 eventual consistency issue. It is a known problem (#891 (comment)), but that fix (#GH-891) did not wrap the lifecycle PUT in the proper retry logic.

The error is thrown exactly at this point:
https://github.com/terraform-providers/terraform-provider-aws/blob/d2d46dcf98d3e834e0cb73ea5b40c1e9a93741be/aws/resource_aws_s3_bucket.go#L2125-L2133

In our case, in the ca-central-1 region, we have been hitting this issue intermittently for more than a year whenever we destroy and recreate a bucket with a previously used name. It does not have to be the bucket that was just destroyed. I know that is hardly a production use case, but we need to perform this operation in some of our CI workflows.

This recreation is one step in a bigger CI pipeline, and we face this issue around once every two months, which corresponds to about 40% of the times the action is performed.

The following synthetic test lets me consistently reproduce that 40% failure rate within minutes, using the same Terraform definition shown in the issue description:
AWS provider version: 2.0

#!/bin/bash

declare -A exits
touch all.out

for run in {1..100} ; do
  terraform init
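  terraform apply -auto-approve | tee last.out
  # $? after a pipeline is tee's status; PIPESTATUS[0] holds terraform's exit code
  exitcode=${PIPESTATUS[0]}
  (( exits[$exitcode]++ ))
  # keep the output of failed runs for later inspection
  [ "$exitcode" -ne 0 ] && cat last.out >> all.out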
  terraform destroy -auto-approve
done

echo Exit code report
for code in "${!exits[@]}" ; do
    echo $code : ${exits[$code]} times
done
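
As a possible follow-up to the script above, the failed-run output collected in all.out could be tallied by the operation that failed. This is only a sketch: summarize_failures is a hypothetical helper name, and it assumes each error line has the shape "Error putting S3 lifecycle: NoSuchBucket: ..." as in the message reported in this issue.

```shell
#!/bin/bash
# Sketch only: summarize_failures is a hypothetical helper, not part of Terraform.
# Assumes error lines shaped like "<prefix>Error <operation>: NoSuchBucket: <message>".

# usage: summarize_failures <logfile>
summarize_failures() {
  local logfile=$1
  # Split each line at ": NoSuchBucket", strip anything before "Error ",
  # then count how often each operation text appears.
  awk -F': NoSuchBucket' '
    /NoSuchBucket/ { sub(/^.*Error /, "Error ", $1); count[$1]++ }
    END { for (op in count) print count[op], op }
  ' "$logfile" | sort -rn
}
```

Running summarize_failures all.out after the loop prints one line per failing operation, most frequent first.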

I then ran multiple tests with a fix that simply wraps the API call in the already-used retryOnAwsCode function. It fixes the issue with one caveat: the failures do not disappear completely but drop to about 2% of runs.
The reason is that the retry timeout is set to 1 minute. Under heavy testing (1000 iterations), the bucket does become consistent, but sometimes only after that minute has passed, in some cases up to 1 minute and 45 seconds. When that happens, any follow-up API call in the bucket creation can fail: versioning, ACL, lifecycle, CORS, policy, etc.

I increased the timeout to 2 minutes and I got 0 occurrences after 3000 iterations on all followup API calls.

I could send a PR to fix this issue with the S3 lifecycle creation and perhaps to increase the timeout to 2 minutes for the retry operation in the retryOnAwsCode function if accepted.

Please let me know if there is any aspect of my reasoning that I have overlooked and that should be discussed.
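
To illustrate the retry reasoning above outside the provider, here is a minimal bash sketch of the same idea: retry a command only while it fails with a specific error code, and stop once a time budget is spent. It is not the provider's Go implementation of retryOnAwsCode; retry_on_pattern and its parameters are hypothetical names chosen for this example.

```shell
#!/bin/bash
# Sketch only: retry_on_pattern is a hypothetical helper, not the provider's retryOnAwsCode.

# usage: retry_on_pattern <timeout_seconds> <delay_seconds> <pattern> <command> [args...]
retry_on_pattern() {
  local timeout=$1 delay=$2 pattern=$3
  shift 3
  local deadline=$(( SECONDS + timeout ))
  local output status

  while true; do
    output=$("$@" 2>&1)
    status=$?
    printf '%s\n' "$output"
    # Success, or a failure that is not the retryable one: stop here
    if (( status == 0 )) || ! grep -q -- "$pattern" <<<"$output"; then
      return "$status"
    fi
    # Retryable failure, but the time budget is used up
    if (( SECONDS >= deadline )); then
      return "$status"
    fi
    sleep "$delay"
  done
}
```

For example, with the 2-minute budget discussed above and an assumed 5-second delay: retry_on_pattern 120 5 NoSuchBucket aws s3api put-bucket-lifecycle-configuration --bucket vcardenas-bucket --lifecycle-configuration file://lifecycle.json (lifecycle.json is a placeholder file name). Any error other than NoSuchBucket is returned immediately rather than retried.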

@bflad
Contributor

bflad commented Mar 6, 2019

@vcardenas the above assessment looks good to me. 👍

@bflad bflad added the bug Addresses a defect in current functionality. label Mar 6, 2019
@bflad bflad added this to the v2.2.0 milestone Mar 14, 2019
@bflad
Contributor

bflad commented Mar 14, 2019

The fix for this has been merged and will release with version 2.2.0 of the Terraform AWS Provider, likely later today.

@bflad
Contributor

bflad commented Mar 15, 2019

This has been released in version 2.2.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

@ghost

ghost commented Mar 31, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 31, 2020