"Provider produced inconsistent result after apply" creating multiple ec2 spot requests #12679

Closed
realflash opened this issue Apr 5, 2020 · 5 comments · Fixed by #18473
Labels: bug (Addresses a defect in current functionality.), service/ec2 (Issues and PRs that pertain to the ec2 service.)

@realflash
Contributor

Terraform Version

Terraform v0.12.24

  • provider.aws v2.56.0

Affected Resource(s)

aws_spot_instance_request

Terraform Configuration Files

resource "aws_spot_instance_request" "no_subnet_1" {
  ami               = "ami-0ed2df11a6d41ea78"
  instance_type     = "t3.nano"
  availability_zone = "eu-west-2a"
}

resource "aws_spot_instance_request" "no_subnet_2" {
  ami               = "ami-0ed2df11a6d41ea78"
  instance_type     = "t3.nano"
  availability_zone = "eu-west-2a"
}

resource "aws_spot_instance_request" "no_subnet_3" {
  ami               = "ami-0ed2df11a6d41ea78"
  instance_type     = "t3.nano"
  availability_zone = "eu-west-2a"
}

resource "aws_spot_instance_request" "no_subnet_4" {
  ami               = "ami-0ed2df11a6d41ea78"
  instance_type     = "t3.nano"
  availability_zone = "eu-west-2a"
}

resource "aws_spot_instance_request" "no_subnet_5" {
  ami               = "ami-0ed2df11a6d41ea78"
  instance_type     = "t3.nano"
  availability_zone = "eu-west-2a"
}

resource "aws_spot_instance_request" "no_subnet_6" {
  ami               = "ami-0ed2df11a6d41ea78"
  instance_type     = "t3.nano"
  availability_zone = "eu-west-2a"
}

resource "aws_spot_instance_request" "no_subnet_7" {
  ami               = "ami-0ed2df11a6d41ea78"
  instance_type     = "t3.nano"
  availability_zone = "eu-west-2a"
}

resource "aws_spot_instance_request" "no_subnet_8" {
  ami               = "ami-0ed2df11a6d41ea78"
  instance_type     = "t3.nano"
  availability_zone = "eu-west-2a"
}

resource "aws_spot_instance_request" "no_subnet_9" {
  ami               = "ami-0ed2df11a6d41ea78"
  instance_type     = "t3.nano"
  availability_zone = "eu-west-2a"
}

resource "aws_spot_instance_request" "no_subnet_10" {
  ami               = "ami-0ed2df11a6d41ea78"
  instance_type     = "t3.nano"
  availability_zone = "eu-west-2a"
}

Debug Output

https://gist.githubusercontent.com/realflash/6d08ea50718075263a7d99b4c424dfd2/raw/49400c590bde68aaf6f35fb28c3a315a80497673/az.log

Expected Behaviour

10 spot requests created without error.

Actual Behaviour

All 10 spot requests are created, but the errors imply that only 5 were. The state is left incorrect: subsequent plans show that Terraform wants to create the 5 that errored previously.

Steps to Reproduce

  1. terraform apply
@ghost ghost added the service/ec2 Issues and PRs that pertain to the ec2 service. label Apr 5, 2020
@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Apr 5, 2020
@ewbankkit
Contributor

Maybe this is related to #12671?

@stefansundin
Contributor

@ewbankkit I am no expert, but I suspect it is not.

@realflash Have you run this before without issues? Is this a new problem? I think that would be an important data point.

I wonder whether it happens simply because so many spot instance requests are created in quick succession. Adding wait_for_fulfillment = true to all of them (and perhaps running terraform apply -parallelism=1) might make it work better.

The log seems to indicate that the read immediately after the creation returns InvalidSpotInstanceRequestID.NotFound. Maybe the API is not perfectly read-after-write consistent and we need to retry for at least a few seconds?

@realflash
Contributor Author

realflash commented Apr 12, 2020

At the time, I was trying to isolate the troublesome attribute for #12680.

I was seeing the behaviour described there (the instance was not deployed in the requested zone), and I thought there might be some kind of load balancing going on whereby AWS was creating the spot instance in the least loaded zone. So I thought I would create ten to see how many would end up in the requested zone. In the end, none did. This was the output from my first run of ten. On my second run of ten, all of them completed and I did not get this error. So it is certainly not occurring every time.

I then did many more similar runs, knocking attributes out of my resource definition as I went to find the one causing the problem. I dropped down to five instances and then three as I became more confident that it was an attribute problem and not some kind of load balancing. Then on roughly run 7 (number of instances 3), the error occurred again. Run 8 went fine.

Given an occurrence rate of under 1 in 10, I'd say your theory is reasonable. Given a count of only three servers was enough to produce it, it seems likely to be a problem others will hit.

@bflad bflad added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Mar 29, 2021
@bflad bflad self-assigned this Mar 29, 2021
bflad added a commit that referenced this issue Mar 29, 2021
… consistency

Reference: #12679
Reference: #16796

Output from acceptance testing in AWS Commercial:

```
--- PASS: TestAccAWSSpotInstanceRequest_basic (65.29s)
--- PASS: TestAccAWSSpotInstanceRequest_vpc (75.75s)
--- PASS: TestAccAWSSpotInstanceRequest_InterruptStop (96.45s)
--- PASS: TestAccAWSSpotInstanceRequest_withBlockDuration (97.68s)
--- PASS: TestAccAWSSpotInstanceRequest_withLaunchGroup (108.30s)
--- PASS: TestAccAWSSpotInstanceRequest_SubnetAndSGAndPublicIpAddress (119.75s)
--- PASS: TestAccAWSSpotInstanceRequest_InterruptHibernate (125.90s)
--- PASS: TestAccAWSSpotInstanceRequest_NetworkInterfaceAttributes (140.51s)
--- PASS: TestAccAWSSpotInstanceRequest_tags (165.36s)
--- PASS: TestAccAWSSpotInstanceRequest_getPasswordData (187.44s)
--- PASS: TestAccAWSSpotInstanceRequest_disappears (320.26s)
--- PASS: TestAccAWSSpotInstanceRequest_withoutSpotPrice (328.46s)
--- PASS: TestAccAWSSpotInstanceRequest_validUntil (338.82s)
```

Output from acceptance testing in AWS GovCloud (US):

```
--- PASS: TestAccAWSSpotInstanceRequest_withoutSpotPrice (66.13s)
--- PASS: TestAccAWSSpotInstanceRequest_basic (66.24s)
--- PASS: TestAccAWSSpotInstanceRequest_withBlockDuration (66.32s)
--- PASS: TestAccAWSSpotInstanceRequest_InterruptHibernate (74.19s)
--- PASS: TestAccAWSSpotInstanceRequest_vpc (95.40s)
--- PASS: TestAccAWSSpotInstanceRequest_withLaunchGroup (107.94s)
--- PASS: TestAccAWSSpotInstanceRequest_validUntil (107.98s)
--- PASS: TestAccAWSSpotInstanceRequest_InterruptStop (127.16s)
--- PASS: TestAccAWSSpotInstanceRequest_tags (155.38s)
--- PASS: TestAccAWSSpotInstanceRequest_SubnetAndSGAndPublicIpAddress (160.34s)
--- PASS: TestAccAWSSpotInstanceRequest_NetworkInterfaceAttributes (161.00s)
--- PASS: TestAccAWSSpotInstanceRequest_getPasswordData (209.51s)
--- PASS: TestAccAWSSpotInstanceRequest_disappears (332.58s)
```
bflad added a commit that referenced this issue Apr 1, 2021
… consistency (#18473)

* resource/aws_spot_instance_request: Handle read-after-create eventual consistency

Reference: #12679
Reference: #16796

* Update CHANGELOG for #18473
@github-actions github-actions bot added this to the v3.35.0 milestone Apr 1, 2021
@ghost

ghost commented Apr 1, 2021

This has been released in version 3.35.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

@ghost

ghost commented May 1, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators May 1, 2021