Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

helper/resource: restore retval of resource.Retry on timeout #5460

Merged
merged 1 commit into from
Mar 7, 2016

Conversation

phinze
Copy link
Contributor

@phinze phinze commented Mar 4, 2016

In #4700 while fixing a data race I made an incorrect assumption about
the return value of StateChangeConf, and ended up changing the behavior
in the timeout case to always return:

timeout while waiting for state to become '[success]'

When it used to capture the "most recent error" from the function
itself.

It's much more useful to see that error bubbling up, so here we revert
to pulling it out of the function as we did before, and we protect
against the data race with a good old fashioned mutex.

In #4700 while fixing a data race I made an incorrect assumption about
the return value of StateChangeConf, and ended up changing the behavior
in the timeout case to always return:

> timeout while waiting for state to become '[success]'

When it used to capture the "most recent error" from the function
itself.

It's much more useful to see that error bubbling up, so here we revert
to pulling it out of the function as we did before, and we protect
against the data race with a good old fashioned mutex.
@stack72
Copy link
Contributor

stack72 commented Mar 5, 2016

@phinze you sir are awesome! This works a treat :)

BEFORE

terraform apply
aws_ecs_cluster.default: Creating...
  name: "" => "terraformecstestcluster"
aws_ecr_repository.ingest-repo: Creating...
  arn:         "" => "<computed>"
  name:        "" => "repo"
  registry_id: "" => "<computed>"
aws_ecs_cluster.default: Creation complete
aws_ecr_repository.ingest-repo: Provisioning with 'local-exec'...
aws_ecr_repository.ingest-repo (local-exec): Executing: /bin/sh -c "provision/deploy-container.sh"
aws_ecr_repository.ingest-repo: Creation complete
aws_ecs_task_definition.ingest-task: Creating...
  arn:                   "" => "<computed>"
  container_definitions: "" => "8db1f19cc94bc9ff682c8f2d8d370ebb8c4c1732"
  family:                "" => "task"
  revision:              "" => "<computed>"
aws_ecs_task_definition.ingest-task: Creation complete
aws_ecs_service.ingest-service: Creating...
  cluster:                            "" => "<computed>"
  deployment_maximum_percent:         "" => "200"
  deployment_minimum_healthy_percent: "" => "100"
  desired_count:                      "" => "1"
  name:                               "" => "ingest-service"
  task_definition:                    "" => "arn:aws:ecs:us-east-1:881237884953:task-definition/task:1"
Error applying plan:

1 error(s) occurred:

* aws_ecs_service.ingest-service: timeout while waiting for state to become '[success]'

AFTER

terraform apply
aws_ecs_cluster.default: Creating...
  name: "" => "terraformecstestcluster"
aws_ecr_repository.ingest-repo: Creating...
  arn:         "" => "<computed>"
  name:        "" => "repo"
  registry_id: "" => "<computed>"
aws_ecs_cluster.default: Creation complete
aws_ecr_repository.ingest-repo: Provisioning with 'local-exec'...
aws_ecr_repository.ingest-repo (local-exec): Executing: /bin/sh -c "provision/deploy-container.sh"
aws_ecr_repository.ingest-repo: Creation complete
aws_ecs_task_definition.ingest-task: Creating...
  arn:                   "" => "<computed>"
  container_definitions: "" => "8db1f19cc94bc9ff682c8f2d8d370ebb8c4c1732"
  family:                "" => "task"
  revision:              "" => "<computed>"
aws_ecs_task_definition.ingest-task: Creation complete
aws_ecs_service.ingest-service: Creating...
  cluster:                            "" => "terraformecstestcluster"
  deployment_maximum_percent:         "" => "200"
  deployment_minimum_healthy_percent: "" => "100"
  desired_count:                      "" => "1"
  name:                               "" => "ingest-service"
  task_definition:                    "" => "arn:aws:ecs:us-east-1:881237884953:task-definition/task:5"
aws_ecs_service.ingest-service: Creation complete

Apply complete! Resources: 4 added, 0 changed, 0 destroyed.

State path: terraform.tfstate

@stack72
Copy link
Contributor

stack72 commented Mar 5, 2016

This fixes #5454 (at least)

@brikis98
Copy link
Contributor

brikis98 commented Mar 6, 2016

Is there any workaround for the timeout while waiting for state to become '[success]' error until this is merged and released?

@jen20
Copy link
Contributor

jen20 commented Mar 6, 2016

This LGTM. The lock/unlock seems cleaner than pulling out yet another variable for return.

@phinze
Copy link
Contributor Author

phinze commented Mar 6, 2016

What I still don't quite understand is how this change seems to fix actual behavior problem.

My reading of the code suggests that an existing timeout condition might have a more useful error message shadowed by the timeout error, but it seems like instead there are real issues being caused.

I'm going to take another look to see why this might be happening.

@phinze
Copy link
Contributor Author

phinze commented Mar 7, 2016

So here is what I think is happening:

The timeout can interrupt anywhere inside the WaitForState goroutine. Including inside the Refresh function, where we might be deep in aws-sdk-go waiting for the results of an operation to finish or in any of the ContinuousTargetOccurrence accounting etc.

If the timeout fires when a happy retval has already been recorded but before we've caught the done channel close, it (prior to this PR) overrides the return value w/ the timeout error.

I still haven't grokked why it seems to happen often enough to be noticeable, but I'm willing to just merge this as is and continue to triage incoming bug reports rather than trying to fully explain this thing step by step.

@aruldeepan
Copy link

aruldeepan commented Aug 3, 2017

I am getting the "Error launching source instance: timeout while waiting for state to become 'success' (timeout: 15s)" My terraform version is "Terraform v0.9.6". Can someone please help me.?

Error while launching the instance.

@ghost
Copy link

ghost commented Apr 8, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants