Instance drain timeout for EMR cluster destroy is too low #7957

Closed
salikov1809 opened this issue Mar 15, 2019 · 6 comments · Fixed by #8428
Labels
  • bug: Addresses a defect in current functionality.
  • service/emr: Issues and PRs that pertain to the emr service.

Comments

@salikov1809

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Although I am not running the latest version, a source code check confirms the hardcoded timeout value is unchanged in the latest version.

Terraform v0.11.10

  • provider.aws v1.59.0

Affected Resource(s)

  • aws_emr_cluster

Expected Behavior

terraform destroy succeeds

Actual Behavior

The time needed to destroy our EMR cluster resources is very close to 10 minutes. Occasionally it takes slightly more than 10 minutes, in which case the destroy fails with the instance drain timeout error (error waiting for EMR Cluster (%s) Instances to drain), which is currently hardcoded to 10 minutes.

Steps to Reproduce

  1. terraform destroy


@nywilken
Contributor

Hi @salikov1809, it looks like this may be a duplicate of #3465. If that's true we can close this issue and track progress there. If you believe it is a different issue, please provide a redacted configuration to help us reproduce it.

@nywilken added the bug and service/emr labels Mar 15, 2019
@salikov1809
Author

Hi,

I believe this is a different issue. In our case Terraform does wait 10 minutes (the hardcoded timeout value), but 10 minutes is not enough to destroy our EMR clusters (it often takes 11 minutes or more to destroy them).

To reproduce this issue you would need an EMR cluster with a destroy time longer than 10 minutes. Unfortunately I cannot provide you with full details of our configuration at the moment.

As a quick fix, we replaced the hardcoded 10-minute value with a higher one in the provider source code and recompiled the provider.

Making this timeout configurable (as the aws_instance resource already allows) would solve the issue; see the sketch below.
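
For illustration, a configurable drain timeout could look like the timeouts block that aws_instance already supports. This is only a sketch of the requested feature: aws_emr_cluster did not accept a timeouts block at the time this issue was filed, and the resource names and release label below are illustrative.

    resource "aws_emr_cluster" "cluster" {
      name          = "example-cluster"
      release_label = "emr-5.21.0"
      service_role  = "${aws_iam_role.emr_service.arn}"

      # ... other required arguments (ec2_attributes, instance groups, etc.) omitted ...

      # Hypothetical timeouts block, modeled on aws_instance;
      # not supported by aws_emr_cluster when this issue was opened.
      timeouts {
        delete = "30m"
      }
    }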

@edwardbartholomew
Contributor

Agreed, this is different from #3465, where the report was that related resources were removed before the EMR cluster was actually deleted. This issue is that it sometimes takes longer than ten minutes to destroy a cluster. If this value can't be increased by default, the configurable timeouts suggested by @salikov1809 would be a good compromise.

       module.test.aws_emr_cluster.cluster: Still destroying... (ID: j-XXXXXXXXXXXX, 9m50s elapsed)
       module.test.aws_emr_cluster.cluster: Still destroying... (ID: j-XXXXXXXXXXXX, 10m0s elapsed)

       Error: Error applying plan:

       1 error(s) occurred:

       * module.test.aws_emr_cluster.cluster (destroy): 1 error(s) occurred:

       * aws_emr_cluster.cluster: error waiting for EMR Cluster (j-XXXXXXXXXXXX) Instances to drain

@bflad
Contributor

bflad commented Apr 24, 2019

Hi folks 👋 We've merged an increase in the timeout from 10 minutes to 20 minutes in the aws_emr_cluster resource, in line with the AWS CLI terminate-clusters documentation. We typically prefer hardcoded timeouts where possible, to prevent operators from masking unexpected errors that are being retried without feedback. Should we still run into issues with the increased timeout, we can re-evaluate that position for this particular resource. A future version of the Terraform Provider SDK should allow any Terraform provider resource to return helpful diagnostic information during these retry/timeout loops and alleviate the error-masking concern. 👍

This timeout increase will release with version 2.8.0 of the Terraform AWS Provider, likely later this week.

@nywilken
Contributor

nywilken commented Apr 27, 2019

This has been released in version 2.8.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
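
If it helps anyone upgrading, a minimal Terraform 0.11-style provider pin that picks up the fix might look like the following (the region value is illustrative):

    provider "aws" {
      # Constrain to the 2.x series at or above 2.8, which includes the timeout increase.
      version = "~> 2.8"
      region  = "us-east-1"
    }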

@ghost

ghost commented Mar 30, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 30, 2020