Instance drain timeout for EMR cluster destroy is too low #7957

Closed
salikov1809 opened this issue Mar 15, 2019 · 6 comments · Fixed by #8428
Labels
  • bug: Addresses a defect in current functionality.
  • service/emr: Issues and PRs that pertain to the emr service.

Comments

@salikov1809

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Although I am not running the latest version, a source code check confirms the hardcoded timeout value is unchanged in the latest version.

Terraform v0.11.10

  • provider.aws v1.59.0

Affected Resource(s)

  • aws_emr_cluster

Expected Behavior

terraform destroy succeeds

Actual Behavior

The time needed to destroy our EMR cluster resources is very close to 10 minutes. Occasionally it takes slightly more than 10 minutes, in which case the destroy fails with the instance drain timeout error (error waiting for EMR Cluster (%s) Instances to drain), which is currently hardcoded to 10 minutes.

Steps to Reproduce

  1. terraform destroy


@nywilken
Contributor

Hi @salikov1809, it looks like this may be a duplicate of #3465. If that's true we can close this issue and track progress there. If you believe it is a different issue, please provide a redacted configuration to help us reproduce it.

@nywilken added the bug and service/emr labels Mar 15, 2019
@salikov1809
Author

Hi,

I believe this is a different issue. In our case Terraform does wait 10 minutes (the hardcoded timeout value), but 10 minutes is not enough to destroy our EMR clusters (it often takes 11 minutes or more to destroy them).

To reproduce this issue you would need an EMR cluster with a destroy time longer than 10 minutes. Unfortunately I cannot provide you with full details of our configuration at the moment.

As a quick fix, we replaced the hardcoded 10-minute value with a higher one in the provider source code and recompiled the provider.

Making this timeout configurable (as the aws_instance resource already allows) would solve the issue; see the sketch below.
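
For illustration, a configurable drain timeout could look like the timeouts block that aws_instance already supports. This is only a sketch of the requested feature: aws_emr_cluster did not accept a timeouts block at the time this issue was filed, and the resource names and release label below are illustrative.

    resource "aws_emr_cluster" "cluster" {
      name          = "example-cluster"
      release_label = "emr-5.21.0"
      service_role  = "${aws_iam_role.emr_service.arn}"

      # ... other required arguments (ec2_attributes, instance groups, etc.) omitted ...

      # Hypothetical timeouts block, modeled on aws_instance;
      # not supported by aws_emr_cluster when this issue was opened.
      timeouts {
        delete = "30m"
      }
    }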

@edwardbartholomew
Contributor

Agreed, this is different from #3465, where the report was that related resources were removed before the EMR cluster was actually deleted. This issue is that it sometimes takes longer than ten minutes to destroy a cluster. If this value can't be increased by default, the configurable timeouts suggested by @salikov1809 would be a good compromise.

       module.test.aws_emr_cluster.cluster: Still destroying... (ID: j-XXXXXXXXXXXX, 9m50s elapsed)
       module.test.aws_emr_cluster.cluster: Still destroying... (ID: j-XXXXXXXXXXXX, 10m0s elapsed)

       Error: Error applying plan:

       1 error(s) occurred:

       * module.test.aws_emr_cluster.cluster (destroy): 1 error(s) occurred:

       * aws_emr_cluster.cluster: error waiting for EMR Cluster (j-XXXXXXXXXXXX) Instances to drain

@bflad
Contributor

bflad commented Apr 24, 2019

Hi folks 👋 We've merged an increase in the timeout from 10 minutes to 20 minutes in the aws_emr_cluster resource, in line with the AWS CLI terminate-clusters documentation. We typically prefer hardcoded timeouts where possible, to prevent operators from masking unexpected errors that are being retried without feedback. Should we still run into issues with the increased timeout, we can re-evaluate that position for this particular resource. A future version of the Terraform Provider SDK should allow any Terraform provider resource to return helpful diagnostic information during these retry/timeout loops and alleviate the error-masking concern. 👍

This timeout increase will release with version 2.8.0 of the Terraform AWS Provider, likely later this week.

@nywilken
Contributor

nywilken commented Apr 27, 2019

This has been released in version 2.8.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
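
If it helps anyone upgrading, a minimal Terraform 0.11-style provider pin that picks up the fix might look like the following (the region value is illustrative):

    provider "aws" {
      # Constrain to the 2.x series at or above 2.8, which includes the timeout increase.
      version = "~> 2.8"
      region  = "us-east-1"
    }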

@ghost

ghost commented Mar 30, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 30, 2020