improve resource.WaitForState and add refreshGracePeriod #13778

jbardin · 2017-04-19T18:37:11Z

The Refresh goroutine in WaitForState was never canceled and could keep running until the entire process exited. As a result, Refresh calls could succeed long after the timeout was reached. This can be a serious problem when the Refresh call has side effects that need to be recorded (as is often the case when called through resource.Retry).

Start by making the goroutine properly cancellable, returning immediately during the wait period.

Once the refresh goroutine can be cancelled, the case where there is a Refresh still in-flight needs to be taken care of. Because Refresh can't be cancelled directly, all we can do is wait and hope it returns in a reasonable amount of time.

Add a grace period after the timeout elapses, to wait for the function to return.

Fixes #13617

apparentlymart

I'm a bit hesitant here just because it seems like this bit of code has lots of complex interactions and my mental model of how things fit together here is still lacking, but it looks like the code does what the comment says it should, and in principle what you described sounds reasonable, so I think that's as good as I'm going to be able to get here... 👍

Refresh calls may have side effects that need to be recorded if they succeed, which is especially common when WaitForState is called from resource.Retry. If the WaitForState timeout is reached and there is a Refresh call in-flight, wait up to refreshGracePeriod (set to 30s) for it to complete.

ghost · 2020-04-13T02:32:45Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

jbardin added 2 commits April 19, 2017 10:41

don't leave WaitForState goroutine running

af5e22c

Make sure that we can cancel the WaitForState refresh loop when reaching a timeout, otherwise it may run indefinitely. There's no need to try and store and read the Result concurrently, just pass the value over a channel.

adjust the inconsistent_negative test to match

6601b9b

This test unfortunately relies on the timing of the loops in WaitForState, and the text of the error message. Adjust the timing so the timeout isn't an even multiple of the poll interval, and make sure we reach a minimum number of retries.

jbardin added bug core labels Apr 19, 2017

jbardin requested a review from apparentlymart April 19, 2017 18:37

apparentlymart approved these changes Apr 19, 2017

jbardin force-pushed the jbardin/GH-13617 branch from a0fe0cd to 14bea66 Compare April 19, 2017 22:06

jbardin added 4 commits April 19, 2017 18:07

fix tests affected by refreshGracePeriod

eb4b459

A couple of tests require lowering the grace period to keep the test from taking the full 30s timeout. The Retry_hang test also needed to be removed from the Parallel group, because it modifies the global refreshGracePeriod variable.

add test for proper cancelation

14bea66

lint errors

4c3a053

jbardin merged commit f5cda34 into master Apr 19, 2017

jbardin deleted the jbardin/GH-13617 branch April 19, 2017 22:23

ghost locked and limited conversation to collaborators Apr 13, 2020
