Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve resource.WaitForState and add refreshGracePeriod #13778

Merged
merged 6 commits into from
Apr 19, 2017

Conversation

jbardin
Copy link
Member

@jbardin jbardin commented Apr 19, 2017

The Refresh goroutine in WaitForState was never being canceled, and could end up running until the entire process exited. Because of this situation, successful calls to refresh could happen long after the timeout is reached. This can be a serious problem when the Refresh call has side effects that need to be recorded (as is often the case when called through resource.Retry).

Start by making the goroutine properly cancellable, returning immediately during the wait period.

Once the refresh goroutine can be cancelled, the case where there is a Refresh still in-flight needs to be taken care of. Because Refresh can't be cancelled directly, all we can do is wait and hope it returns in a reasonable amount of time.

Add a grace period after the timeout elapses, to wait for the function to return.

Fixes #13617

Make sure that we can cancel the WaitForState refresh loop when reaching
a timeout, otherwise it may run indefinitely. There's no need to try and
store and read the Result concurrently, just pass the value over a
channel.
This test unfortunately relies on the timing of the loops in
WaitForState, and the text of the error message. Adjust the timing so
the timeout isn't an even multiple of the poll interval, and make sure
we reach a minimum number of retries.
Copy link
Contributor

@apparentlymart apparentlymart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit hesitant here just because it seems like this bit of code has lots of complex interactions and my mental model of how things fit together here is still lacking, but it looks like the code does what the comment says it should, and in principle what you described sounds reasonable, so I think that's as good as I'm going to be able to get here... 👍

Refresh calls may have side effects that need to be recorded if it
succeeds, especially common when when WaitForState is called from
resource.Retry.

If the WaitForState timeout is reached and there is a Refresh call
in-flight, wait up to refreshGracePeriod (set to 30s) for it to
complete.
A couple tests require lowering the grace period to keep the test from
taking the full 30s timeout.

The Retry_hang test also needed to be removed from the Parallel group,
becuase it modifies the global refreshGracePeriod variable.
@jbardin jbardin merged commit f5cda34 into master Apr 19, 2017
@jbardin jbardin deleted the jbardin/GH-13617 branch April 19, 2017 22:23
@ghost
Copy link

ghost commented Apr 13, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race condition in helper/resource/state.go creating EC2 instances that don't appear in tfstate
2 participants