Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds a retry capability to lock monitors in the API client. #1457

Merged
merged 3 commits into from
Dec 1, 2015

Conversation

slackpad
Copy link
Contributor

@slackpad slackpad commented Dec 1, 2015

This adds a retry for 500 errors to the lock monitor so they don't immediately sense a problem and think they've lost the lock if there's a leader election or other brief Consul service interruption. This will address #1162 from the API client side, though we will still need to update consul lock itself.

With a Vault test setup with retries enabled I was able to observe it riding out a leader election without causing a simultaneous Vault leader election (though there were still expected Vault errors from other Consul requests that failed during the flap).

@jefferai
Copy link
Member

jefferai commented Dec 1, 2015

👍

slackpad added a commit that referenced this pull request Dec 1, 2015
Adds a retry capability to lock monitors in the API client.
@slackpad slackpad merged commit 86f20a3 into master Dec 1, 2015
@slackpad slackpad deleted the f-monitor-retries branch December 1, 2015 04:26
// counter if service is restored.
if retries > 0 && strings.Contains(err.Error(), serverError) {
time.Sleep(l.opts.MonitorRetryTime)
retries--
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the retry limit should be consecutive failures not cumulative. e.g. if we have one retry and then it works, we should reset the request count otherwise we will eventually always hit zero given enough time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@armon - agree! I had a bug initially that did this because I didn't make the retries non-blocking. As-as it should be consecutive, though. When it jumps to WAIT it always resets the retries back up to the max.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants