Previously, if a custom resource failed to sync with Consul, we used the
default retry backoff. This was an exponential backoff starting at 5ms
and maxing out at 1000s (roughly 16m).
This backoff was a poor UX for our common error case where one config
entry cannot be applied due to a prerequisite, for example an ingress
gateway entry cannot be applied until the protocol is set to http. The
usual workflow to resolve this would be to look up the error, figure out
the correct ServiceDefaults/ProxyDefaults, and then apply that resource.
Once applied, the user needs to wait for the ingress gateway (in this
example) resource to be retried. With the default backoff config,
because the user will have taken on the order of minutes to figure out
the correct config, the exponential backoff will now be upwards of
five minutes. The user will have to wait for a long time for the ingress
gateway resource to be retried.
This PR changes the backoff to start out at 200ms and max out at 5s.
This fits our use-case better for three reasons. First, the user will
have to wait at most 5s for a retry. Second, if there's an error,
retrying within milliseconds usually does nothing, so waiting 200ms to
start is fine. Finally, Consul servers can accept tens of thousands of
writes per second, so even if there are a ton of resources retrying
every 5s, it won't be an issue.
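To make the difference concrete, here is a rough sketch of how a per-item exponential backoff grows under the old and new settings. It is not the controller's actual rate limiter: `backoffDelay` and the `oldBase`/`oldMax`/`newBase`/`newMax` names are hypothetical, and the doubling-per-failure formula is an assumption about how the backoff behaves, using only the 5ms/1000s and 200ms/5s values from the description above.

```go
package main

import (
	"fmt"
	"time"
)

// Values taken from the PR description; the variable names are illustrative.
var (
	oldBase = 5 * time.Millisecond
	oldMax  = 1000 * time.Second
	newBase = 200 * time.Millisecond
	newMax  = 5 * time.Second
)

// backoffDelay returns the delay before the next retry after the given
// number of prior failures, assuming the delay doubles on each failure
// and is capped at max. It caps on every step to avoid overflow.
func backoffDelay(base, max time.Duration, failures int) time.Duration {
	d := base
	for i := 0; i < failures; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}

func main() {
	// Print the next-retry delay for a few failure counts under both configs.
	for _, n := range []int{0, 5, 10, 14, 18} {
		fmt.Printf("failures=%2d old=%v new=%v\n",
			n, backoffDelay(oldBase, oldMax, n), backoffDelay(newBase, newMax, n))
	}
}
```

Under these assumptions, the old settings need many failures before the delay becomes painful, but once a resource has been failing for a few minutes the next retry can be minutes away. With the new settings the delay reaches the 5s cap after a handful of failures and never grows past it.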
Fixes #587
How I've tested this PR:
Apply the igw, then the defaults with a small gap in-between, then wait and see how long it takes to retry:
See that it takes 80 seconds to re-sync. If we slept for longer then the retry would be even longer, up to 16 minutes!
Now using ghcr.io/lkysow/consul-k8s-dev:oct17, retry the same:
It re-syncs in 5s!
How I expect reviewers to test this PR:
Checklist: