-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should HC activation be delayed until needed secrets are available? #12389
Labels
Comments
Yeah I think we should fix this somehow. cc @snowp for thoughts. |
(Sorry I think we should do (a) somehow) |
Yeah I agree, we should be sequencing the cluster warming so that we don't set/start the health checker for a cluster until its secrets have warmed. |
mpuncel
added a commit
to mpuncel/envoy
that referenced
this issue
Apr 29, 2021
This commit is a revival of envoyproxy#13516 which aimed to solve envoyproxy#12389. That PR was merged and then reverted due to a deadlock bug, which is still present in this commit. The fix will be applied in a subsequent commit in order to make the resolution clearer to PR reviewers. Signed-off-by: Michael Puncel <mpuncel@squareup.com>
This was referenced Aug 13, 2021
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I did a cursory walk through the code in charge of activating HCs and I don't think it waits for secrets to be ready.
Furthermore, the
initial_fetch_timeout
broadly refers to the config gRPC stream becoming active (not necessarily to the actual payloads being received):https://github.com/envoyproxy/envoy/blob/master/docs/root/intro/arch_overview/operations/init.rst
So for clusters where we don't set an
initial_jitter
we hitupstream_context_secrets_not_ready
during hot restarts and we also see failed upstream health checks. I didn't test this for cold starts -- though it might be an issue there too.I think we should either:
a) not activate HCs until the needed secrets are available
or
b) document the async nature of SDS and how setting
initial_jitter
might be necessary to workaround a race between the first HCs and secrets becoming available (we could add this to https://github.com/envoyproxy/envoy/blob/master/docs/root/intro/arch_overview/operations/init.rst)
Thoughts?
cc: @mattklein123 @fishcakez
The text was updated successfully, but these errors were encountered: