Status reflecting the state of Envoy programming #2021
Comments
@mattmoor contour is a lot better than Istio -- we moved away from polling nearly two years ago. The expected update delay is in the range [100ms, 600ms) |
@davecheney I'm talking about the fact that we have to probe the gateway Envoys to determine readiness because Istio doesn't surface anything (at all) in |
@mattmoor how could we change contour so you didn't have to probe? Ie, if the configuration is right, it should be programmed in envoy fast enough that a human cannot observe a delay. if the configuration is not right, then waiting or probing won't fix it. |
@davecheney it's not human observation I'm worried about. We want to be sure that things are ready when we say things are ready, and our e2e tests immediately check things when they observe that we've reported things as ready. At small scales, Istio is also quite fast, but when the system is put under load (e.g. tons of services) the programming times climb. FWIW, if I have our e2e tests retry 404s, then things get pretty far, but otherwise get a fair number of failures (though it obviously varies run to run). |
There are a few problems I see with implementing this request.
I'm not saying no to this request, but I want to explain my desire to solve your problem in a different way. |
Contour can know how many envoys have received an update (perhaps we could persist this in Envoy with tags or something?), but I'm not sure we can know how many envoys should have received the update without over specifying the envoy deployment model. |
There are metrics on the Envoy side for xDS versions:
Assuming Contour has some level of control on the versions of the xDS objects, I think it is possible to observe the convergence of Contour and Envoy on a per-pod basis. Also, I think the ACK system (see #1176) that is part of the xDS protocol could signal the Envoy state updates from Contour. As for setting a BTW, we've built an extensive detection system to alert on Envoy desynchronization as part of the investigation for #1523. It has proven valuable beyond the resolution of this issue and could prove useful to others. |
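The per-pod convergence idea in the comment above can be sketched as a simple comparison. This is only an illustration: `LaggingPods`, the pod names, and the version strings are hypothetical, and how the versions are collected (Envoy metrics, xDS ACKs, or something else) is deliberately left outside the sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// LaggingPods returns, in sorted order, the Envoy pods whose reported xDS
// version differs from the version Contour last published.
// Hypothetical helper: neither Contour nor Envoy exposes this function.
func LaggingPods(want string, reported map[string]string) []string {
	var lagging []string
	for pod, got := range reported {
		if got != want {
			lagging = append(lagging, pod)
		}
	}
	sort.Strings(lagging) // map iteration order is random, so sort for stable output
	return lagging
}

func main() {
	// Assumed input: one version string per Envoy pod.
	reported := map[string]string{
		"envoy-a": "42",
		"envoy-b": "41",
		"envoy-c": "42",
	}
	fmt.Println(LaggingPods("42", reported)) // prints [envoy-b]
}
```

An empty result would mean every known pod has converged. It says nothing about pods that have not reported at all, which is the "how many envoys should have received the update" problem raised earlier in the thread.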
@mattmoor is this feature to solve e2e testing and wouldn't be used in production? |
I would say that it’s useful for any automated consumption of Contour where you’d ideally get an informer notification when the programming is ready. We have generalized the probing logic we wrote to solve this for Istio so that we can do something similar for establishing Contour’s readiness as well, so this particular issue isn’t blocking. |
FYI, I believe Istio is adding status now, though I haven't been tracking it too closely. |
The Contour project currently lacks enough contributors to adequately respond to all Issues. This bot triages Issues according to the following rules:
You can:
Please send feedback to the #contour channel in the Kubernetes Slack |
tl;dr I'd like to know when the changes I have posted to `spec:` have been distributed to all the envoys.
Background
In Knative, one of the early problems we had with Istio (still has this problem) is that "Ready" doesn't mean "Ready" (there is a programming delay). What we ultimately had to build (and may need to generalize for use here) was a system that would probe each gateway pod to determine whether the programming had landed.
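A rough sketch of that kind of per-pod probing follows. It is not Knative's actual prober: `ProbeFunc`, `AllProgrammed`, and `HTTPProbe` are hypothetical names, and treating a 404 as "route not yet programmed" is an assumption borrowed from the 404 retries mentioned in the comments.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// ProbeFunc returns the HTTP status code that one gateway pod gives for the
// route under test. It is a hypothetical seam so the check can be faked.
type ProbeFunc func(podURL string) (int, error)

// AllProgrammed reports whether every gateway pod answers with something
// other than 404. Assumption of this sketch: a 404 means the route has not
// been programmed on that pod yet. With no pods to probe, it returns false
// rather than claiming readiness vacuously.
func AllProgrammed(podURLs []string, probe ProbeFunc) bool {
	if len(podURLs) == 0 {
		return false
	}
	for _, u := range podURLs {
		code, err := probe(u)
		if err != nil || code == http.StatusNotFound {
			return false
		}
	}
	return true
}

// client has a timeout so one unresponsive pod cannot stall the whole check.
var client = &http.Client{Timeout: 2 * time.Second}

// HTTPProbe issues a GET against the pod and returns the status code.
func HTTPProbe(podURL string) (int, error) {
	resp, err := client.Get(podURL)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// Two stand-in "gateway pods": one programmed, one still answering 404.
	ready := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer ready.Close()
	stale := httptest.NewServer(http.NotFoundHandler())
	defer stale.Close()

	fmt.Println(AllProgrammed([]string{ready.URL}, HTTPProbe))
	fmt.Println(AllProgrammed([]string{ready.URL, stale.URL}, HTTPProbe))
}
```

The point of the sketch is the shape of the check: readiness is only reported once each individual pod has been asked, not just the load-balanced service in front of them.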
Recommendation
There are two key elements of any good `status:` block:
`observedGeneration`: the value of `metadata.generation` that the `status` block reflects (note that `metadata.generation` is bumped by Kubernetes on any `spec` update).
Folks observing resources would then check:
Now, the question is what's needed for `IsReady()`? Well, at a minimum the controller would need to know how many of the Envoys have received that version of the configuration and either report that information directly to `status` or summarize it. At the end of the day it should be possible to write an `IsReady()` that checks that `{envoys programmed @ status.obsGen} == {number of envoys}` with reasonable confidence.
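As a minimal sketch of what such an `IsReady()` could look like, assuming the controller reported both counts in `status`: the `ProxyStatus` and `Proxy` types and their field names below are hypothetical and do not correspond to any existing Contour API.

```go
package main

import "fmt"

// ProxyStatus is a hypothetical status block; Contour does not define these
// fields. They mirror the quantities named in the issue text.
type ProxyStatus struct {
	ObservedGeneration int64 // the metadata.generation this status reflects
	EnvoysProgrammed   int   // Envoys that have received that generation's config
	EnvoysTotal        int   // Envoys expected to receive it
}

// Proxy pairs metadata.generation with the status block (hypothetical type).
type Proxy struct {
	Generation int64
	Status     ProxyStatus
}

// IsReady reports whether the status reflects the latest spec and every
// Envoy has been programmed with it.
func IsReady(p Proxy) bool {
	if p.Status.ObservedGeneration != p.Generation {
		return false // status is stale: it describes an older spec
	}
	// Require at least one Envoy so an empty fleet is never reported ready.
	return p.Status.EnvoysTotal > 0 &&
		p.Status.EnvoysProgrammed == p.Status.EnvoysTotal
}

func main() {
	p := Proxy{
		Generation: 3,
		Status:     ProxyStatus{ObservedGeneration: 3, EnvoysProgrammed: 2, EnvoysTotal: 3},
	}
	fmt.Println(IsReady(p)) // false: one Envoy is still behind

	p.Status.EnvoysProgrammed = 3
	fmt.Println(IsReady(p)) // true: all Envoys have the latest generation
}
```

Checking the generation first matters: a status that reports "3 of 3 programmed" for an older generation says nothing about the spec just posted. Whether the controller can reliably know `EnvoysTotal` is the open question raised in the comments.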