-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Ensure readiness probe relies on translated config #10542
base: main
Are you sure you want to change the base?
Conversation
Issues linked to changelog: |
1 similar comment
Issues linked to changelog: |
Visit the preview URL for this PR (updated for commit b074b14): https://gloo-edge--pr10542-fix-readiness-probe-omzmzbzp.web.app (expires Thu, 09 Jan 2025 09:03:12 GMT) 🔥 via Firebase Hosting GitHub Action 🌎 Sign: 77c2b86e287749579b7ff9cadb81e099042ef677 |
@@ -1,4 +1,4 @@ | |||
package httplisteneroptions | |||
package httplisteneroptions_test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes made to avoid import cycles
@@ -91,6 +91,8 @@ func (t *translator) TranslateProxy( | |||
reporter, | |||
) | |||
|
|||
listeners = listener.AppendHealthCheckListener(listeners) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, previously, the readiness listener is gated by $gateway.readinessProbe
helm option, now that's not possible anymore? If that's the case, should any documentation be updated?
Also, the envoy admin port /ready
route is also removed from the static config, is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The helm option still generates the probe, however the HTTP endpoint is configured via translation and not the static bootstrap config
This is an internal change and user should not notice any difference. The ready route was removed since it was not used anywhere nor mentioned in the docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we be sure no customer is using the envoy admin /ready
endpoint directly? If you already discussed this with Sam or Nathan, I am ok with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Customers use the customhealthcheck with /ready if thats the question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question, overall looks good.
} | ||
|
||
// GenerateHealthCheckListener returns a readiness listener with the health check plugin defined on it | ||
func GenerateHealthCheckListener() *v1.Listener { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a way of validating that this generates the same config as the removed Helm? Or are the passing tests enough validation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The passing tests should do. Additionally the changes to the test files (https://github.com/solo-io/gloo/pull/10542/files#diff-1b1edb892a960bc5d498be228066a8c5eaa997b92dfb62a9c297e03c6b0fe07e) show the envoy listener generated
Description
When the readiness listener is defined in the bootstrap config, the pod can be considered ready before it receives any config over XDS that can lead to downtime during an upgrade. This fixes the issue by defining the readiness listener during translation, ensuring that the envoy pod will be ready only when it receives the XDS config
API changes
N/A
Context
This fixes the flaky Zero Downtime tests
Testing steps
Checklist: