Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

examples/k8s-health-check folder README advises against readiness probes for (to me) incomprehensible reasons #2292

Open
sh-at-cs opened this issue Aug 27, 2024 · 2 comments
Assignees
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: cleanup An internal cleanup or hygiene concern. type: docs Improvement to the documentation for an API.

Comments

@sh-at-cs
Copy link

sh-at-cs commented Aug 27, 2024

Description

The examples/k8s-health-check folder's README.md says:

For most common usage, adding a readiness healthcheck to the proxy sidecar container is unnecessary. An improperly configured readiness check can degrade the application's availability. [...] Most applications are resilient to transient database connection failures, and do not need to be restarted. [...] You should use the proxy container's readiness probe when these circumstances should cause k8s to terminate the entire pod: [...]

That makes it sound as if a failing readiness probe would cause the Pod to be restarted and we shouldn't use them for the cloud-sql-proxy sidecar for that reason, but this doesn't seem to be the case (?)

Instead, if I understood k8s's docs correctly, a failing readiness probe will not cause a Pod to be restarted, and readiness probes are only important for two things:

  1. Directing traffic to a Pod - only ready Pods will receive traffic.
  2. Determining whether a Pod in a Deployment is considered "available" in the context of the RollingUpdate strategy.

And for (2), it seems to me like having an accurate readiness probe is in fact essential, because otherwise the rolling update will consider pods "available" that are really broken, which then causes k8s to stop healthy but outdated Pods to make room for them.

So I don't understand this piece of advice.

Does it just confuse the readiness probe with the liveness probe (maybe because that was introduced later? But then again, the document does mention the liveness probe, too...)?

Potential Solution

Fix the README so it accurately reflects what a readiness probe is for and how it's relevant, or, if it turns out I'm wrong, link to the paragraphs in the k8s docs where my misconceptions are corrected from README.md so other users don't develop the same ones in the future.

Additional Details

There are similar comments in the examples/k8s-health-check/proxy_with_http_health_check.yaml file, which should be corrected as well if it turns out that there is something to correct here.

@sh-at-cs sh-at-cs added the type: docs Improvement to the documentation for an API. label Aug 27, 2024
@GoogleCloudPlatform GoogleCloudPlatform deleted a comment Aug 27, 2024
@jackwotherspoon
Copy link
Collaborator

Hi @sh-at-cs thanks for raising an issue on the Cloud SQL Proxy!

I'll let @hessjcg respond to this as he is our kubernetes expert and should be able to provide guidance here 😄

@sh-at-cs sh-at-cs changed the title examples folder README advises against readiness probes for (to me) incomprehensible reasons examples/k8s-health-check folder README advises against readiness probes for (to me) incomprehensible reasons Aug 29, 2024
@hessjcg
Copy link
Collaborator

hessjcg commented Aug 30, 2024

@sh-at-cs, You raise a valid point. The k8s readiness probe controls how traffic is routed, not whether the pod is restarted. I will update the README to make this clear.

This advice is based on our practical experience. Our users generally get better availability when they configure readiness probes only on the main application container. Having a readiness probe on the main container then makes redundant the readiness probe on the to the proxy sidecar.

We should also update the examples to demonstrate how to make the proxy container start up to a ready state before the main application container starts. (The implementation is discussed in more detail in GoogleCloudPlatform/cloud-sql-proxy-operator#496 and GoogleCloudPlatform/cloud-sql-proxy-operator#381)

@hessjcg hessjcg closed this as completed Dec 10, 2024
@hessjcg hessjcg reopened this Dec 11, 2024
@hessjcg hessjcg added the priority: p2 Moderately-important priority. Fix may not be included in next release. label Dec 11, 2024
@jackwotherspoon jackwotherspoon added the type: cleanup An internal cleanup or hygiene concern. label Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p2 Moderately-important priority. Fix may not be included in next release. type: cleanup An internal cleanup or hygiene concern. type: docs Improvement to the documentation for an API.
Projects
None yet
Development

No branches or pull requests

3 participants