Pods which have not "started" can not be "ready" #92196
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: thockin. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Thanks Tim. Let me have a look later today...
/retest
Tests are fine, which shows the poor coverage of these edge cases...
Once merged we should look at the serial e2e test results as there is some coverage there. |
```
@@ -261,6 +249,20 @@ func (m *manager) UpdatePodStatus(podUID types.UID, podStatus *v1.PodStatus) {
		started = !exists
	}
	podStatus.ContainerStatuses[i].Started = &started

	if started {
```
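For readers following along, here is a minimal, self-contained sketch of the ordering this hunk introduces. It is not the kubelet's actual code (the real implementation works on `v1.ContainerStatus` and the probe result managers); the type `containerState` and the function `computeStatus` are made up purely for illustration.

```go
package main

import "fmt"

// containerState is a simplified stand-in for the per-container information
// the kubelet consults; the real code reads probe results from the results
// managers and writes into v1.ContainerStatus.
type containerState struct {
	hasStartupProbe   bool
	startupPassed     bool
	hasReadinessProbe bool
	readinessPassed   bool
}

// computeStatus mirrors the ordering introduced by this PR: "started" is
// decided first, and readiness is only evaluated for started containers, so
// ready=true can never be reported before started=true.
func computeStatus(c containerState) (started, ready bool) {
	if c.hasStartupProbe {
		started = c.startupPassed
	} else {
		// No startup probe: the container counts as started right away.
		started = true
	}
	if started {
		if c.hasReadinessProbe {
			ready = c.readinessPassed
		} else {
			// No readiness probe: assume always ready once started.
			ready = true
		}
	}
	return started, ready
}

func main() {
	// A container with only a startupProbe that has not passed yet:
	// before this PR it was reported ready=true, now it is ready=false.
	fmt.Println(computeStatus(containerState{hasStartupProbe: true}))
}
```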
If there is no way a container can be ready but not started, perhaps the status can be stored in a single variable instead of two independent flags?
No, there are more ramifications... I think it's clearer to keep both even just for the sake of explanation and ease of testing.
Do you mean in the API or in this function?
The API has shipped, we don't want to change that.
This commit was designed to be surgical - move the code block, add an `if`. Further cleanup may be possible, but it seems low-value to me - this code is pretty simple to read?
I meant the API =). I commented before looking into it, and I understand it seems to be hard to change. The only benefit is guaranteed consistency of the status. This method is OK, but there are more. For instance, this:
`containerStatus.Ready = ready`
`Ready` and `Started` are set seemingly independently.
That case (I think) is after readiness has been considered at a lower level, but I admit I am not 100% confident in this code area any more :(
/retest
verify failed after 2 hours and a LOT of logs
Before this commit, containers which have both a `startupProbe` and a `readinessProbe` are marked as `ready=false` during startup, but containers which have only a `startupProbe` are marked `ready=true`. This doesn't make sense. This commit only considers readiness if the container is considered to have "started", which leaves `ready=false` while starting up.
New changes are detected. LGTM label has been removed.
@thockin what's the difference between your initial commit and the force-pushed one? I don't see it...
@SergeyKanzhelev there are 2 states, ready and started, and they represent 2 different notions that are monitored by different probes... started is a permanent state (once startupProbe succeeds it never changes) whereas ready depends on the result of the readinessProbe. Please refer to the KEP (kubernetes/enhancements#950) or my lightning talk on the subject: https://youtu.be/wO1uy9QKNHQ
Totally with you on this. My comment was that there are three acceptable states represented with four possible combinations of flags, which led to this Ready-but-not-Started issue. And there seem to be other places which set these flags independently, without ensuring that the container does not end up in an unacceptable state. I'm pretty sure the … Also, great lightning talk.
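To make the "four combinations, three acceptable states" point concrete, here is a small illustrative sketch (plain Go, no Kubernetes types; the `acceptable` rule is just a restatement of the comment above, not code from the repo):

```go
package main

import "fmt"

func main() {
	// The two flags admit four combinations, but only three are acceptable;
	// started=false with ready=true is the inconsistent state this PR rules out.
	combos := []struct{ started, ready bool }{
		{false, false}, // still starting up, not serving traffic
		{true, false},  // started, but readiness probe not (yet) passing
		{true, true},   // started and ready
		{false, true},  // invalid: cannot be ready before having started
	}
	for _, c := range combos {
		acceptable := c.started || !c.ready
		fmt.Printf("started=%-5v ready=%-5v acceptable=%v\n", c.started, c.ready, acceptable)
	}
}
```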
Good point, should we create another issue, or put everything on #89995 - in that case this PR should not "fix" it alone. |
Recently, I noticed that pods without a readiness probe continue to receive traffic while they are in terminating status. In this regard, I propose to add a new state for the startup probe and make the liveness and readiness probes work only for the running state. Why is it so important to solve this problem? Example behaviour when a pod got a kill signal but still has a ready state
Thanks for your attention!
This has always been the case... when you don't specify a readinessProbe, you assume an always ready state.
This is clearly a different use case than what the startupProbe was designed to solve. If you feel strongly enough, you can take the point and start a KEP to modify the handling of the shutdown phase... However, now that we have the startupProbe, people could almost do without a readinessProbe (if they don't care about removing/adding back the pod to the load balancer pool) if that case had been covered. Then it could be a feature (or follow-up) of my KEP... What do you think @thockin (from a philosophical point of view)?
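A tiny sketch of the "no readinessProbe means always ready" default mentioned above (illustrative only; `effectiveReady`, `hasReadinessProbe`, and `probePassing` are made-up names, not kubelet fields):

```go
package main

import "fmt"

// effectiveReady illustrates the default discussed above: with no
// readinessProbe configured there is no probe result to consult, so the
// container is treated as always ready (once it has started).
func effectiveReady(hasReadinessProbe, probePassing bool) bool {
	if !hasReadinessProbe {
		return true
	}
	return probePassing
}

func main() {
	fmt.Println(effectiveReady(false, false)) // true: no probe, assume ready
	fmt.Println(effectiveReady(true, false))  // false: probe configured and failing
}
```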
Yes, I propose to change this behaviour. A pod should be ready only in the running state, and the readiness probe should manage the ready state only for the running state. In the startup and terminating states the pod should not be ready. I think it is bad behaviour when requests can go to pods which are starting up or shutting down. Do you think this is a good proposal? It could solve many problems like this one.
According to this page, it should not be the case: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods Even if the pod is still "ready", no new traffic should go to it... so if it's something you can reproduce, I suggest you file a bug report (which is something I can help you with).
What does "ready" mean for pods in the terminating state?
I need to check in the kubelet code...
/test pull-kubernetes-kubemark-e2e-gce-big
It was just a rebase - trying to unstick tests
/hold
Temporary hold to let prow/tide get back on its feet. Feel free to remove the hold in a few hours.
/hold cancel
…6-upstream-release-1.18 Automated cherry pick of #92196: Pods which have not "started" can not be "ready"
Before this commit, containers which have both a `startupProbe` and a `readinessProbe` are marked as `ready=false` during startup, but containers which have only a `startupProbe` are marked `ready=true`. This doesn't make sense.

This commit only considers readiness if the container is considered to have "started", which leaves `ready=false` while starting up.

/kind bug
Fixes #89995
Special notes for your reviewer:
I am NOT super familiar with this code area. I dug around to find this and empirically it seems to work.
Does this PR introduce a user-facing change?: