Update KEP with new name: PodReadyToStartContainers condition #3778
Conversation
ddebroy commented on Jan 24, 2023
- One-line PR description: Update the KEP with the PodReadyToStartContainers condition and move the milestone to Beta
- Issue link: Pod conditions around readiness to start containers after completion of pod sandbox creation #3085
- Other comments: This KEP updates the name of the condition to PodReadyToStartContainers (from the previous PodHasNetwork, which sig-network had concerns with), as previously discussed with the sig-node and sig-network leads. The description of the condition and the goals are updated accordingly.
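For illustration, the renamed condition might surface in pod status roughly as follows. This is a hypothetical sketch, not an example taken from the KEP; the timestamps are illustrative only:

```yaml
status:
  conditions:
  # Hypothetical rendering of the renamed condition in pod status.
  # Field values below are illustrative, not normative.
  - type: PodReadyToStartContainers
    status: "True"
    lastProbeTime: null
    lastTransitionTime: "2023-01-24T00:00:00Z"
```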
Signed-off-by: Deep Debroy <ddebroy@gmail.com>
/lgtm Not sure the new name is better than the previous one, though, since it implies that whether a pod can start its containers depends purely on the network, which is not true. But I am not good at naming; I'll go with the majority here. :-)
/hold for sig-network lgtm as well.
/lgtm The previous name implies a lot of things about the network that may not be true or may be misleading. No strong opinion about the name, but I'll venture to throw out a suggestion:
I like
The latest commit updates the PRR section to Beta. No sections that were already LGTMed were altered.
> and CNI plugins). Completion of all these phases puts the pod sandbox in a
> state where the containers in a pod can be started. This KEP proposes a
> `PodReadyToStartContainers` condition in pod status to indicate a pod has
> reached a state where it's containers are ready to be started. The
nit: its
/lgtm
A couple of comments regarding the PRR.
One more question regarding "Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?" (asking here since I cannot comment on an unchanged line): given the Pod startup latency SLI/SLO details, will this feature slightly add to the latency?
/cc @johnbelamaric
keps/sig-node/3085-pod-conditions-for-starting-completition-of-sandbox-creation/README.md
```
apiserver_request_total{verb="PATCH", resource="pods", subresource="status"}
```
for this.
An increase in this metric could potentially be caused by issues other than the current feature. Is there any other information a cluster admin could use to inform a rollback due to a PodReadyToStartContainersCondition issue?
There isn't any specific metric right now for distinct Pod Status updates (specific pod conditions or other aspects of pod status like container statuses). We may consider that as an independent enhancement to the Kubelet status manager (covering all pod status fields) as mentioned below.
Do you know if there is a node label on those metrics? If the spike is specifically from nodes with this feature enabled, that could be why. I believe @logicalhan recently added feature gate metrics, so in theory you could construct a query that looked for spikes only on nodes with the feature enabled.
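A rough sketch of such a query, assuming the feature-gate metric is named `kubernetes_feature_enabled` and is scraped from each component's /metrics endpoint (both are assumptions, not confirmed in this thread):

```promql
# Overall rate of pod status PATCH calls hitting the API server.
sum(rate(apiserver_request_total{verb="PATCH", resource="pods", subresource="status"}[5m]))

# Which scrape targets report the gate enabled (assumed metric and label names).
kubernetes_feature_enabled{name="PodReadyToStartContainersCondition"} == 1
```

Note that, as pointed out later in this thread, `apiserver_request_total` carries no originating-node label, so correlating the two series per node would be manual at best.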
There is no originating node label on the API Server request metrics here: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go#L80
ok
This may result in more API calls from the Kubelet Status Manager to the API server (under certain circumstances where the Kubelet Status Manager did not batch a set of updates for some reason). This aspect is covered under the section "No specific SLIs or metrics are surfaced by Kubelet associated with distinct pod status updates", so it can be considered a separate enhancement to broadly address the overall Kubelet Status Manager.
Signed-off-by: Deep Debroy <ddebroy@gmail.com>
@johnbelamaric @cici37 the PRR section has been updated with the suggestions above. Please note that the update (scoped to the last commit) also removed the original LGTM from the sig-node reviews.
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: dchen1107, ddebroy, johnbelamaric. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/unhold