KEP-753: add PRR answers for beta

Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
kubernetes · Oct 1, 2023 · 11c5218
1 parent 8669330
commit 11c5218
Showing 1 changed file with 43 additions and 0 deletions.
diff --git a/keps/sig-node/753-sidecar-containers/README.md b/keps/sig-node/753-sidecar-containers/README.md
@@ -1472,13 +1472,28 @@ rollout. Similarly, consider large clusters and how enablement/disablement
 will rollout across nodes.
 -->
 
+Rollout cannot fail since we do not break compatibility with Pods without sidecars.
+
+Rollback can fail if a Pod with sidecars is (re)scheduled on a node where the feature
+is disabled.
+
+Running workloads are not impacted.
+
 ###### What specific metrics should inform a rollback?
 
 <!--
 What signals should users be paying attention to when the feature is young
 that might indicate a serious problem?
 -->
 
+Pods that don't feature sidecars are not affected by the KEP.
+
+Pods with sidecars might take a long time to exit and exceed the TGPS (`terminationGracePeriodSeconds`).
+A new event should be added in beta to help administrators diagnose this issue.
+Rather than rolling back the feature, they should work on the graceful termination
+of their main containers to ensure sidecars have enough time to be notified
+and exit on their own.
+
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
 <!--
@@ -1487,12 +1502,16 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->
 
+TBD
+
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
 <!--
 Even if applying deprecation policies, they may still surprise some users.
 -->
 
+No.
+
 ### Monitoring Requirements
 
 <!--
@@ -1589,6 +1608,8 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
       - Impact of its degraded performance or high-error rates on the feature:
 -->
 
+No.
+
 ### Scalability
 
 <!--
@@ -1616,6 +1637,8 @@ Focusing mostly on:
     heartbeats, leader election, etc.)
 -->
 
+No.
+
 ###### Will enabling / using this feature result in introducing new API types?
 
 <!--
@@ -1625,6 +1648,8 @@ Describe them, providing:
   - Supported number of objects per namespace (for namespace-scoped objects)
 -->
 
+No.
+
 ###### Will enabling / using this feature result in any new calls to the cloud provider?
 
 <!--
@@ -1633,6 +1658,8 @@ Describe them, providing:
   - Estimated increase:
 -->
 
+No.
+
 ###### Will enabling / using this feature result in increasing size or count of the existing API objects?
 
 <!--
@@ -1642,6 +1669,8 @@ Describe them, providing:
   - Estimated amount of new objects: (e.g., new Object X for every existing Pod)
 -->
 
+No.
+
 ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
 
 <!--
@@ -1653,6 +1682,10 @@ Think about adding additional work or introducing new steps in between
 [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
 -->
 
+Graceful Pod termination might take longer with sidecars since their exit sequence starts after the
+last main container has stopped.
+The impact should be negligible because the TGPS is enforced in all cases.
+
 ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
 
 <!--
@@ -1665,6 +1698,8 @@ This through this both in small and large cases, again with respect to the
 [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
 -->
 
+No.
+
 ###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
 
 <!--
@@ -1677,6 +1712,8 @@ Are there any tests that were run/should be run to understand performance charac
 and validate the declared limits?
 -->
 
+No.
+
 ### Troubleshooting
 
 <!--
@@ -1692,6 +1729,8 @@ details). For now, we leave it here.
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
 
+Nothing changes compared to the current kubelet behavior.
+
 ###### What are other known failure modes?
 
 <!--
@@ -1707,8 +1746,12 @@ For each of them, fill in the following information by copying the below templat
     - Testing: Are there any tests for failure mode? If not, describe why.
 -->
 
+None.
+
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
+None.
+
 ## Implementation History
 
 <!--