Commit
KEP-753: add PRR answers for beta
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
matthyx committed Oct 1, 2023
1 parent 8669330 commit 11c5218
Showing 1 changed file with 43 additions and 0 deletions.
43 changes: 43 additions & 0 deletions keps/sig-node/753-sidecar-containers/README.md
@@ -1472,13 +1472,28 @@ rollout. Similarly, consider large clusters and how enablement/disablement
will rollout across nodes.
-->

Rollout cannot fail since we do not break compatibility with Pods without sidecars.

Rollback can fail if a Pod with sidecars is (re)scheduled on a node where the feature
is disabled.

Running workloads are not impacted.
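
To gauge which Pods a rollback could affect, the check below is a minimal sketch that lists the sidecars declared in a Pod manifest. It assumes the manifest has already been loaded into a plain dictionary and that a sidecar is an init container with `restartPolicy: Always`. The helper names `sidecar_names` and `at_risk_on_rollback` and the sample Pod are hypothetical and not part of the KEP.

```python
from typing import Any, List, Mapping


def sidecar_names(pod: Mapping[str, Any]) -> List[str]:
    """Return the names of init containers declared as sidecars.

    Assumption: a sidecar is an init container whose restartPolicy
    is "Always". The pod argument is a manifest loaded into a dict.
    """
    spec = pod.get("spec") or {}
    init_containers = spec.get("initContainers") or []
    return [
        container.get("name", "")
        for container in init_containers
        if container.get("restartPolicy") == "Always"
    ]


def at_risk_on_rollback(pod: Mapping[str, Any]) -> bool:
    """True if the Pod declares at least one sidecar (hypothetical helper)."""
    return len(sidecar_names(pod)) > 0


# Hypothetical manifest used only for illustration.
EXAMPLE_POD = {
    "metadata": {"name": "app"},
    "spec": {
        "initContainers": [
            {"name": "migrate", "image": "example/migrate"},
            {"name": "log-shipper", "image": "example/logs", "restartPolicy": "Always"},
        ],
        "containers": [{"name": "main", "image": "example/app"}],
    },
}
```

Pods for which `at_risk_on_rollback` returns `False` match the "Pods without sidecars" case and should behave the same whether the feature is enabled or not.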

###### What specific metrics should inform a rollback?

<!--
What signals should users be paying attention to when the feature is young
that might indicate a serious problem?
-->

Pods that don't use sidecars are not affected by this KEP.

Pods with sidecars might take a long time to exit and exceed the Pod's
`terminationGracePeriodSeconds` (TGPS). A new event should be added in beta to
help administrators diagnose this issue.
Rather than rolling back the feature, administrators should improve the graceful
termination of their main containers so that sidecars have enough time to be
notified and exit on their own.
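
The budget reasoning above can be sketched as a small model. It is a simplification that assumes sidecars only start exiting once the last main container has stopped and that the TGPS is a hard cap on the whole termination. The class name, field names and the durations passed in are hypothetical inputs, not values measured or defined by the KEP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TerminationEstimate:
    """Simplified model of graceful Pod termination with sidecars."""

    main_seconds: float          # time until the last main container has stopped
    sidecar_seconds: float       # time the sidecars need to exit afterwards
    grace_period_seconds: float  # the Pod's TGPS

    @property
    def sidecar_budget(self) -> float:
        """Seconds left for sidecars once the main containers have stopped."""
        return max(0.0, self.grace_period_seconds - self.main_seconds)

    @property
    def exceeds_grace_period(self) -> bool:
        """True if sidecars would not finish on their own within the TGPS."""
        return self.main_seconds + self.sidecar_seconds > self.grace_period_seconds

    @property
    def effective_seconds(self) -> float:
        """Termination time, capped because the TGPS is enforced."""
        return min(self.main_seconds + self.sidecar_seconds, self.grace_period_seconds)
```

In this model, shortening `main_seconds` is what widens `sidecar_budget`, which is why improving the graceful termination of the main containers is preferred over rolling back the feature.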

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

<!--
@@ -1487,12 +1502,16 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->

TBD

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

<!--
Even if applying deprecation policies, they may still surprise some users.
-->

No.

### Monitoring Requirements

<!--
@@ -1589,6 +1608,8 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
- Impact of its degraded performance or high-error rates on the feature:
-->

No.

### Scalability

<!--
@@ -1616,6 +1637,8 @@ Focusing mostly on:
heartbeats, leader election, etc.)
-->

No.

###### Will enabling / using this feature result in introducing new API types?

<!--
@@ -1625,6 +1648,8 @@ Describe them, providing:
- Supported number of objects per namespace (for namespace-scoped objects)
-->

No.

###### Will enabling / using this feature result in any new calls to the cloud provider?

<!--
@@ -1633,6 +1658,8 @@ Describe them, providing:
- Estimated increase:
-->

No.

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

<!--
@@ -1642,6 +1669,8 @@ Describe them, providing:
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
-->

No.

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

<!--
@@ -1653,6 +1682,10 @@ Think about adding additional work or introducing new steps in between
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
-->

Graceful Pod termination might take longer with sidecars since their exit sequence starts after the
last main container has stopped.
The impact should be negligible because the TGPS is enforced in all cases.

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

<!--
@@ -1665,6 +1698,8 @@ This through this both in small and large cases, again with respect to the
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
-->

No.

###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

<!--
@@ -1677,6 +1712,8 @@ Are there any tests that were run/should be run to understand performance charac
and validate the declared limits?
-->

No.

### Troubleshooting

<!--
@@ -1692,6 +1729,8 @@ details). For now, we leave it here.

###### How does this feature react if the API server and/or etcd is unavailable?

Nothing changes compared to the current kubelet behavior.

###### What are other known failure modes?

<!--
@@ -1707,8 +1746,12 @@ For each of them, fill in the following information by copying the below templat
- Testing: Are there any tests for failure mode? If not, describe why.
-->

None.

###### What steps should be taken if SLOs are not being met to determine the problem?

None.

## Implementation History

<!--