Promote Parallel Image Pull Limit to Beta
Signed-off-by: ruiwen-zhao <ruiwen@google.com>
ruiwen-zhao committed May 25, 2023
1 parent 8f39472 commit 96cc146
Showing 1 changed file with 40 additions and 10 deletions.
50 changes: 40 additions & 10 deletions keps/sig-node/3673-kubelet-parallel-image-pull-limit/README.md
@@ -385,6 +385,7 @@ A new node_e2e test with `serialize-image-pulls==false` will be added to make s

#### Beta
- Gather feedback from developers and surveys
- Add e2e test to cover the parallel image pull case

#### GA
- Gather feedback from real-world usage from kubernetes vendors.
@@ -585,6 +586,7 @@ https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05
This section must be completed when targeting beta to a release.
-->


###### How can a rollout or rollback fail? Can it impact already running workloads?

<!--
@@ -597,13 +599,19 @@ rollout. Similarly, consider large clusters and how enablement/disablement
will rollout across nodes.
-->

This is an opt-in feature, and it does not change any default behavior. If there is a bug in this feature, image pulls might fail.
No running workloads will be impacted.

###### What specific metrics should inform a rollback?

<!--
What signals should users be paying attention to when the feature is young
that might indicate a serious problem?
-->

In the worst case, image pulls might fail. Users can monitor image pull Kubernetes events and the `runtime_operations_errors_total` metric to see whether
image pull failures are increasing.
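
As a rough illustration of checking this signal, the sketch below sums the `pull_image` error counter from a metrics dump in Prometheus text format. It assumes the kubelet exposes the counter with a `kubelet_` prefix and an `operation_type` label; the function name and the sample values are hypothetical, and the parsing is deliberately minimal.

```python
import re

# One Prometheus text-format sample line: name{labels} value
_SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Hypothetical sample; real values come from the kubelet metrics endpoint.
SAMPLE = """\
# TYPE kubelet_runtime_operations_errors_total counter
kubelet_runtime_operations_errors_total{operation_type="pull_image"} 3
kubelet_runtime_operations_errors_total{operation_type="create_container"} 1
kubelet_runtime_operations_total{operation_type="pull_image"} 40
"""


def pull_image_errors(metrics_text, metric_suffix="runtime_operations_errors_total"):
    """Sum error counters whose operation_type label is pull_image."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_LINE.match(line)
        if not match or not match.group("name").endswith(metric_suffix):
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        if labels.get("operation_type") == "pull_image":
            total += float(match.group("value"))
    return total


print(pull_image_errors(SAMPLE))  # 3.0
```

Because the metric is a counter, comparing two readings taken some time apart shows whether failures are increasing; a single reading only shows the total since the kubelet started.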

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

<!--
@@ -612,12 +620,18 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->

This is an opt-in feature, and it does not change any default behavior. We manually tested enabling and disabling this feature by changing the kubelet config and
restarting the kubelet.


###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

<!--
Even if applying deprecation policies, they may still surprise some users.
-->

No.

### Monitoring Requirements

<!--
@@ -635,6 +649,8 @@ checking if there are objects with field X set) may be a last resort. Avoid
logs or events for this purpose.
-->

See the section below.

###### How can someone using this feature know that it is working for their instance?

<!--
@@ -646,13 +662,10 @@ and operation of this feature.
Recall that end users cannot usually observe component logs or access metrics.
-->

- [ ] Events
- Event Reason:
- [ ] API .status
- Condition name:
- Other field:
- [ ] Other (treat as last resort)
- Details:
- [X] Events
- Event Reason: Pulling

This feature limits the number of parallel image pulls, so an operator can look at image pull Kubernetes events (reason `Pulling`) and see how many images are being pulled at the same time.
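
One way to turn those events into a number is a sweep over pull intervals. The sketch below assumes each pull's start time comes from its `Pulling` event and its end time from a matching `Pulled` event; that pairing, the function name, and the sample timestamps are illustrative and not part of this KEP.

```python
def max_concurrent_pulls(intervals):
    """Return the peak number of overlapping (start, end) pull intervals."""
    points = []
    for start, end in intervals:
        points.append((start, 1))   # a pull begins
        points.append((end, -1))    # a pull finishes
    # At equal timestamps, process finishes (-1) before starts (+1) so that
    # back-to-back pulls are not counted as overlapping.
    points.sort(key=lambda p: (p[0], p[1]))
    current = peak = 0
    for _, delta in points:
        current += delta
        peak = max(peak, current)
    return peak


# Hypothetical pull intervals in seconds since some reference time.
pulls = [(0, 10), (2, 5), (4, 8), (10, 12)]
print(max_concurrent_pulls(pulls))  # 3
```

If the limit is working, the peak observed on a node should not exceed the configured limit.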

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

@@ -677,15 +690,19 @@ question.
Pick one more of these and delete the rest.
-->

- [ ] Metrics
- Metric name:
- [Optional] Aggregation method:
We can rely on the existing image pull metrics to determine whether this feature has any impact on image pulling.

- [X] Metrics
- Metric name: runtime_operations_errors_total
- [Optional] Aggregation method: operation_type=pull_image
- Components exposing the metric:
- [ ] Other (treat as last resort)
- Details:

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

No.

<!--
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
implementation difficulties, etc.).
@@ -699,6 +716,8 @@ This section must be completed when targeting beta to a release.

###### Does this feature depend on any specific services running in the cluster?

No.

<!--
Think about both cluster-level services (e.g. metrics-server) as well
as node-level agents (e.g. specific version of CRI). Focus on external or
@@ -817,8 +836,12 @@ details). For now, we leave it here.

###### How does this feature react if the API server and/or etcd is unavailable?

N/A. This feature does not rely on any component other than kubelet.

###### What are other known failure modes?

No known failure modes.

<!--
For each of them, fill in the following information by copying the below template:
- [Failure mode brief description]
@@ -834,6 +857,9 @@ For each of them, fill in the following information by copying the below templat

###### What steps should be taken if SLOs are not being met to determine the problem?

If this feature impacts image pulling, the user should unset MaxParallelImagePulls (i.e. set MaxParallelImagePulls to nil),
or set SerializeImagePulls to true to enable serial image pulling.
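
The two mitigations can be sketched as a transformation of the kubelet configuration, treated here as a plain dictionary. The keys `maxParallelImagePulls` and `serializeImagePulls` are assumed to be the configuration-file spellings of the fields named above, and the helper name is hypothetical. The limit is removed in both cases on the assumption that a limit greater than 1 should not be combined with serial pulls.

```python
def roll_back_parallel_pull_limit(config, force_serial=False):
    """Return a copy of a kubelet config dict with the pull limit removed.

    With force_serial=True the copy also enables serial image pulling.
    """
    updated = dict(config)  # leave the caller's dict untouched
    updated.pop("maxParallelImagePulls", None)  # unset the limit (nil)
    if force_serial:
        updated["serializeImagePulls"] = True
    return updated


current = {"serializeImagePulls": False, "maxParallelImagePulls": 5}
print(roll_back_parallel_pull_limit(current))
print(roll_back_parallel_pull_limit(current, force_serial=True))
```

The kubelet must be restarted after the configuration file changes for either setting to take effect.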

## Implementation History

<!--
@@ -847,6 +873,10 @@ Major milestones might include:
- when the KEP was retired or superseded
-->

### Alpha

The alpha feature was implemented in 1.27.

## Drawbacks

<!--