KEP-3673: Promote Parallel Image Pull Limit to Beta #4036

Merged: 1 commit, Jun 10, 2023

ruiwen-zhao
Contributor

  • One-line PR description: Promote Parallel Image Pull Limit to Beta
  • Other comments:

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. labels May 25, 2023
@ruiwen-zhao
Contributor Author

cc @pacoxu

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 25, 2023
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 25, 2023
@@ -597,13 +608,19 @@ rollout. Similarly, consider large clusters and how enablement/disablement
will rollout across nodes.
-->

This is an opt-in feature, and it does not change any default behavior. If there is any bug in this feature, image pulls might fail.
No running workloads will be impacted.

Member

If I understand correctly, the user may bypass the limit when the node manager is constantly restarting kubelet.

The cluster/Node manager may restart the kubelet when changing the kubelet configuration to do rollout/rollback.

Not sure if we should mention it.

Contributor Author

@ruiwen-zhao ruiwen-zhao May 26, 2023

Thanks! I added a paragraph here mentioning that restarting kubelet might allow more image pulls than the limit.
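Why a restart can exceed the limit can be illustrated with a minimal sketch. It assumes, hypothetically, that the limit lives only in kubelet process memory as a counting semaphore (a buffered channel sized to `maxParallelImagePulls`), and that pulls already sent to the container runtime keep running after kubelet restarts. `pullLimiter` and `simulateRestart` are illustrative names, not kubelet code.

```go
package main

import "fmt"

// pullLimiter is a hypothetical in-memory limiter: a counting semaphore
// implemented as a buffered channel whose capacity is the pull limit.
type pullLimiter struct {
	tokens chan struct{}
}

func newPullLimiter(maxParallel int) *pullLimiter {
	return &pullLimiter{tokens: make(chan struct{}, maxParallel)}
}

// tryAcquire takes a token without blocking and reports whether it succeeded.
func (l *pullLimiter) tryAcquire() bool {
	select {
	case l.tokens <- struct{}{}:
		return true
	default:
		return false
	}
}

// simulateRestart returns how many pulls the (simulated) runtime is serving
// before and after one kubelet restart. Runtime-side pulls are assumed to
// survive the restart, while the limiter state does not.
func simulateRestart(maxParallel int) (beforeRestart, afterRestart int) {
	inFlight := 0

	// First kubelet process: start as many pulls as the limit allows.
	first := newPullLimiter(maxParallel)
	for first.tryAcquire() {
		inFlight++
	}
	beforeRestart = inFlight

	// Restart: a fresh limiter starts empty and knows nothing about the
	// pulls the old process already handed to the runtime.
	second := newPullLimiter(maxParallel)
	for second.tryAcquire() {
		inFlight++
	}
	afterRestart = inFlight
	return beforeRestart, afterRestart
}

func main() {
	before, after := simulateRestart(3)
	fmt.Println("in flight before restart:", before) // 3
	fmt.Println("in flight after restart:", after)   // 6
}
```

Under these assumptions each restart can add up to another full limit's worth of concurrent pulls at the runtime, which is why frequent kubelet restarts during config rollout or rollback could let the effective parallelism exceed the configured value.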

@pacoxu
Member

pacoxu commented May 26, 2023

Overall LGTM
/approve

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 26, 2023
@ruiwen-zhao
Contributor Author

/assign wojtek-t

for PRR approval

@@ -345,9 +346,15 @@ This can inform certain test coverage improvements that we want to do before
extending the production code to implement this enhancement.
-->

-New unit test will be added to image_manager_test.go.
+New unit test is be added to image_manager_test.go.
Contributor

typo?

Contributor Author

Good catch. I meant to say that the new unit test has already been added. Updated.

Member

@pacoxu pacoxu left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 30, 2023
@pacoxu
Member

pacoxu commented May 30, 2023

/test pull-enhancements-verify
/test pull-enhancements-test

@bart0sh
Contributor

bart0sh commented May 30, 2023

@ruiwen-zhao please fix CI test failures.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 30, 2023
Member

@wojtek-t wojtek-t left a comment

A couple of smaller comments, but overall it LGTM.

@wojtek-t
Member

wojtek-t commented Jun 1, 2023

@ruiwen-zhao - please also remember to have the issue opted-in to the release

@ruiwen-zhao
Contributor Author

@wojtek-t thanks for the review!

> @ruiwen-zhao - please also remember to have the issue opted-in to the release

It doesn't seem that I have the permission to add milestone labels to issues: #3673 (comment). Is there anything I can do to opt in the issue?

@wojtek-t
Member

wojtek-t commented Jun 1, 2023

> It doesn't seem that I have the permission to add milestone labels to issues: #3673 (comment). Is there anything I can do to opt in the issue?

You need to ask a SIG lead to opt in the feature for the milestone.
@SergeyKanzhelev - can you help with it?

Member

@wojtek-t wojtek-t left a comment

@ruiwen-zhao - just two super minor comments, but PRR already looks good now.

/approve PRR

@@ -345,8 +346,14 @@ This can inform certain test coverage improvements that we want to do before
extending the production code to implement this enhancement.
-->

-New unit test will be added to image_manager_test.go.
+- `k8s.io/kubernetes/pkg/kubelet/images/puller.go`: `01/05/2023` - `100.0`
+New unit test is added to image_manager_test.go along with Alpha implementation.
Member

Please mark the box in L315.

@@ -374,7 +381,9 @@ https://storage.googleapis.com/k8s-triage/index.html
We expect no non-infra related flakes in the last month as a GA graduation criteria.
-->

-A new node_e2e test with `serialize-image-pulls==false` will be added to make sure that when maxParallelImagePulls is reached, all further image pulls will be blocked.
+A new node_e2e test with `serialize-image-pulls==false` will be added to test parallel image pull limits.
Member

L368 - please just say that we don't need integration tests; e2e tests will cover all we need.

Member

@SergeyKanzhelev SergeyKanzhelev left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 2, 2023
@pacoxu
Member

pacoxu commented Jun 3, 2023

/assign @dchen1107 @mrunalp

@SergeyKanzhelev
Member

@mrunalp you are listed as an approver on this KEP. Please take a look.

-A new node_e2e test with `serialize-image-pulls==false` will be added to make sure that when maxParallelImagePulls is reached, all further image pulls will be blocked.
+A new node_e2e test with `serialize-image-pulls==false` will be added to test parallel image pull limits.
+1. When maxParallelImagePulls is reached, all further image pulls will be blocked.
+2. Verify the behavior when the same image is pulled in parallel, which will happen when image pull policy is `Always`.
Contributor

Is there logic to make requests for an image wait on a pull of the same image that is already in progress?

Contributor Author

Not sure if I understand the question, but my plan here is to verify the behavior based on the timestamp of image pulling events.

So for the first case, we will check if there are N pulling events started roughly at the same time.

For the second case, we will check if there are two pulling events for the same image.

- Event Reason: Pulling

Assuming `MaxParallelImagePulls` is set to _X_, an operator can look at the container runtime log, and see _X_ PullImageRequests sent to container runtime at the same time.
If the container images are of similar sizes, an operator can look at the Kubernetes events and see _X_ images finish pulling at roughly the same time.
Contributor

Not sure if this will be true if the images are being pulled off different registries or from different servers with different loads.

Contributor Author

@ruiwen-zhao ruiwen-zhao Jun 9, 2023

Yeah, this will only be true when pulling takes roughly the same time for all the images. If different pulls take different amounts of time, an operator will not be able to deduce the parallel behavior.

Updated the KEP to reflect this.

Signed-off-by: ruiwen-zhao <ruiwen@google.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 9, 2023
@mrunalp
Contributor

mrunalp commented Jun 10, 2023

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 10, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrunalp, pacoxu, ruiwen-zhao, SergeyKanzhelev, wojtek-t

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 10, 2023
@k8s-ci-robot k8s-ci-robot merged commit 0295de2 into kubernetes:master Jun 10, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone Jun 10, 2023
Labels: approved, cncf-cla: yes, kind/kep, lgtm, sig/node, size/M
8 participants