KEP-3673: Promote Parallel Image Pull Limit to Beta #4036
ruiwen-zhao commented on May 25, 2023
- One-line PR description: Promote Parallel Image Pull Limit to Beta
- Issue link: Kubelet limit of Parallel Image Pulls #3673
- Other comments:
cc @pacoxu
/lgtm
@@ -597,13 +608,19 @@ rollout. Similarly, consider large clusters and how enablement/disablement
will rollout across nodes.
-->
This is an opt-in feature, and it does not change any default behavior. If there is any bug in this feature, image pulls might fail.
No running workloads will be impacted.
If I understand correctly, the user may bypass the limit when the node manager is constantly restarting kubelet.
The cluster/Node manager may restart the kubelet when changing the kubelet configuration to do rollout/rollback.
Not sure if we should mention it.
Thanks! I added a paragraph here mentioning that restarting kubelet might allow more image pulls than the limit
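A minimal Go sketch of why a restart can let more pulls through than the configured limit, assuming the limit is tracked only by an in-memory counter inside kubelet and that pulls already handed to the container runtime keep running after kubelet restarts. `pullLimiter`, `tryAcquire`, `release`, and `inFlight` are hypothetical names for illustration, not kubelet's actual code.

```go
package main

import "fmt"

// pullLimiter is a hypothetical, simplified stand-in for a parallel image
// pull limit. A buffered channel acts as a counting semaphore; its state
// lives only in process memory and is lost when the process restarts.
type pullLimiter struct {
	slots chan struct{}
}

func newPullLimiter(maxParallel int) *pullLimiter {
	return &pullLimiter{slots: make(chan struct{}, maxParallel)}
}

// tryAcquire reserves a slot without blocking and reports whether it succeeded.
func (l *pullLimiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot once a pull finishes.
func (l *pullLimiter) release() { <-l.slots }

// inFlight reports how many pulls this limiter instance is tracking.
func (l *pullLimiter) inFlight() int { return len(l.slots) }

func main() {
	const maxParallel = 2

	before := newPullLimiter(maxParallel)
	before.tryAcquire()
	before.tryAcquire()
	fmt.Println("before restart, extra pull admitted:", before.tryAcquire()) // false

	// Simulated kubelet restart: a fresh limiter starts empty, while the two
	// pulls admitted earlier may still be running in the container runtime.
	after := newPullLimiter(maxParallel)
	admitted := 0
	for after.tryAcquire() {
		admitted++
	}
	fmt.Println("after restart, newly admitted:", admitted) // 2
	fmt.Println("pulls possibly running in runtime:", before.inFlight()+after.inFlight()) // 4
}
```

Under these assumptions, frequent restarts (for example during a kubelet configuration rollout or rollback) could briefly push the number of concurrent pulls above `maxParallelImagePulls`.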
Overall LGTM
/assign wojtek-t for PRR approval
@@ -345,9 +346,15 @@ This can inform certain test coverage improvements that we want to do before
extending the production code to implement this enhancement.
-->
New unit test will be added to image_manager_test.go.
New unit test is be added to image_manager_test.go.
typo?
Good catch. I meant to say that the new unit test has already been added. Updated.
/lgtm
/test pull-enhancements-verify
@ruiwen-zhao please fix CI test failures.
A couple of small comments, but overall it LGTM.
@ruiwen-zhao - please also remember to have the issue opted in to the release.
@wojtek-t thanks for the review!
It seems I don't have permission to add milestone labels to issues: #3673 (comment). Is there anything I can do to opt in the issue?
You need to ask a SIG lead to opt in the feature for the milestone.
@ruiwen-zhao - just two super minor comments, but PRR already looks good now.
/approve PRR
@@ -345,8 +346,14 @@ This can inform certain test coverage improvements that we want to do before
extending the production code to implement this enhancement.
-->
New unit test will be added to image_manager_test.go.
- `k8s.io/kubernetes/pkg/kubelet/images/puller.go`: `01/05/2023` - `100.0`
New unit test is added to image_manager_test.go along with Alpha implementation.
Please mark the box in L315.
@@ -374,7 +381,9 @@ https://storage.googleapis.com/k8s-triage/index.html
We expect no non-infra related flakes in the last month as a GA graduation criteria.
-->
A new node_e2e test with `serialize-image-pulls==false` will be added to make sure that when maxParallelImagePulls is reached, all further image pulls will be blocked.
A new node_e2e test with `serialize-image-pulls==false` will be added to test parallel image pull limits.
L368 - please just say that we don't need integration tests - e2e tests will cover all we need
/lgtm
/assign @dchen1107 @mrunalp
@mrunalp you are listed as an approver on this KEP. Please take a look.
A new node_e2e test with `serialize-image-pulls==false` will be added to make sure that when maxParallelImagePulls is reached, all further image pulls will be blocked.
A new node_e2e test with `serialize-image-pulls==false` will be added to test parallel image pull limits.
1. When maxParallelImagePulls is reached, all further image pulls will be blocked.
2. Verify the behavior when the same image is pulled in parallel, which will happen when image pull policy is `Always`.
Is there logic to make requests for an image wait on a pull of that image that is already in progress?
Not sure if I understand the question, but my plan here is to verify the behavior based on the timestamps of image pulling events.
For the first case, we will check whether there are N pulling events that started at roughly the same time.
For the second case, we will check whether there are two pulling events for the same image.
- Event Reason: Pulling
Assuming `MaxParallelImagePulls` is set to _X_, an operator can look at the container runtime log and see _X_ PullImageRequests sent to the container runtime at the same time.
If the container images are of similar sizes, an operator can look at the k8s events and see _X_ images finish pulling at roughly the same time.
Not sure if this will be true if the images are being pulled off different registries or from different servers with different loads.
Yeah, this will only be true when pulling takes roughly the same time for all the images. If different pulls take different amounts of time, an operator will not be able to deduce the parallel behavior.
Updated the KEP to reflect this.
Signed-off-by: ruiwen-zhao <ruiwen@google.com>
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: mrunalp, pacoxu, ruiwen-zhao, SergeyKanzhelev, wojtek-t