Add E2E test case to cover duplicatepods strategy #627
Conversation
Welcome @JaneLiuL!
Hi @JaneLiuL. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What are your thoughts on moving this to its own file? I followed this pattern for the TopologySpreadConstraint strategy, as a single file was making it hard to read/follow.
/ok-to-test
@JaneLiuL: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
Thanks for the test!!! I dropped some comments.
test/e2e/e2e_toomanyrestarts_test.go
Outdated
@@ -0,0 +1,185 @@
/*
Copyright 2017 The Kubernetes Authors.
2021
Haha, I will fix it, but I'm actually confused that the other files use 2017. Should I fix all the other files to 2021, or just let it go?
Yeah, the other files are a bit pre-historic. When creating new files we tend to use the current year. Actually, there's no test checking if all files have the right header so there are even some files without it. It's sufficient to just change the year in this file.
v1 "k8s.io/api/core/v1" | ||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
|
||
"k8s.io/apimachinery/pkg/labels" |
Can you group this import with the k8s.io-prefixed ones that are above?
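For illustration, a minimal sketch of the grouping being asked for, using only the imports shown in the diff above:

```go
// All k8s.io imports kept together in one group, as suggested.
import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)
```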
test/e2e/e2e_toomanyrestarts_test.go
Outdated
}
for _, tc := range tests {
t.Log("Creating RC with 4 replicas owning the created pods")
rc := RcByNameContainer("toomanyrestarts", testNamespace.Name, int32(4), map[string]string{"test": "restart-pod"}, nil, "")
RcByNameContainer creates a pod template without the pod.spec set, so the test must fail in any case. Also, I don't think clientSet.CoreV1().Pods(pod.Namespace).Create will create a pod with the pod.Status as provided (maybe I am wrong?). One needs to use the Pods(...).Status(...).Update subresource instead. Can you move the t.Log("Creating pods with different restartCount") part into for _, tc := range tests and have the RestartCount as part of the test struct? Then create the pods with the RestartCount in every test iteration, right before RcByNameContainer is invoked.
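A rough sketch of what that restructuring could look like; the extra field, thresholds, and table entries here are illustrative assumptions, not the PR's final code:

```go
tests := []struct {
	name                    string
	restartCount            int32 // hypothetical field: desired restart count per case
	expectedEvictedPodCount int
}{
	{name: "restarts above threshold", restartCount: 5, expectedEvictedPodCount: 4},
	{name: "restarts below threshold", restartCount: 1, expectedEvictedPodCount: 0},
}
for _, tc := range tests {
	// Create the pods with tc.restartCount in every iteration,
	// right before RcByNameContainer is invoked.
	t.Logf("Creating pods with restartCount %d for case %q", tc.restartCount, tc.name)
	rc := RcByNameContainer("toomanyrestarts", testNamespace.Name, int32(4), map[string]string{"test": "restart-pod"}, nil, "")
	_ = rc // the rest of the test would create the RC and run the strategy
}
```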
Yes, I will fix, much thanks :)
I tried using if _, err := clientSet.CoreV1().Pods(pod.Namespace).UpdateStatus(ctx, pod, metav1.UpdateOptions{}); err != nil { to update the restartCount. It succeeds, but very soon (about 2 seconds) the kubelet rolls the restartCount back to 0.
So I changed this PR to only test duplicatePods.
The kubelet will not let itself be fooled that easily. You might create pods that constantly crash (e.g. running sleep 1s && exit 1).
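As a hedged sketch of that idea (the container name and image are assumptions for illustration only), the pod template could run a command that keeps failing so the kubelet itself drives restartCount up:

```go
crasher := v1.Container{
	Name:  "restart-pod",
	Image: "busybox",
	// Exit with an error shortly after starting, so the kubelet keeps
	// restarting the container and restartCount climbs on its own.
	Command: []string{"/bin/sh", "-c", "sleep 1s && exit 1"},
}
```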
Very good idea~ Before your comment, I just used kubectl exec -it %s -n %s -- /bin/sh -c "kill 1" to kill the pod :(
I will fix it soon~ Thanks a lot~
test/e2e/e2e_toomanyrestarts_test.go
Outdated
false,
)

podsOnNodeBefore, err := podutil.ListPodsOnANode(ctx, clientSet, workerNodes[0], podutil.WithFilter(podEvictor.Evictable().IsEvictable))
workerNodes[0] is too specific. In a different cluster with many worker nodes, the pods may be scheduled to worker nodes other than the first one. Can you update the code to list all relevant pods from all the worker nodes instead?
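A possible sketch of that change, assuming ListPodsOnANode keeps the signature used in the diff above (not the PR's final code):

```go
var podsBefore []*v1.Pod
for _, node := range workerNodes {
	// Collect evictable pods from every worker node, not just workerNodes[0].
	pods, err := podutil.ListPodsOnANode(ctx, clientSet, node, podutil.WithFilter(podEvictor.Evictable().IsEvictable))
	if err != nil {
		t.Fatalf("Error listing pods on node %v: %v", node.Name, err)
	}
	podsBefore = append(podsBefore, pods...)
}
```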
Thanks. I checked the E2E test infra: it only creates a Kind cluster and no other nodes join, which is why I use workerNodes[0]. I also referenced other test code, and it does the same.
If you find a way to write the code independently of the number of worker nodes, all the better. Otherwise, it's not required; just better to have.
/retest
beforeFunc func(deployment *appsv1.Deployment)
expectedEvictedPodCount int
}{
{
Do you plan to add more tests before the PR merges?
I still need time to enhance the toomanyRestart test case :)
Which do you prefer: this PR, or another PR to cover the toomanyRestart test case?
There's still time to do both :) It will take some time before we release the descheduler again.
Each new test in its own PR is preferable
Hi, I finished the toomanyrestart test case and it all succeeds now :)
@ingvagabund I have already addressed all the comments, including the toomanyrestart test strategy.
On my todo list :)
Looks good overall :) Just a couple of nits.
test/e2e/e2e_duplicatepods_test.go
Outdated
}
podsAfter := len(podsOnNodeAfter)
if podsAfter-podsBefore != tc.expectedEvictedPodCount {
t.Errorf("Not unexpected pod been evicted from %v node", workerNodes[0].Name)
t.Errorf("Not unexpected pod been evicted from %v node", workerNodes[0].Name)
-> t.Errorf("Unexpected number of pods have been evicted from %v node, got %, expected %", workerNodes[0].Name, podsAfter-podsBefore, tc.expectedEvictedPodCount)
test/e2e/e2e_duplicatepods_test.go
Outdated
)

waitForTerminatingPodsToDisappear(ctx, t, clientSet, testNamespace.Name)
podsOnNodeAfter, err := podutil.ListPodsOnANode(ctx, clientSet, workerNodes[0], podutil.WithFilter(podEvictor.Evictable().IsEvictable))
This is tricky since the number of pods on the node after running the strategy can vary based on how fast the kubelet(s) are. Using actualEvictedPodCount := podEvictor.TotalEvicted() to get the number of evicted pods is sufficient.
Yes. fix done~
test/e2e/e2e_duplicatepods_test.go
Outdated
return false, err
}
if len(podList.Items) != desireRunningPodNum {
t.Logf("Waiting for %v pods to be created, got %v instead", desireRunningPodNum, len(podList.Items))
s/created/running
func waitPodRestartCount(ctx context.Context, clientSet clientset.Interface, namespace string, t *testing.T) (bool, error) {
timeout := time.After(5 * time.Minute)
tick := time.Tick(5 * time.Second)
for {
Could you use wait.PollImmediate instead of for+select? wait.PollImmediate will give you the same functionality, just a bit cleaner.
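A hedged sketch of how the polling could look with wait.PollImmediate; the simplified helper signature, list options, and restart-count threshold are assumptions based on the discussion. Returning (false, nil) from the condition keeps the poll going until the timeout:

```go
import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	clientset "k8s.io/client-go/kubernetes"
)

// waitPodRestartCount polls until every pod in the namespace has restarted
// at least 4 times, or the 5-minute timeout expires.
func waitPodRestartCount(ctx context.Context, clientSet clientset.Interface, namespace string) error {
	return wait.PollImmediate(5*time.Second, 5*time.Minute, func() (bool, error) {
		podList, err := clientSet.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{})
		if err != nil {
			return false, err
		}
		for _, pod := range podList.Items {
			if len(pod.Status.ContainerStatuses) == 0 || pod.Status.ContainerStatuses[0].RestartCount < 4 {
				// Not done yet; returning (false, nil) makes PollImmediate try again.
				return false, nil
			}
		}
		return true, nil
	})
}
```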
Hi. I agree with you. At first I used wait.PollImmediate, but I soon gave up. That's because I need to list the pods many times, until all pods have a restart count of more than 4. But if I use wait.PollImmediate, I have to return the first time I list the pods. So I use for, because I need to list the pods many times.
test/e2e/e2e_toomanyrestarts_test.go
Outdated
waitForTerminatingPodsToDisappear(ctx, t, clientSet, testNamespace.Name)
actualEvictedPodCount := podEvictor.TotalEvicted()
if actualEvictedPodCount < tc.expectedEvictedPodCount {
Let's use "!=" to report even the case when more pods than expected are evicted.
Hi. During testing, I found that the pod is evicted once its restart count meets the requirement. And when the pod is evicted, the deployment controller figures it out and creates the pod again; the new pod then restarts, meets the too-many-restarts strategy again, and gets evicted again.
So actualEvictedPodCount will be more than expectedEvictedPodCount.
That's why I use < in the test case, and I will not use "!=" :) Is that OK for you?
Sounds reasonable, thanks for the clarification.
test/e2e/e2e_toomanyrestarts_test.go
Outdated
waitForTerminatingPodsToDisappear(ctx, t, clientSet, testNamespace.Name)
actualEvictedPodCount := podEvictor.TotalEvicted()
if actualEvictedPodCount < tc.expectedEvictedPodCount {
t.Errorf("Fail to run test case %v, actual evicted pod number is: %d", tc.name, actualEvictedPodCount)
Can you also mention tc.expectedEvictedPodCount in the error message?
fix done~
/retest
/lgtm
So sorry to at you again. It seems it still needs the /approve label before it can be merged :) @ingvagabund
I asked @damemi for the final approval :) Just in case I missed something
test/e2e/e2e_duplicatepods_test.go
Outdated
},
}
for _, tc := range tests {
t.Logf("Creating deployment %v in %v namespace", deploymentObj.Name, deploymentObj.Namespace)
You probably want to run the tests separately by utilizing:
for _, tc := range tests {
	t.Run(tc.description, func(t *testing.T) {
		// ...
	})
}
Otherwise, you won't know which test case failed unless you read every log line. Same in too many restarts.
Fix done. Would you please kindly review?
test/e2e/e2e_duplicatepods_test.go
Outdated
}
return
}
//defer clientSet.AppsV1().Deployments(deploymentObj.Namespace).Delete(ctx, deploymentObj.Name, metav1.DeleteOptions{})
I would think this should be uncommented to prevent cascading failures if one test fails.
Fix done. Would you please kindly review?
test/e2e/e2e_duplicatepods_test.go
Outdated
t.Errorf("Unexpected number of pods have been evicted, got %v, expected %v", actualEvictedPodCount, tc.expectedEvictedPodCount) | ||
} | ||
|
||
clientSet.AppsV1().Deployments(deploymentObj.Namespace).Delete(ctx, deploymentObj.Name, metav1.DeleteOptions{}) |
better to do this up top and use defer
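A brief sketch of that suggestion, reusing the deployment variable from the diff above (the exact placement of the Create call is an assumption):

```go
if _, err := clientSet.AppsV1().Deployments(deploymentObj.Namespace).Create(ctx, deploymentObj, metav1.CreateOptions{}); err != nil {
	t.Fatalf("Error creating deployment: %v", err)
}
// Register the cleanup right after creation so the deployment is removed
// even if the test fails partway through.
defer clientSet.AppsV1().Deployments(deploymentObj.Namespace).Delete(ctx, deploymentObj.Name, metav1.DeleteOptions{})
```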
Fix done. Would you please kindly review?
@ingvagabund @a7i So sorry to at you again, would you help to review and approve if there are no other comments? :)
LGTM but I'm not a codeowner
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ingvagabund, JaneLiuL
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Add E2E test case to cover duplicatepods strategy
Add the duplicatepods strategy to the E2E test cases :)