
Add E2E test case cover duplicatepods strategy #627

Merged
merged 1 commit into from Oct 1, 2021
Conversation

@JaneLiuL (Member) commented Sep 13, 2021

Add the duplicatepods strategy to the E2E test cases :)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 13, 2021
@k8s-ci-robot (Contributor)

Welcome @JaneLiuL!

It looks like this is your first PR to kubernetes-sigs/descheduler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/descheduler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 13, 2021
@k8s-ci-robot (Contributor)

Hi @JaneLiuL. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 13, 2021
@a7i (Contributor) commented Sep 13, 2021

What are your thoughts on moving this to its own file? e2e_toomanyrestarts_test.go?

I followed this pattern for the TopologySpreadConstraint strategy, as a single file was making it hard to read/follow.

@JaneLiuL (Member Author)

/ok-to-test

@k8s-ci-robot (Contributor)

@JaneLiuL: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ingvagabund (Contributor)

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 14, 2021
@ingvagabund (Contributor) left a comment

Thanks for the test!!! I dropped some comments.

@@ -0,0 +1,185 @@
/*
Copyright 2017 The Kubernetes Authors.
Contributor:

2021

Member Author:

Haha, I will fix it, but I am actually quite confused that the other files use 2017. Should I update all the other files to 2021, or just let it go?

Contributor:

Yeah, the other files are a bit prehistoric. When creating new files we tend to use the current year. Actually, there's no test checking whether all files have the right header, so there are even some files without it. It's sufficient to just change the year in this file.

v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

"k8s.io/apimachinery/pkg/labels"
Contributor:

Can you group this import with the k8s.io prefixed ones that are above?

}
for _, tc := range tests {
t.Log("Creating RC with 4 replicas owning the created pods")
rc := RcByNameContainer("toomanyrestarts", testNamespace.Name, int32(4), map[string]string{"test": "restart-pod"}, nil, "")
Contributor:

RcByNameContainer creates a pod template without the pod.spec set, so the test must fail in any case. Also, I don't think clientSet.CoreV1().Pods(pod.Namespace).Create will create a pod with the pod.Status as provided (maybe I am wrong?). One needs to use the `Pods(...).UpdateStatus(...)` subresource instead.

Can you move the t.Log("Creating pods with different restartCount") part into `for _, tc := range tests` and have the RestartCount as part of the test struct? Then create the pods with that RestartCount in every test iteration, right before RcByNameContainer is invoked.
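For illustration, a rough sketch of that restructuring; the field names and values here are assumptions, not the PR's final code:

// Sketch only: restartCount moves into the test table, and the RC plus its
// pods are created inside each test iteration.
tests := []struct {
	name                    string
	restartCount            int32 // hypothetical per-test restart count
	expectedEvictedPodCount int
}{
	{name: "restarts above the threshold", restartCount: 5, expectedEvictedPodCount: 4},
	{name: "restarts below the threshold", restartCount: 1, expectedEvictedPodCount: 0},
}
for _, tc := range tests {
	t.Logf("Creating RC with 4 replicas owning the created pods (target restartCount=%d)", tc.restartCount)
	rc := RcByNameContainer("toomanyrestarts", testNamespace.Name, int32(4), map[string]string{"test": "restart-pod"}, nil, "")
	_ = rc
	// ... create the RC, wait for its pods, drive their restart count up to
	// tc.restartCount, then run the strategy and compare against
	// tc.expectedEvictedPodCount ...
}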

Member Author:

Yes, I will fix it, many thanks :)

Member Author:

I tried using `if _, err := clientSet.CoreV1().Pods(pod.Namespace).UpdateStatus(ctx, pod, metav1.UpdateOptions{}); err != nil {` to update the restartCount. It succeeded, but very soon, after about 2 seconds, the kubelet rolled the restartCount back to 0.
So I changed the PR to only test duplicatePods.

Contributor:

The kubelet will not let itself be fooled that easily. You might create pods that constantly crash (e.g. running `sleep 1s && exit 1`).
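For illustration, a minimal sketch of such a constantly crashing pod, reusing the v1/metav1 imports already present in the test file; the pod name, image, and label here are assumptions, not the PR's actual manifest:

// Sketch: a container that exits shortly after starting, so the kubelet keeps
// restarting it and the container's restartCount climbs on its own.
crashingPod := &v1.Pod{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "restart-pod", // illustrative name
		Namespace: testNamespace.Name,
		Labels:    map[string]string{"test": "restart-pod"},
	},
	Spec: v1.PodSpec{
		RestartPolicy: v1.RestartPolicyAlways,
		Containers: []v1.Container{{
			Name:    "crasher",
			Image:   "busybox", // any small image with a shell works
			Command: []string{"/bin/sh", "-c", "sleep 1s && exit 1"},
		}},
	},
}
if _, err := clientSet.CoreV1().Pods(testNamespace.Name).Create(ctx, crashingPod, metav1.CreateOptions{}); err != nil {
	t.Fatalf("failed to create crashing pod: %v", err)
}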

Member Author:

Very good idea~ Before your comment, I just used `kubectl exec -it %s -n %s -- /bin/sh -c "kill 1"` to kill the pod :(
I will fix it soon~ Thanks a lot~

false,
)

podsOnNodeBefore, err := podutil.ListPodsOnANode(ctx, clientSet, workerNodes[0], podutil.WithFilter(podEvictor.Evictable().IsEvictable))
Contributor:

workerNodes[0] is too specific. In a different cluster with many worker nodes, the pods may be scheduled to worker nodes other than the first one. Can you update the code to list all relevant pods from all the worker nodes instead?

Member Author:

Thanks. I checked the E2E test infra; it only creates a Kind cluster with no other nodes joined, which is why I use workerNodes[0]. I also referenced the other test code, and it uses the same approach.

Contributor:

If you can find a way to write the code independently of the number of worker nodes, all the better. Otherwise it's not required, just nice to have.
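If that ever becomes worthwhile, a node-count-independent listing could look roughly like this sketch, which simply sums the evictable pods over all worker nodes using the helpers the test already uses:

// Sketch: aggregate evictable pods across every worker node instead of only workerNodes[0].
podsBefore := 0
for _, node := range workerNodes {
	podsOnNode, err := podutil.ListPodsOnANode(ctx, clientSet, node, podutil.WithFilter(podEvictor.Evictable().IsEvictable))
	if err != nil {
		t.Fatalf("failed to list pods on node %v: %v", node.Name, err)
	}
	podsBefore += len(podsOnNode)
}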

@JaneLiuL JaneLiuL changed the title Add E2E test case cover tooManyRestarts strategy Add E2E test case cover duplicatepods strategy Sep 15, 2021
@JaneLiuL (Member Author)

/retest

beforeFunc func(deployment *appsv1.Deployment)
expectedEvictedPodCount int
}{
{
Contributor:

Do you plan to add more tests before the PR merges?

Member Author:

I still need time to enhance the tooManyRestarts test case :)
Which do you prefer: covering the tooManyRestarts test case in this PR, or in a separate one?

Contributor:

There's still time to do both :) It will take a while before we release the descheduler again.

Contributor:

Each new test in its own PR is preferable

Member Author:

Hi, I finished the toomanyrestarts test case and everything passes now :)

@JaneLiuL (Member Author)

@ingvagabund I have addressed all the comments, and this now also includes the tooManyRestarts test strategy.
Would you help to review? :) Many thanks

@ingvagabund (Contributor)

On my todo list :)

@ingvagabund (Contributor) left a comment

Looks good overall :) Just a couple of nits.

}
podsAfter := len(podsOnNodeAfter)
if podsAfter-podsBefore != tc.expectedEvictedPodCount {
t.Errorf("Not unexpected pod been evicted from %v node", workerNodes[0].Name)
Contributor:

t.Errorf("Not unexpected pod been evicted from %v node", workerNodes[0].Name) -> t.Errorf("Unexpected number of pods have been evicted from %v node, got %, expected %", workerNodes[0].Name, podsAfter-podsBefore, tc.expectedEvictedPodCount)

)

waitForTerminatingPodsToDisappear(ctx, t, clientSet, testNamespace.Name)
podsOnNodeAfter, err := podutil.ListPodsOnANode(ctx, clientSet, workerNodes[0], podutil.WithFilter(podEvictor.Evictable().IsEvictable))
Contributor:

This is tricky since the number of pods on the node after running the strategy can vary based on how fast the kubelet(s) are. Using actualEvictedPodCount := podEvictor.TotalEvicted() to get the number of evicted pods is sufficient.

Member Author:

Yes. fix done~

return false, err
}
if len(podList.Items) != desireRunningPodNum {
t.Logf("Waiting for %v pods to be created, got %v instead", desireRunningPodNum, len(podList.Items))
Contributor:

s/created/running

func waitPodRestartCount(ctx context.Context, clientSet clientset.Interface, namespace string, t *testing.T) (bool, error) {
timeout := time.After(5 * time.Minute)
tick := time.Tick(5 * time.Second)
for {
Contributor:

Could you use wait.PollImmediate instead of for+select? wait.PollImmediate will give you the same functionality, just a bit cleaner.
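For reference, a hedged sketch of the wait.PollImmediate shape being suggested here; the interval, timeout, label selector, and restart threshold are illustrative assumptions, not the PR's final values:

// Sketch: the condition function is called immediately and then every interval
// until it returns true or the timeout expires, so listing the pods on each
// call and returning (false, nil) simply keeps the poll going.
err := wait.PollImmediate(5*time.Second, 5*time.Minute, func() (bool, error) {
	podList, err := clientSet.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: labels.SelectorFromSet(map[string]string{"test": "restart-pod"}).String(),
	})
	if err != nil {
		return false, err
	}
	for _, pod := range podList.Items {
		if len(pod.Status.ContainerStatuses) == 0 || pod.Status.ContainerStatuses[0].RestartCount < 4 {
			return false, nil // not there yet; poll again
		}
	}
	return true, nil
})
if err != nil {
	t.Fatalf("pods never reached the desired restart count: %v", err)
}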

Member Author:

Hi. I agree with you. At first I used wait.PollImmediate, but I soon gave up.
That's because I need to list the pods many times, until every pod's restart count is more than 4.
But if I use wait.PollImmediate, I have to return after the first time I list the pods.
So I use a for loop instead, since I need to list the pods many times.


waitForTerminatingPodsToDisappear(ctx, t, clientSet, testNamespace.Name)
actualEvictedPodCount := podEvictor.TotalEvicted()
if actualEvictedPodCount < tc.expectedEvictedPodCount {
Contributor:

Let's use "!=" to report even the case when more pods than expected are evicted.

Member Author:

Hi. During testing, I found that a pod is evicted once its restart count meets the requirement. When the pod is evicted, the deployment controller notices and creates the pod again; the new pod then restarts, meets the tooManyRestarts strategy again, and is evicted again.
So the actualEvictedPodCount can be more than the expectedEvictedPodCount.
That's why I use < in the test case rather than "!=" :) Is that OK for you?

Contributor:

Sounds reasonable, thanks for the clarification.

waitForTerminatingPodsToDisappear(ctx, t, clientSet, testNamespace.Name)
actualEvictedPodCount := podEvictor.TotalEvicted()
if actualEvictedPodCount < tc.expectedEvictedPodCount {
t.Errorf("Fail to run test case %v, actual evicted pod number is: %d", tc.name, actualEvictedPodCount)
Contributor:

Can you also mention tc.expectedEvictedPodCount in the error message?

Member Author:

fix done~

@JaneLiuL (Member Author)

/retest

@ingvagabund (Contributor)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 22, 2021
@JaneLiuL (Member Author)

Sorry to ping you again. It seems the PR still needs the /approve label before it can be merged :) @ingvagabund

@ingvagabund (Contributor)

I asked @damemi for the final approval :) Just in case I missed something

},
}
for _, tc := range tests {
t.Logf("Creating deployment %v in %v namespace", deploymentObj.Name, deploymentObj.Namespace)
@a7i (Contributor) commented Sep 25, 2021

You probably want to run the test cases separately by utilizing:

	for _, tc := range tests {
		t.Run(tc.description, func(t *testing.T) {
			// ...
		})
	}

Otherwise, you won't know which test case failed unless you read every log line. Same in too many restarts.

Member Author:

Fix done. Would you please kindly review?

}
return
}
//defer clientSet.AppsV1().Deployments(deploymentObj.Namespace).Delete(ctx, deploymentObj.Name, metav1.DeleteOptions{})
Contributor:

I would think this should be uncommented to prevent cascading failures if one test fails.

Member Author:

Fix done. Would you please kindly review?

t.Errorf("Unexpected number of pods have been evicted, got %v, expected %v", actualEvictedPodCount, tc.expectedEvictedPodCount)
}

clientSet.AppsV1().Deployments(deploymentObj.Namespace).Delete(ctx, deploymentObj.Name, metav1.DeleteOptions{})
Contributor:

Better to do this up top and use defer.
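A small sketch of that pattern, assuming each test case runs inside its own t.Run subtest so the deferred delete fires at the end of every case:

// Sketch: register the cleanup immediately after creating the deployment so it
// runs even when the test fails partway through.
if _, err := clientSet.AppsV1().Deployments(deploymentObj.Namespace).Create(ctx, deploymentObj, metav1.CreateOptions{}); err != nil {
	t.Fatalf("failed to create deployment %v: %v", deploymentObj.Name, err)
}
defer clientSet.AppsV1().Deployments(deploymentObj.Namespace).Delete(ctx, deploymentObj.Name, metav1.DeleteOptions{})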

Member Author:

Fix done. Would you please kindly review?

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 26, 2021
@JaneLiuL (Member Author)

@ingvagabund @a7i Sorry to ping you again, would you help to review and approve if there are no other comments? :)

@a7i (Contributor) commented Sep 28, 2021

> @ingvagabund @a7i Sorry to ping you again, would you help to review and approve if there are no other comments? :)

LGTM but I'm not a codeowner

@seanmalloy (Member)

/lgtm
/assign @damemi

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 1, 2021
@ingvagabund (Contributor)

/approve

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ingvagabund, JaneLiuL

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 1, 2021
@k8s-ci-robot k8s-ci-robot merged commit 5b55794 into kubernetes-sigs:master Oct 1, 2021
briend pushed a commit to briend/descheduler that referenced this pull request Feb 11, 2022
Add E2E test case cover duplicatepods strategy
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
lgtm "Looks good to me", indicates that a PR is ready to be merged.
ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
6 participants