-
Notifications
You must be signed in to change notification settings - Fork 39.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix to handle hash collisions correctly for DaemonSets #66476
Fix to handle hash collisions correctly for DaemonSets #66476
Conversation
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
CLA should be fixed now |
/ok-to-test |
done, err := Match(ds, existedHistory) | ||
if err != nil { | ||
return nil, err | ||
done, matchErr := Match(ds, existedHistory) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this is bound to happen again. Can you do something like
if outerErr := err; errors.IsAlreadExists(outerErr) {
...
return outerErr
}
t.Fatalf("Failed to update DaemonSet: %v", err) | ||
} | ||
|
||
validateDaemonSetPodsAndMarkReady(podClient, podInformer, 1, t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should you validate that the collision counter is bumped and that the pod has TerminationGracePeriodSeconds set to one?
Updated to address comments from @mikedanese. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching the bug!
} | ||
|
||
validateDaemonSetPodsAndMarkReady(podClient, podInformer, 1, t) | ||
validateDaemonSetStatus(dsClient, ds.Name, 1, t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validateDaemonSetPodsAndMarkReady
checks DS has X number of pods and mark them Ready. It doesn't check if the Pods are updated or not. We can skip this step because we don't care if the pods are ready or not.
validateDaemonSetStatus
checks whether the status matches current # of Pods or not, but it doesn't check if the Pods are updated or not, either. We can also skip this one.
There's a possible race that the Pods haven't been updated before we validate collision count and Pods. Suggest instead wait for new pods and new controllerrevision to be created, and then check DS's collision count.
one := int64(1) | ||
ds.Spec.Template.Spec.TerminationGracePeriodSeconds = &one | ||
|
||
newHash, newName := hashAndNameForDaemonSet(ds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest add comments for each step (or some steps).
@janetkuo Thanks for the detailed review. |
|
||
// Look up the ControllerRevision for the DaemonSet | ||
_, name := hashAndNameForDaemonSet(ds) | ||
revision, err := clientset.AppsV1().ControllerRevisions(ds.Namespace).Get(name, metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible that the CR haven't been created yet after the DS is created, so wait for CR to be created to avoid possible test flake.
@@ -372,6 +374,33 @@ func waitForPodsCreated(podInformer cache.SharedIndexInformer, num int) error { | |||
}) | |||
} | |||
|
|||
func waitForDaemonSetCreated(c clientset.Interface, name string, namespace string) error { | |||
return wait.Poll(2*time.Second, 30*time.Second, func() (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use PollImmediate()
instead of Poll()
. Use shorter intervals (e.g. 100ms) and shorter timeouts (10s).
I'd like to change all Poll()
to PollImmediate()
and make interval and timeout constants, e.g.
const (
interval = 100 * time.Millisecond
shortTimeout = 10 * time.Second
longTimeout = 60 * time.Second
)
but that should be a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the test to use PollImmediate and shorter intervals and timeouts. I can do a separate PR to pull out the constants and switch the other tests in the file to use PollImmediate.
} | ||
|
||
// Wait for the pod with the latest Spec to become available | ||
err = wait.Poll(10*time.Second, 60*time.Second, func() (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
t.Fatalf("Failed to update DaemonSet: %v", err) | ||
} | ||
|
||
// Wait for the pod with the latest Spec to become available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Wait for any pod ... to exist"
objects := podInformer.GetIndexer().List() | ||
for _, object := range objects { | ||
pod := object.(*v1.Pod) | ||
if *pod.Spec.TerminationGracePeriodSeconds == int64(1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nil: == *ds.Spec.Template.Spec.TerminationGracePeriodSeconds
return false, nil | ||
}) | ||
if err != nil { | ||
t.Fatal(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t.Fatalf("Failed to wait for Pods with the latest Spec to be created: %v", err)
if err != nil { | ||
t.Fatalf("Failed to create ControllerRevision: %v", err) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: save ds
's collision count here for comparison later. The count should be count+1 at the end of this test.
737cfdc
to
0e28b6c
Compare
@janetkuo I have updated the PR based on the comments. |
/retest |
57ff30a
to
ab23bc4
Compare
var orgCollisionCount int32 | ||
if ds.Status.CollisionCount != nil { | ||
orgCollisionCount = *ds.Status.CollisionCount | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this down to L846 after DS's CR is created and a fresh DS is got back from the server. This is because collision count might change after CR creation.
ab23bc4
to
a93ea43
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: janetkuo, mortent The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue (batch tested with PRs 67090, 67159, 66866, 62111, 66476). If you want to cherry-pick this change to another branch, please follow the instructions here. |
What this PR does / why we need it: This adds an integration test for the case where there is a hash collision when creating a ControllerRevision for a DaemonSet. It also fixes a shadowed variable that prevented this functionality from working as intended.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #62519
Special notes for your reviewer:
/sig apps
Release note: