feat: check only controller ref to decide if a pod is replicated #5507
Conversation
// Using names like FooController is discouraged
// https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#naming-conventions
// vadasambar: I am using it here just because `FooController` is easier to understand than say `FooSet`
OwnerReferences: GenerateOwnerReferences("Foo", "FooController", "apps/v1", ""),
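For context, a minimal self-contained sketch of what a helper like `GenerateOwnerReferences` presumably builds (the local `OwnerReference` type mirrors the relevant `metav1.OwnerReference` fields; the helper name and shape are assumptions for illustration, not the actual test utility):

```go
package main

import "fmt"

// OwnerReference mirrors the metav1.OwnerReference fields that matter
// for the replication check (local copy so this sketch is self-contained).
type OwnerReference struct {
	Name       string
	Kind       string
	APIVersion string
	UID        string
	Controller *bool
}

// generateOwnerReferences is a hypothetical stand-in for the test helper:
// it builds a single owner reference with Controller set to true, which is
// what the new replication check keys on.
func generateOwnerReferences(name, kind, api, uid string) []OwnerReference {
	isController := true
	return []OwnerReference{{
		Name:       name,
		Kind:       kind,
		APIVersion: api,
		UID:        uid,
		Controller: &isController,
	}}
}

func main() {
	refs := generateOwnerReferences("Foo", "FooController", "apps/v1", "")
	fmt.Println(refs[0].Kind, *refs[0].Controller)
}
```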
Any other suggestions in place of FooController are welcome.
/assign
I would preserve the existing behavior for known controllers - perhaps behind a flag - so that there's no regression. We can eventually delete the flag after a few releases, but I'd like to avoid breaking existing users.
Makes sense to me (don't want to surprise users with a new default behavior). Let me think about this.
I had a few hiccups trying to get a custom CA running properly on GKE (all good now). I want to test this PR manually on a GKE cluster. The rough idea is something like this:
Rather than on logs, I'd just rely on behavior. In this case you could do something like this:
Note: N=2 might be problematic due to system workloads, but N=3 should work. Alternatively, you can use a separate nodepool to put all non-DS system workloads there using taints/tolerations, so that they don't interfere with the test.
cluster-autoscaler/main.go
maxFreeDifferenceRatio = flag.Float64("max-free-difference-ratio", config.DefaultMaxFreeDifferenceRatio, "Maximum difference in free resources between two similar node groups to be considered for balancing. Value is a ratio of the smaller node group's free resource.")
maxAllocatableDifferenceRatio = flag.Float64("max-allocatable-difference-ratio", config.DefaultMaxAllocatableDifferenceRatio, "Maximum difference in allocatable resources between two similar node groups to be considered for balancing. Value is a ratio of the smaller node group's allocatable resource.")
forceDaemonSets = flag.Bool("force-ds", false, "Blocks scale-up of node groups too small for all suitable Daemon Sets pods.")
allowScaleDownOnCustomControllerOwnedPods = flag.Bool("allow-scale-down-on-custom-controller-owned-pods", false, "Don't block node scale-down if a pod owned by a custom controller is running on the node.")
Better naming suggestions are welcome (the flag name feels a little too long).
The default is set to false right now to provide backwards compatibility. This flag would be set to true in the future (we might remove it completely and use true as the default behavior).
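To make the compatibility gate concrete, here is a minimal sketch of the pattern being discussed: the new controller-ref-only check runs only when the flag opts out of the legacy behavior, otherwise the existing per-kind check is used. The names and wiring are illustrative assumptions, not the actual cluster-autoscaler code (the flag shown uses the `skip-nodes-with-custom-controller-pods` name settled on later in this thread, defaulting to the legacy behavior):

```go
package main

import (
	"flag"
	"fmt"
)

// Hypothetical flag: true (the default) keeps the legacy per-controller-kind
// check; false enables the new controller-ref-only behavior.
var skipNodesWithCustomControllerPods = flag.Bool(
	"skip-nodes-with-custom-controller-pods", true,
	"If true cluster autoscaler will never delete nodes with pods owned by custom controllers")

// isReplicated gates between the two behaviors. legacyCheck stands in for
// the existing replicated-pod logic.
func isReplicated(hasControllerRef bool, legacyCheck func() bool) bool {
	if !*skipNodesWithCustomControllerPods {
		// New behavior: any pod with a controller owner ref counts as replicated.
		return hasControllerRef
	}
	// Legacy behavior: fall back to the existing per-kind checks.
	return legacyCheck()
}

func main() {
	flag.Parse()
	fmt.Println(isReplicated(true, func() bool { return false }))
}
```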
@vadasambar Just an idea: maybe you could follow the skip-nodes-with-* scheme, for instance, skip-nodes-with-custom-controller-pods?
@gregth thank you for the suggestion. I thought the new flag name skip-nodes-with-custom-controller-pods wouldn't tell the user whether the skipping happens during scale-up or scale-down, but it looks like we already use the skip-nodes-with-* naming for flags that skip things when scaling down nodes:
autoscaler/cluster-autoscaler/main.go
Lines 214 to 215 in 1acb6b2
skipNodesWithSystemPods = flag.Bool("skip-nodes-with-system-pods", true, "If true cluster autoscaler will never delete nodes with pods from kube-system (except for DaemonSet or mirror pods)")
skipNodesWithLocalStorage = flag.Bool("skip-nodes-with-local-storage", true, "If true cluster autoscaler will never delete nodes with pods with local storage, e.g. EmptyDir or HostPath")
Making the naming consistent makes sense to me. I will change it to skip-nodes-with-custom-controller-pods. Thank you.
P.S.: If you have any other suggestions, I would love to know.
} else {
	replicated = true
}
} else {
checkReferences := listers != nil
The code in the else part is just a copy-paste of our current code. I have moved a couple of variable declarations into the else part so that it's easier to remove in the future.
📝 P.S.: Fixed
cluster-autoscaler/main.go
maxNodeGroupBinpackingDuration = flag.Duration("max-nodegroup-binpacking-duration", 10*time.Second, "Maximum time that will be spent in binpacking simulation for each NodeGroup.")
skipNodesWithSystemPods = flag.Bool("skip-nodes-with-system-pods", true, "If true cluster autoscaler will never delete nodes with pods from kube-system (except for DaemonSet or mirror pods)")
skipNodesWithLocalStorage = flag.Bool("skip-nodes-with-local-storage", true, "If true cluster autoscaler will never delete nodes with pods with local storage, e.g. EmptyDir or HostPath")
scaleDownNodesWithCustomControllerPods = flag.Bool("scale-down-nodes-with-custom-controller-pods", false, "If true cluster autoscaler will delete nodes with pods owned by custom controllers")
This is the new flag. Adding it re-aligned the surrounding flag definitions, which is why every flag definition shows an extra-whitespace diff.
cluster-autoscaler/main.go
NodeGroupBackoffResetTimeout: *nodeGroupBackoffResetTimeout,
MaxScaleDownParallelism: *maxScaleDownParallelismFlag,
MaxDrainParallelism: *maxDrainParallelismFlag,
GceExpanderEphemeralStorageSupport: *gceExpanderEphemeralStorageSupport,
Same here. Extra space was added because I added ScaleDownNodesWithCustomControllerPods on line 332 below.
@@ -231,6 +158,104 @@ func GetPodsForDeletionOnNodeDrain(
	return pods, daemonSetPods, nil, nil
}

func legacyCheckForReplicatedPods(listers kube_util.ListerRegistry, pod *apiv1.Pod, minReplica int32) (replicated bool, isDaemonSetPod bool, blockingPod *BlockingPod, err error) {
The name legacyCheckForReplicatedPods doesn't account for the checks for daemon set pods or blocking pods, but legacyCheckForReplicatedAndDaemonSetAndBlockingPods feels too long, so I have kept it as is for now.
Note that the return type replicated bool, isDaemonSetPod bool, blockingPod *BlockingPod, err error uses named return values. I did this because I think it improves readability when you hover over the function call in an IDE. This might not be the best approach, since it can confuse someone looking at lines 162 and 165 below (the variables are already defined because they are named return values). Open to other ideas here.
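A small self-contained illustration of the trade-off (the function and its cases are a cut-down assumption, not the real drain logic): named results document each value at the call site, but they also pre-declare zero-valued variables inside the body, which is the source of the confusion mentioned above.

```go
package main

import "fmt"

// classifyPod uses named return values in the same style as
// legacyCheckForReplicatedPods. The names show up in IDE hovers,
// but note that replicated and isDaemonSetPod already exist
// (zero-valued) from the first line of the body, so no := is needed.
func classifyPod(ownerKind string) (replicated bool, isDaemonSetPod bool) {
	switch ownerKind {
	case "DaemonSet":
		isDaemonSetPod = true
	case "ReplicaSet", "ReplicationController", "StatefulSet", "Job":
		replicated = true
	}
	// A bare return returns the current values of the named results.
	return
}

func main() {
	r, d := classifyPod("ReplicaSet")
	fmt.Println(r, d)
}
```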
checkReferences := listers != nil
isDaemonSetPod = false

controllerRef := ControllerRef(pod)
This is the same code as our current code here, abstracted into a function. I have made some minor changes:
- Changed the return values to include replicated and isDaemonSetPod, and removed []*apiv1.Pod since we don't need to return it from this function; returning the pod list is done in the else part.
- Stopped appending to the daemonSetPods slice here. Instead, just return isDaemonSetPod and let the calling function do the appending, because I thought it was cleaner and we need the appending even if we remove this function in the future.
for _, test := range tests {
// run all tests for scaleDownNodesWithCustomControllerPods=false
// TODO(vadasambar): remove this when we get rid of scaleDownNodesWithCustomControllerPods
for _, test := range append(tests, testOpts{
I am doing the append in the for loop declaration itself. This might not be the best approach, but I didn't want to create two variables called testsWithScaleDownNodesWithCustomControllerPodsDisabled and testsWithScaleDownNodesWithCustomControllerPodsEnabled. Any better suggestions are welcome.
What is wrong with being explicit? I'd do something along the lines of:
- Make the custom controller pods flag a part of the test case struct.
- Define a list of shared test cases.
- Define a list of flag-disabled test cases (copy the shared ones and add the extra one).
- Define a list of flag-enabled test cases (copy the shared ones, flip the flag on all, and add the extra one).
- Define the body of the test once and run it for a combined list (flag-enabled + flag-disabled).
WDYT?
Alternatively, just copy all test cases and have a single list in the first place :)
Thank you for the suggestion. I like the idea of putting the flag on the test struct and combining all tests into one. Let me try it out.
}

// run all tests for scaleDownNodesWithCustomControllerPods=true
for _, test := range append(tests, testOpts{
Note that I am running the same tests twice. The code here is practically the same as the code beyond the line above; the only difference is that scaleDownNodesWithCustomControllerPods is set to false here and to true in the test cases beyond the line above.
I did think of abstracting the duplicate code into a separate function, but I am not so sure after thinking about the comment here, since abstracting the tests into a function would add another level of indirection. I am a little on the fence about doing that now. Any suggestions here are welcome.
SkipNodesWithSystemPods: true,
SkipNodesWithLocalStorage: true,
MinReplicaCount: 0,
SkipNodesWithCustomControllerPods: true,
I have set SkipNodesWithCustomControllerPods: true to preserve the current behavior of these tests.
if err != nil {
	return []*apiv1.Pod{}, []*apiv1.Pod{}, blockingPod, err
}
} else {
if controllerRef != nil {
Looks like controllerRef is not used anywhere else, so maybe just directly call ControllerRef() here?
Updated. Thank you for pointing this out.
if len(pods) != len(test.expectPods) {
	t.Fatalf("Wrong pod list content: %v", test.description)
}
for i := range tests {
My suggestion was to duplicate the test cases and flip the flag on the copy to ensure the new flag doesn't affect the vast majority of test cases.
I think there's another problem with the code: the test cases for flag=false are missing.
> My suggestion was to duplicate the test cases and flip the flag on the copy to ensure the new flag doesn't affect the vast majority of test cases.
I see. Just to confirm my understanding: you mean something like this, right?
tests := []testOpts{
	// flag: true test cases
	{
		// shared test case 1
		flag: true,
	},
	{
		// shared test case 2
		flag: true,
	},
	{
		// additional test case 1
		flag: true,
	},
	// flag: false test cases
	{
		// shared test case 1
		flag: false,
	},
	{
		// shared test case 2
		flag: false,
	},
	{
		// additional test case 1
		flag: false,
	},
}
for _, test := range tests {
	// execute test case
}
shared test case 1, shared test case 2 and additional test case 1 are duplicated with some minor tweaks (if required) and flipped flags.
If my understanding is correct, one problem I see is that we would have to change the names of the duplicated test cases slightly to differentiate them from the original test cases, so that it's easy to identify which test case failed.
Which I think should be fine, since it would be a one-time thing for now.
But if we have a similar flag in the future, say --skip-nodes-with-ignore-local-vol-storage-annotation, we would need 4 sets of test cases ([current-flag=true, future-flag=true], [true, false], [false, false], [false, true]) which are almost identical to each other with slight differences. This will get difficult to maintain as we go forward. One argument here is that we can deal with it once we get to it.
I think we might need a more future-proof solution here but it would also increase the code complexity.
Maybe I can duplicate the test cases for now, and there can be another issue to work through a better solution for the test case duplication/maintainability problem.
> we would have to change name of the duplicated test cases
This can be solved by adding the flag value to the error log output here, so that the error output makes clear which variant failed. Something like this:
drain_test.go:955: Custom-controller-managed non-blocking pod: unexpected non-error, skipNodesWithCustomControllerPods: false
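A tiny runnable sketch of that idea (the `testOpts` fields and message format here are illustrative assumptions matching the log line above, not the exact test code):

```go
package main

import "fmt"

// testOpts is a cut-down stand-in for the drain test's options struct.
type testOpts struct {
	description                       string
	skipNodesWithCustomControllerPods bool
}

// failureMessage appends the flag value to the test description, so the
// duplicated test cases stay distinguishable without renaming them.
func failureMessage(test testOpts) string {
	return fmt.Sprintf("%s: unexpected non-error, skipNodesWithCustomControllerPods: %v",
		test.description, test.skipNodesWithCustomControllerPods)
}

func main() {
	fmt.Println(failureMessage(testOpts{"Custom-controller-managed non-blocking pod", false}))
}
```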
Not sure if this is what you meant. This could be a possible solution:
sharedTests := []testOpts{
	{
		// shared test case 1
	},
	{
		// shared test case 2
	},
}
allTests := []testOpts{}
for _, sharedTest := range sharedTests {
	sharedTest.skipNodesWithCustomControllerPods = true
	allTests = append(allTests, sharedTest)
	sharedTest.skipNodesWithCustomControllerPods = false
	allTests = append(allTests, sharedTest)
}
allTests = append(allTests, testOpts{
	// additional test case 1
	skipNodesWithCustomControllerPods: true,
})
allTests = append(allTests, testOpts{
	// additional test case 1
	skipNodesWithCustomControllerPods: false,
})
// if a similar flag is added in the future:
for _, sharedTest := range sharedTests {
	sharedTest.newFlagInFuture = true
	allTests = append(allTests, sharedTest)
	sharedTest.newFlagInFuture = false
	allTests = append(allTests, sharedTest)
}
allTests = append(allTests, testOpts{
	// additional test case 2 for new flag in future
	newFlagInFuture: false,
})
allTests = append(allTests, testOpts{
	// additional test case 2 for new flag in future
	newFlagInFuture: true,
})
for _, test := range allTests {
	// execute test case
}
I have updated this PR with a reference implementation based on the above idea. Happy to change it based on your review comments.
LGTM, thanks!
for _, sharedTest := range sharedTests {
	// to execute the same shared tests for when the skipNodesWithCustomControllerPods flag is true
	// and when the flag is false
	sharedTest.skipNodesWithCustomControllerPods = true
nit: You could update sharedTest.description to append the flag value here, so it only affects the shared tests.
Updated. 👍
Thanks for the changes! The code looks good to me now, can you just squash the commits before merging?
// make sure you shallow copy the test like this
// before you modify it
// (so that modifying one test doesn't affect another)
enabledTest := sharedTest
Note that I am creating a shallow copy of sharedTest so that modifying it once doesn't affect the second modification. e.g.,
for _, sharedTest := range sharedTests {
	sharedTest.skipNodesWithCustomControllerPods = true
	sharedTest.description = fmt.Sprintf("%s with skipNodesWithCustomControllerPods: %v",
		sharedTest.description, sharedTest.skipNodesWithCustomControllerPods)
	allTests = append(allTests, sharedTest)
	sharedTest.skipNodesWithCustomControllerPods = false
	sharedTest.description = fmt.Sprintf("%s with skipNodesWithCustomControllerPods: %v",
		sharedTest.description, sharedTest.skipNodesWithCustomControllerPods)
	allTests = append(allTests, sharedTest)
}
fmt.Println("allTests[0]", allTests[0])
fmt.Println("allTests[1]", allTests[1])
Prints
allTests[0] {RC-managed pod with skipNodesWithCustomControllerPods:true [0xc00018ad80] [] [0xc000498c60] [] false [0xc00018ad80] [] <nil> true}
allTests[1] {RC-managed pod with skipNodesWithCustomControllerPods:true with skipNodesWithCustomControllerPods:false [0xc00018ad80] [] [0xc000498c60] [] false [0xc00018ad80] [] <nil> false}
Note the description and the last bool (which is our flag). To overcome this, I use a shallow copy.
Output when we use a shallow copy:
allTests[0] {RC-managed pod with skipNodesWithCustomControllerPods:true [0xc000202d80] [] [0xc0004b54a0] [] false [0xc000202d80] [] <nil> true}
allTests[1] {RC-managed pod with skipNodesWithCustomControllerPods:false [0xc000202d80] [] [0xc0004b54a0] [] false [0xc000202d80] [] <nil> false}
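The corrected pattern can be sketched as a runnable, self-contained program (the `testOpts` struct is cut down to the two fields that matter here; the helper name `expand` is an illustrative assumption):

```go
package main

import "fmt"

// testOpts is a minimal stand-in for the drain test's options struct.
type testOpts struct {
	description                       string
	skipNodesWithCustomControllerPods bool
}

// expand duplicates each shared test case once per flag value. Assigning
// the range variable to a fresh local (t := sharedTest) takes a shallow
// copy of the struct, so the description mutation from the flag=true copy
// cannot leak into the flag=false copy.
func expand(shared []testOpts) []testOpts {
	var all []testOpts
	for _, sharedTest := range shared {
		for _, flagValue := range []bool{true, false} {
			t := sharedTest // shallow copy before modifying
			t.skipNodesWithCustomControllerPods = flagValue
			t.description = fmt.Sprintf("%s with skipNodesWithCustomControllerPods: %v",
				t.description, flagValue)
			all = append(all, t)
		}
	}
	return all
}

func main() {
	for _, t := range expand([]testOpts{{description: "RC-managed pod"}}) {
		fmt.Println(t.description)
	}
}
```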
Commit messages (squashed; all signed off by vadasambar <surajrbanakar@gmail.com>):
- (cherry picked from commit 144a64a)
- fix: set `replicated` to true if controller ref is set to `true` - forgot to add this in the last commit (cherry picked from commit f8f4582)
- fix: remove `checkReferences` - not needed anymore (cherry picked from commit 5df6e31)
- test(drain): add test for custom controller pod
- feat: add flag to allow scale down on custom controller pods - set to `false` by default; `false` will be flipped to `true` by default in the future; right now we want to ensure backwards compatibility and make the feature available if the flag is explicitly set to `true`; TODO: this code might need some unit tests
- fix: remove `at` symbol in prefix of `vadasambar` - to keep it consistent with previous such mentions in the code
- test(utils): run all drain tests twice - once for `allowScaleDownOnCustomControllerOwnedPods=false` and once for `allowScaleDownOnCustomControllerOwnedPods=true`
- docs(utils): add description for `testOpts` struct
- docs: update FAQ with info about `allow-scale-down-on-custom-controller-owned-pods` flag
- refactor: rename `allow-scale-down-on-custom-controller-owned-pods` -> `skip-nodes-with-custom-controller-pods`
- refactor: rename `allowScaleDownOnCustomControllerOwnedPods` -> `skipNodesWithCustomControllerPods`
- test(utils/drain): fix failing tests - refactor code to add custom controller pod test
- refactor: fix long code comments - clean up print statements
- refactor: move `expectFatal` right above where it is used - makes the code easier to read
- refactor: fix code comment wording
- refactor: address PR comments - abstract legacy replicated-pod check into a separate function so that it's easier to remove in the future; fix param info in the FAQ.md; simplify tests and remove the global variable used in the tests; rename `--skip-nodes-with-custom-controller-pods` -> `--scale-down-nodes-with-custom-controller-pods`
- refactor: rename flag `--scale-down-nodes-with-custom-controller-pods` -> `--skip-nodes-with-custom-controller-pods` - refactor tests
- docs: update flag info
- fix: forgot to change flag name on a line in the code
- refactor: use `ControllerRef()` directly instead of `controllerRef` - we don't need an extra variable
- refactor: create consolidated test cases from looping over and tweaking shared test cases - so that we don't have to duplicate shared test cases
- refactor: append test flag to shared test description - so that the failed test is easy to identify; shallow copy tests and add comments so that others do the same
Thank you for bearing with me. I have squashed the commits and added a comment.
@x13n unless you have any other comments, I think I'm done from my side.
Great, thank you for the changes! /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: vadasambar, x13n. The full list of commands accepted by this bot can be found here. The pull request process is described here.
This PR is a follow-up to #5419 (comment).
This PR is still WIP; I will mention the reviewers here once I am done. It might not be in the best shape as a WIP PR, but if the reviewers have any feedback, I would love to have it. 🙏
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
What type of PR is this?
/kind feature
What this PR does / why we need it:
Pods with an owner reference where controller: true is set are treated as replicated, and scale-down is not blocked.
Which issue(s) this PR fixes:
Fixes #5387
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
TBD