
Drain nodes in parallel #4864

Merged: 9 commits merged into eksctl-io:main from parallel-drain on Mar 16, 2022

Conversation

@aclevername aclevername (Contributor) commented on Mar 1, 2022

Description

Closes #4705

Adds support for a --parallel <value> flag (open to renaming!) on delete/drain nodegroup, which drains the nodes in the nodegroup in parallel. I've set an upper limit of 25 to match our k8s client. Example below.
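
For illustration only (not the PR's actual code), here is a minimal Go sketch of the bounded-concurrency pattern this flag enables, using a buffered channel as a semaphore; drainNode and the node names are hypothetical stand-ins:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// drainNode is a hypothetical stand-in for cordoning a node and
// evicting its pods.
func drainNode(name string) error {
	fmt.Println("cordon node", name)
	time.Sleep(100 * time.Millisecond) // simulate eviction work
	return nil
}

// drainAll drains the given nodes with at most maxInFlight drains
// running concurrently.
func drainAll(nodes []string, maxInFlight int) {
	sem := make(chan struct{}, maxInFlight) // counting semaphore
	var wg sync.WaitGroup
	for _, node := range nodes {
		wg.Add(1)
		sem <- struct{}{} // blocks once maxInFlight drains are running
		go func(n string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when done
			if err := drainNode(n); err != nil {
				fmt.Println("failed to drain", n, ":", err)
			}
		}(node)
	}
	wg.Wait()
}

func main() {
	nodes := []string{"node-1", "node-2", "node-3", "node-4"}
	drainAll(nodes, 2)
}
```

The size of the semaphore is what keeps the number of simultaneous drains at or below the configured limit (capped at 25 in this PR).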

Testing

50-node cluster with a 1000-pod deployment

Before:
eksctl drain nodegroup --cluster jk --name ng-1 took 4:40.22 to complete:

2022-03-01 14:49:01 [ℹ]  eksctl version 0.81.0-rc.0
2022-03-01 14:49:01 [ℹ]  using region us-west-2
2022-03-01 14:49:02 [ℹ]  1 nodegroup (ng-1) was included (based on the include/exclude rules)
2022-03-01 14:49:02 [ℹ]  will drain 1 nodegroup(s) in cluster "jk"
2022-03-01 14:49:02 [ℹ]  will drain 0 managed nodegroup(s) in cluster "jk"
2022-03-01 14:49:03 [ℹ]  cordon node "ip-192-168-11-109.us-west-2.compute.internal"
...
2022-03-01 14:49:13 [ℹ]  cordon node "ip-192-168-95-46.us-west-2.compute.internal"
2022-03-01 14:49:14 [!]  ignoring DaemonSet-managed Pods: kube-system/aws-node-vv2xp, kube-system/kube-proxy-p6txh
...
2022-03-01 14:53:39 [!]  ignoring DaemonSet-managed Pods: kube-system/aws-node-nh97r, kube-system/kube-proxy-k4njl
2022-03-01 14:53:40 [✔]  drained all nodes: [ip-192-168-11-109.us-west-2.compute.internal ip-192-168-17-55.us-west-2.compute.internal ip-192-168-19-175.us-west-2.compute.internal ip-192-168-21-10.us-west-2.compute.internal ip-192-168-21-175.us-west-2.compute.internal ip-192-168-21-26.us-west-2.compute.internal ip-192-168-22-105.us-west-2.compute.internal ip-192-168-23-243.us-west-2.compute.internal ip-192-168-24-150.us-west-2.compute.internal ip-192-168-24-240.us-west-2.compute.internal ip-192-168-26-129.us-west-2.compute.internal ip-192-168-28-196.us-west-2.compute.internal ip-192-168-29-155.us-west-2.compute.internal ip-192-168-31-119.us-west-2.compute.internal ip-192-168-31-46.us-west-2.compute.internal ip-192-168-33-67.us-west-2.compute.internal ip-192-168-35-232.us-west-2.compute.internal ip-192-168-36-250.us-west-2.compute.internal ip-192-168-39-116.us-west-2.compute.internal ip-192-168-43-119.us-west-2.compute.internal ip-192-168-46-26.us-west-2.compute.internal ip-192-168-49-215.us-west-2.compute.internal ip-192-168-49-74.us-west-2.compute.internal ip-192-168-50-124.us-west-2.compute.internal ip-192-168-53-176.us-west-2.compute.internal ip-192-168-53-236.us-west-2.compute.internal ip-192-168-53-70.us-west-2.compute.internal ip-192-168-56-255.us-west-2.compute.internal ip-192-168-59-105.us-west-2.compute.internal ip-192-168-59-109.us-west-2.compute.internal ip-192-168-6-88.us-west-2.compute.internal ip-192-168-63-151.us-west-2.compute.internal ip-192-168-65-247.us-west-2.compute.internal ip-192-168-67-138.us-west-2.compute.internal ip-192-168-71-5.us-west-2.compute.internal ip-192-168-75-206.us-west-2.compute.internal ip-192-168-76-41.us-west-2.compute.internal ip-192-168-8-83.us-west-2.compute.internal ip-192-168-80-58.us-west-2.compute.internal ip-192-168-81-173.us-west-2.compute.internal ip-192-168-82-52.us-west-2.compute.internal ip-192-168-84-117.us-west-2.compute.internal ip-192-168-85-150.us-west-2.compute.internal ip-192-168-88-138.us-west-2.compute.internal ip-192-168-92-163.us-west-2.compute.internal ip-192-168-93-53.us-west-2.compute.internal ip-192-168-93-77.us-west-2.compute.internal ip-192-168-94-58.us-west-2.compute.internal ip-192-168-95-231.us-west-2.compute.internal ip-192-168-95-46.us-west-2.compute.internal]

After:
With --parallel set to 20, eksctl drain nodegroup --cluster jk --name ng-1 --parallel 20 took 1:24.77 to complete:

2022-03-02 11:22:51 [ℹ]  eksctl version 0.87.0-dev+0316346c.2022-03-02T11:22:39Z
2022-03-02 11:22:51 [ℹ]  using region us-west-2
2022-03-02 11:22:53 [ℹ]  1 nodegroup (ng-1) was included (based on the include/exclude rules)
2022-03-02 11:22:53 [ℹ]  will drain 1 nodegroup(s) in cluster "jk"
2022-03-02 11:22:53 [ℹ]  will drain 0 managed nodegroup(s) in cluster "jk"
2022-03-02 11:22:54 [ℹ]  starting parallel draining, max in-flight of 20
2022-03-02 11:22:54 [ℹ]  cordon node "ip-192-168-1-209.us-west-2.compute.internal"
2022-03-02 11:22:55 [ℹ]  cordon node "ip-192-168-13-168.us-west-2.compute.internal"
...
2022-03-02 11:24:15 [✔]  drained all nodes: [ip-192-168-1-209.us-west-2.compute.internal ip-192-168-13-168.us-west-2.compute.internal ip-192-168-17-0.us-west-2.compute.internal ip-192-168-17-179.us-west-2.compute.internal ip-192-168-19-175.us-west-2.compute.internal ip-192-168-19-182.us-west-2.compute.internal ip-192-168-19-99.us-west-2.compute.internal ip-192-168-21-131.us-west-2.compute.internal ip-192-168-21-186.us-west-2.compute.internal ip-192-168-23-210.us-west-2.compute.internal ip-192-168-24-126.us-west-2.compute.internal ip-192-168-25-76.us-west-2.compute.internal ip-192-168-28-3.us-west-2.compute.internal ip-192-168-3-85.us-west-2.compute.internal ip-192-168-30-211.us-west-2.compute.internal ip-192-168-34-131.us-west-2.compute.internal ip-192-168-34-232.us-west-2.compute.internal ip-192-168-36-27.us-west-2.compute.internal ip-192-168-37-95.us-west-2.compute.internal ip-192-168-38-109.us-west-2.compute.internal ip-192-168-39-192.us-west-2.compute.internal ip-192-168-4-189.us-west-2.compute.internal ip-192-168-41-195.us-west-2.compute.internal ip-192-168-45-34.us-west-2.compute.internal ip-192-168-47-54.us-west-2.compute.internal ip-192-168-49-194.us-west-2.compute.internal ip-192-168-51-103.us-west-2.compute.internal ip-192-168-51-86.us-west-2.compute.internal ip-192-168-52-211.us-west-2.compute.internal ip-192-168-53-111.us-west-2.compute.internal ip-192-168-53-222.us-west-2.compute.internal ip-192-168-54-137.us-west-2.compute.internal ip-192-168-54-196.us-west-2.compute.internal ip-192-168-64-151.us-west-2.compute.internal ip-192-168-66-23.us-west-2.compute.internal ip-192-168-7-237.us-west-2.compute.internal ip-192-168-71-104.us-west-2.compute.internal ip-192-168-72-91.us-west-2.compute.internal ip-192-168-73-224.us-west-2.compute.internal ip-192-168-73-235.us-west-2.compute.internal ip-192-168-73-56.us-west-2.compute.internal ip-192-168-76-111.us-west-2.compute.internal ip-192-168-76-194.us-west-2.compute.internal ip-192-168-78-98.us-west-2.compute.internal ip-192-168-83-201.us-west-2.compute.internal ip-192-168-84-187.us-west-2.compute.internal ip-192-168-85-236.us-west-2.compute.internal ip-192-168-86-64.us-west-2.compute.internal ip-192-168-89-148.us-west-2.compute.internal ip-192-168-94-144.us-west-2.compute.internal]

@@ -524,6 +525,12 @@ func NewDeleteNodeGroupLoader(cmd *Cmd, ng *api.NodeGroup, ngFilter *filter.Node
return ErrMustBeSet("--name")
}

if flag := l.CobraCommand.Flag("parallel"); flag != nil && flag.Changed {
@aclevername aclevername (Contributor, Author) commented:

it's always an int since the flag type enforces it
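
For context, a minimal sketch of the pattern under discussion, assuming spf13/cobra: declaring the flag as an int means pflag rejects non-integer input before this check runs. The command wiring and the 1–25 bounds check below are illustrative, not the PR's exact code.

```go
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

func main() {
	var parallel int

	cmd := &cobra.Command{
		Use: "drain",
		RunE: func(cmd *cobra.Command, args []string) error {
			// Because the flag is declared as an int, pflag has already
			// rejected non-integer values before RunE is called.
			if flag := cmd.Flag("parallel"); flag != nil && flag.Changed {
				if parallel < 1 || parallel > 25 {
					return fmt.Errorf("--parallel must be between 1 and 25, got %d", parallel)
				}
			}
			fmt.Println("draining with max in-flight of", parallel)
			return nil
		},
	}
	cmd.Flags().IntVar(&parallel, "parallel", 1, "max nodes to drain in parallel")

	if err := cmd.Execute(); err != nil {
		os.Exit(1)
	}
}
```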

@aclevername aclevername marked this pull request as ready for review March 2, 2022 13:26
@aclevername aclevername force-pushed the parallel-drain branch 2 times, most recently from 1c1d2be to 6129061 on March 2, 2022 13:33
@Callisto13 Callisto13 (Contributor) left a comment

garbage

@nikimanoledaki nikimanoledaki (Contributor) left a comment

Awesome! Wanted to ask a few Qs before reviewing more! It's also worth adding something to the docs here: https://eksctl.io/usage/managing-nodegroups/#deleting-and-draining :)

@aclevername aclevername (Contributor, Author) replied:

Added 😄

@aclevername aclevername requested a review from Himangini March 11, 2022 11:42
@Himangini Himangini (Contributor) left a comment

LGTM 👍🏻

@nikimanoledaki nikimanoledaki (Contributor) commented

@aclevername can we wait until Monday to merge this so that it goes in the next release? :) Today's release needs a little bit of extra TLC. :D

@Himangini Himangini changed the title drain nodes in parallel Drain nodes in parallel Mar 11, 2022
@aclevername aclevername enabled auto-merge (squash) March 16, 2022 13:27
@aclevername aclevername merged commit 391a6ce into eksctl-io:main Mar 16, 2022
aclevername added a commit that referenced this pull request Mar 17, 2022
aclevername added a commit that referenced this pull request Mar 17, 2022
aclevername pushed a commit that referenced this pull request Mar 17, 2022
aclevername added a commit that referenced this pull request Mar 22, 2022
* Revert "Revert "Drain nodes in parallel (#4864)" (#4964)"

This reverts commit 00f5fcf.

* set value in delete cluster

* Update pkg/drain/nodegroup.go

Co-authored-by: Gergely Brautigam <182850+Skarlso@users.noreply.github.com>

* Update pkg/drain/nodegroup.go

Co-authored-by: Gergely Brautigam <182850+Skarlso@users.noreply.github.com>

* update unit tests

Co-authored-by: Gergely Brautigam <182850+Skarlso@users.noreply.github.com>
@hspencer77 hspencer77 mentioned this pull request Jul 8, 2022
Successfully merging this pull request may close these issues: Parallel node draining during node group deletion for nodegroups (#4705)