The handler sometimes doesn't cordon any node #97

someone-stole-my-name · 2023-04-12T14:05:04Z

Describe the bug

The getRollingUpdateTimestampsFromNode(node) conditions used throughout the code to check whether a node should be cordoned have a flaw. If something or someone stops the rolling update and manually uncordons the nodes, the handler will, on its next start, happily evict pods from those nodes without actually cordoning anything. This is especially problematic with the eager cordoning feature, since it leads to an upgrade that can never end.
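To make the failure mode concrete, here is a minimal Go sketch of the kind of check described above. It is not the handler's actual code, nor the fix that was later merged. The `Node` struct and both helper functions are hypothetical stand-ins, and the timestamp value is made up. Only the annotation key is taken from this thread.

```go
package main

import "fmt"

// Node is a hypothetical, stripped-down stand-in for a Kubernetes node.
type Node struct {
	Name          string
	Unschedulable bool // true while the node is cordoned
	Annotations   map[string]string
}

// Annotation key as quoted later in this thread.
const cordonedAtAnnotation = "aws-eks-asg-rolling-update-handler.twin.sh/cordoned-at"

// needsCordonByAnnotation (hypothetical) trusts the recorded timestamp alone:
// if the annotation exists, the node is assumed to still be cordoned.
func needsCordonByAnnotation(n Node) bool {
	_, recorded := n.Annotations[cordonedAtAnnotation]
	return !recorded
}

// needsCordonByState (hypothetical) looks at the node's live scheduling state
// instead, so a manual uncordon is noticed.
func needsCordonByState(n Node) bool {
	return !n.Unschedulable
}

func main() {
	// A node that was cordoned by the handler and then uncordoned by hand:
	// the annotation is still present, but the node is schedulable again.
	n := Node{
		Name:          "node-a",
		Unschedulable: false,
		Annotations:   map[string]string{cordonedAtAnnotation: "2023-04-12T14:05:04Z"},
	}
	fmt.Println("cordon needed (annotation only):", needsCordonByAnnotation(n))
	fmt.Println("cordon needed (live state):", needsCordonByState(n))
}
```

With a stale annotation, the annotation-only check reports that no cordon is needed, even though the node accepts new pods again. That matches the behaviour reported here.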

What do you see?

No response

What do you expect to see?

No response

List the steps that must be taken to reproduce this issue

Start the handler
Let it cordon a node
Stop the handler
Uncordon the node manually
Start the handler again

Version

No response

Additional information

No response


zaafar · 2023-06-01T13:38:13Z

Is there any workaround for this bug?

someone-stole-my-name · 2023-06-01T16:43:35Z

> Is there any workaround for this bug?

Manually removing the annotations.
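As a rough illustration of that workaround, the Go sketch below builds a JSON merge patch that sets the handler's annotations to null, which removes them when the patch is applied. It assumes every annotation written by the handler shares the prefix of the cordoned-at key mentioned in this thread. The `removalPatch` helper, the sample node annotations, and the timestamp value are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// annotationPrefix is derived from the cordoned-at key quoted in this thread.
// Assumption: all of the handler's annotations live under this prefix.
const annotationPrefix = "aws-eks-asg-rolling-update-handler.twin.sh/"

// removalPatch (hypothetical helper) returns a JSON merge patch that sets every
// annotation under annotationPrefix to null. It returns nil when nothing matches.
func removalPatch(annotations map[string]string) ([]byte, error) {
	toRemove := map[string]interface{}{}
	for key := range annotations {
		if strings.HasPrefix(key, annotationPrefix) {
			toRemove[key] = nil // null in a merge patch deletes the key
		}
	}
	if len(toRemove) == 0 {
		return nil, nil
	}
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{"annotations": toRemove},
	}
	return json.Marshal(patch)
}

func main() {
	// Made-up annotations for a node left over from an interrupted rolling update.
	current := map[string]string{
		annotationPrefix + "cordoned-at": "2023-04-12T14:05:04Z",
		"example.com/unrelated":          "keep-me",
	}
	patch, err := removalPatch(current)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

The resulting patch could then be applied with something like `kubectl patch node <node-name> --type merge -p '<patch>'`. For a single key, `kubectl annotate node <node-name> <annotation-key>-` has the same effect.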

zaafar · 2023-06-01T18:26:56Z

Just to be clear, that wouldn't redo the EAGER_CORDONING; it would only redo the regular cordoning. Is there any hack/trick to redo the EAGER_CORDONING after this bug?

TwiN · 2023-06-05T02:03:48Z

Seems like this bug was introduced in #42. Versions prior to v1.7.0 didn't have the aws-eks-asg-rolling-update-handler.twin.sh/cordoned-at annotations, and I believe I had specifically avoided adding this annotation just in case something like this would happen.

I'll see if I can do something 🤔


TwiN · 2023-06-05T03:33:05Z

@zaafar @someone-stole-my-name Hey folks, I think #117 fixed this, but I'd appreciate it if you could both give it a try.

As soon as https://github.com/TwiN/aws-eks-asg-rolling-update-handler/actions/runs/5172952729 finishes, it should be available with the latest tag (twinproduction/aws-eks-asg-rolling-update-handler:latest).

zaafar · 2023-06-05T15:09:30Z

sure, i am going to test it and get back to you.

zaafar · 2023-06-05T18:55:03Z

looks good to me.

zaafar · 2023-06-05T18:55:17Z

please whenever you get some time, make a release.

TwiN · 2023-06-05T23:09:56Z

Just released the fix in v1.8.1

someone-stole-my-name added the bug label Apr 12, 2023

TwiN added a commit that referenced this issue Jun 5, 2023

fix: Prevent never-ending upgrade due to the cordoned-at annotation i…

b863750

…ntroduced in #42 Fixes #97

TwiN mentioned this issue Jun 5, 2023

fix: Prevent never-ending upgrade due to the cordoned-at annotation #117

Merged

2 tasks

TwiN closed this as completed in cf5ea03 Jun 5, 2023

TwiN reopened this Jun 5, 2023

TwiN pinned this issue Jun 5, 2023

TwiN closed this as completed Jun 5, 2023
