
Native Support for Spot Termination #702

Closed
bwagner5 opened this issue Sep 23, 2021 · 11 comments · Fixed by #2546
@bwagner5
Contributor

Tell us about your request

  • Native Support for Spot Termination

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.

Are you currently working around this issue?
How are you currently solving this problem?
Using aws-node-termination-handler up to this point (#105)

Additional context

  • This will be an AWS-specific Spot termination controller.

Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@bwagner5 bwagner5 added the feature New feature or request label Sep 23, 2021
@bwagner5 bwagner5 added this to the v0.5.0 milestone Sep 23, 2021
@bwagner5 bwagner5 added the AWS label Sep 23, 2021
@bwagner5 bwagner5 self-assigned this Sep 23, 2021
@eptiger
Contributor

eptiger commented Oct 13, 2021

One issue that we have to be aware of is support for rebalance recommendations. Cluster Autoscaler (CA) customers can utilize Capacity Rebalance on their ASG with MNG or with NTH. But since Karpenter doesn't use ASGs, we have to implement proactive replacement ourselves, such that we only terminate an instance that receives a rebalance recommendation if there's an instance in another instance type/zone combo with Spot capacity that we are able to launch.
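The proactive-replacement condition described above can be sketched as a small predicate. This is illustrative only; the `Pool` type and function names are hypothetical, not Karpenter's actual API, and real capacity information would come from EC2.

```go
package main

import "fmt"

// Pool identifies a Spot capacity pool: an instance type / availability
// zone combination. Hypothetical type, for illustration only.
type Pool struct {
	InstanceType string
	Zone         string
}

// shouldProactivelyReplace returns true only if some pool other than the
// rebalanced node's own pool is believed to have launchable Spot capacity.
// If no alternative pool exists, we keep the node rather than terminate it.
func shouldProactivelyReplace(current Pool, available []Pool) bool {
	for _, p := range available {
		if p != current {
			return true
		}
	}
	return false
}

func main() {
	current := Pool{"m5.large", "us-east-1a"}
	// Only the same pool is available: do not replace.
	fmt.Println(shouldProactivelyReplace(current, []Pool{{"m5.large", "us-east-1a"}}))
	// A different type/zone combo has capacity: safe to replace.
	fmt.Println(shouldProactivelyReplace(current, []Pool{{"c5.large", "us-east-1b"}}))
}
```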

@ellistarn
Contributor

I think the INT signal is easiest to handle: since we're going to lose the node very soon, we should just gracefully drain as soon as possible.

For Rebalance, does it make sense to think of this in the same way as defrag? I think the workflow is pretty similar:

  • pre-spin a new node (w/ pods on existing node)
  • drain the candidate node
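The two-step workflow above (pre-spin a replacement, then drain) could be sketched like this. The `Launcher` and `Drainer` interfaces are hypothetical stand-ins for the cloud-provider and eviction machinery; a real controller would wait on node readiness via the Kubernetes API.

```go
package main

import (
	"errors"
	"fmt"
)

// Launcher and Drainer are hypothetical interfaces standing in for the
// cloud-provider launch path and the cordon/drain machinery.
type Launcher interface {
	LaunchReplacement() (nodeName string, err error)
	WaitReady(nodeName string) error
}

type Drainer interface {
	CordonAndDrain(nodeName string) error
}

// replaceNode implements the defrag-style workflow: launch and wait for a
// replacement first, and only then drain the candidate node, so the pods
// always have somewhere to go.
func replaceNode(l Launcher, d Drainer, candidate string) error {
	replacement, err := l.LaunchReplacement()
	if err != nil {
		return fmt.Errorf("pre-spin failed, keeping %s: %w", candidate, err)
	}
	if err := l.WaitReady(replacement); err != nil {
		return fmt.Errorf("replacement %s never became ready: %w", replacement, err)
	}
	return d.CordonAndDrain(candidate)
}

// Fake implementations for demonstration.
type fakeCloud struct{ fail bool }

func (f fakeCloud) LaunchReplacement() (string, error) {
	if f.fail {
		return "", errors.New("insufficient capacity")
	}
	return "node-b", nil
}
func (f fakeCloud) WaitReady(string) error { return nil }

type fakeDrainer struct{ drained []string }

func (f *fakeDrainer) CordonAndDrain(n string) error {
	f.drained = append(f.drained, n)
	return nil
}

func main() {
	d := &fakeDrainer{}
	if err := replaceNode(fakeCloud{}, d, "node-a"); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println("drained:", d.drained)
}
```

Note that if the pre-spin fails (no capacity anywhere), the candidate node is deliberately left untouched.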

@eptiger
Contributor

eptiger commented Oct 13, 2021

It's largely the same as defrag, with a caveat around not going back into the same pool. You could imagine looking at all the RBN'ed (rebalance-recommended) nodes, trying to binpack their pods, and seeing whether you could spin up nodes from pools other than any pool those nodes were from.
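One way to read "pools other than a pool any of those nodes were from" is as set exclusion over the source pools of all rebalance-recommended nodes. A minimal sketch, with hypothetical names:

```go
package main

import "fmt"

// Pool is a hypothetical instance-type/zone capacity pool.
type Pool struct {
	InstanceType string
	Zone         string
}

// eligiblePools removes every pool that any rebalance-recommended node came
// from, leaving only pools the binpacked replacement nodes may launch into.
func eligiblePools(candidates []Pool, rbnSourcePools []Pool) []Pool {
	excluded := make(map[Pool]bool, len(rbnSourcePools))
	for _, p := range rbnSourcePools {
		excluded[p] = true
	}
	var out []Pool
	for _, p := range candidates {
		if !excluded[p] {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	candidates := []Pool{
		{"m5.large", "us-east-1a"},
		{"c5.large", "us-east-1a"},
		{"m5.large", "us-east-1b"},
	}
	// Pools the RBN'ed nodes currently occupy are off-limits.
	sources := []Pool{{"m5.large", "us-east-1a"}}
	fmt.Println(eligiblePools(candidates, sources))
}
```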

@ellistarn
Contributor

If we're using something like CapacityOptimizedAllocationStrategy, we might be able to simply rely on Fleet to not give us an instance in the same pool. In this sense, we can make Karpenter unaware of the details and just let EC2 do the decision-making.

@eptiger
Contributor

eptiger commented Oct 14, 2021

I think it will be easier for us to talk about this in a call, but that's not how the capacity-optimized (CO) strategy works. You could maybe use capacity-optimized-prioritized and set that pool at a low priority, but otherwise there's no reason for it to avoid that pool. If all the pools you provide to the CO strategy are constrained, it will give you the least constrained pool, which may be the very pool that the rebalance-recommended instance was in.

@ppodevlabs

Are there any plans to support this?

@bwagner5 bwagner5 removed this from the v0.5.0 milestone Apr 20, 2022
@bwagner5
Contributor Author

Yes, we are currently working on this functionality.

@sqerison

Looking forward to seeing this feature implemented. It will save tons of IPs for our cluster.

@aavileli

aavileli commented Jul 12, 2022

I am assuming the node termination handler logic has been removed from Karpenter. For graceful node shutdown when a user terminates a node in the AWS console, or when AWS issues a Spot termination request, is there a need to install NTH with the Queue Processor (https://github.com/aws/aws-node-termination-handler#which-one-should-i-use)?
Adding some information to the documentation would be very helpful.

@BryanStenson-okta
Contributor

Is "spot termination" for this issue restricted to spot instances? Does this issue's scope include termination of non-spot instances (for underlying hardware maintenance by AWS, for example)?

@bwagner5
Contributor Author

Is "spot termination" for this issue restricted to spot instances? Does this issue's scope include termination of non-spot instances (for underlying hardware maintenance by AWS, for example)?

Yes, it does include support for AWS Health events as well, similar to the aws-node-termination-handler.
