
Native Support for Spot Termination #702

Closed
bwagner5 opened this issue Sep 23, 2021 · 11 comments · Fixed by #2546
@bwagner5
Contributor

Tell us about your request

  • Native Support for Spot Termination

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.

Are you currently working around this issue?
How are you currently solving this problem?
Using aws-node-termination-handler up to this point (#105)

Additional context

  • This will be an AWS-specific Spot termination controller.

Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@bwagner5 bwagner5 added the feature New feature or request label Sep 23, 2021
@bwagner5 bwagner5 added this to the v0.5.0 milestone Sep 23, 2021
@bwagner5 bwagner5 added the AWS label Sep 23, 2021
@bwagner5 bwagner5 self-assigned this Sep 23, 2021
@eptiger
Contributor

eptiger commented Oct 13, 2021

One issue that we have to be aware of is support for rebalance recommendations. Cluster Autoscaler (CA) customers can utilize Capacity Rebalance on their ASG with MNG or with NTH. But since Karpenter doesn't use ASGs, we have to implement proactive replacement ourselves, such that we only terminate an instance that receives a rebalance recommendation if there's an instance in another instance type/zone combo with Spot capacity that we are able to launch.
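The proactive-replacement condition described above can be sketched as a small predicate. This is illustrative only; the `Pool` type and function names are hypothetical, not Karpenter's actual API, and real capacity information would come from EC2.

```go
package main

import "fmt"

// Pool identifies a Spot capacity pool: an instance type / availability
// zone combination. Hypothetical type, for illustration only.
type Pool struct {
	InstanceType string
	Zone         string
}

// shouldProactivelyReplace returns true only if some pool other than the
// rebalanced node's own pool is believed to have launchable Spot capacity.
// If no alternative pool exists, we keep the node rather than terminate it.
func shouldProactivelyReplace(current Pool, available []Pool) bool {
	for _, p := range available {
		if p != current {
			return true
		}
	}
	return false
}

func main() {
	current := Pool{"m5.large", "us-east-1a"}
	// Only the same pool is available: do not replace.
	fmt.Println(shouldProactivelyReplace(current, []Pool{{"m5.large", "us-east-1a"}}))
	// A different type/zone combo has capacity: safe to replace.
	fmt.Println(shouldProactivelyReplace(current, []Pool{{"c5.large", "us-east-1b"}}))
}
```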

@ellistarn
Contributor

I think the INT signal is easiest to handle: since we're going to lose the node very soon, we should just gracefully drain as soon as possible.

For Rebalance, does it make sense to think of this in the same way as defrag? I think the workflow is pretty similar:

  • pre-spin a new node (w/ pods on existing node)
  • drain the candidate node
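The two-step workflow above (pre-spin a replacement, then drain) could be sketched like this. The `Launcher` and `Drainer` interfaces are hypothetical stand-ins for the cloud-provider and eviction machinery; a real controller would wait on node readiness via the Kubernetes API.

```go
package main

import (
	"errors"
	"fmt"
)

// Launcher and Drainer are hypothetical interfaces standing in for the
// cloud-provider launch path and the cordon/drain machinery.
type Launcher interface {
	LaunchReplacement() (nodeName string, err error)
	WaitReady(nodeName string) error
}

type Drainer interface {
	CordonAndDrain(nodeName string) error
}

// replaceNode implements the defrag-style workflow: launch and wait for a
// replacement first, and only then drain the candidate node, so the pods
// always have somewhere to go.
func replaceNode(l Launcher, d Drainer, candidate string) error {
	replacement, err := l.LaunchReplacement()
	if err != nil {
		return fmt.Errorf("pre-spin failed, keeping %s: %w", candidate, err)
	}
	if err := l.WaitReady(replacement); err != nil {
		return fmt.Errorf("replacement %s never became ready: %w", replacement, err)
	}
	return d.CordonAndDrain(candidate)
}

// Fake implementations for demonstration.
type fakeCloud struct{ fail bool }

func (f fakeCloud) LaunchReplacement() (string, error) {
	if f.fail {
		return "", errors.New("insufficient capacity")
	}
	return "node-b", nil
}
func (f fakeCloud) WaitReady(string) error { return nil }

type fakeDrainer struct{ drained []string }

func (f *fakeDrainer) CordonAndDrain(n string) error {
	f.drained = append(f.drained, n)
	return nil
}

func main() {
	d := &fakeDrainer{}
	if err := replaceNode(fakeCloud{}, d, "node-a"); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println("drained:", d.drained)
}
```

Note that if the pre-spin fails (no capacity anywhere), the candidate node is deliberately left untouched.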

@eptiger
Contributor

eptiger commented Oct 13, 2021

It's largely the same as defrag, with a caveat around not going back into the same pool. You could imagine looking at all the RBN'ed (rebalance-recommended) nodes, trying to binpack their pods, and seeing whether you could spin up nodes from pools other than any pool those nodes were from.
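One way to read "pools other than a pool any of those nodes were from" is as set exclusion over the source pools of all rebalance-recommended nodes. A minimal sketch, with hypothetical names:

```go
package main

import "fmt"

// Pool is a hypothetical instance-type/zone capacity pool.
type Pool struct {
	InstanceType string
	Zone         string
}

// eligiblePools removes every pool that any rebalance-recommended node came
// from, leaving only pools the binpacked replacement nodes may launch into.
func eligiblePools(candidates []Pool, rbnSourcePools []Pool) []Pool {
	excluded := make(map[Pool]bool, len(rbnSourcePools))
	for _, p := range rbnSourcePools {
		excluded[p] = true
	}
	var out []Pool
	for _, p := range candidates {
		if !excluded[p] {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	candidates := []Pool{
		{"m5.large", "us-east-1a"},
		{"c5.large", "us-east-1a"},
		{"m5.large", "us-east-1b"},
	}
	// Pools the RBN'ed nodes currently occupy are off-limits.
	sources := []Pool{{"m5.large", "us-east-1a"}}
	fmt.Println(eligiblePools(candidates, sources))
}
```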

@ellistarn
Contributor

If we're using something like CapacityOptimizedAllocationStrategy, we might be able to simply rely on Fleet to not give us an instance in the same pool. In this sense, we can make Karpenter unaware of the details and just let EC2 do the decision-making.

@eptiger
Contributor

eptiger commented Oct 14, 2021

I think it will be easier for us to talk about this in a call, but that's not how the capacity-optimized (CO) strategy works. You could maybe use capacity-optimized-prioritized and set that pool at a low priority, but otherwise there's no reason for it to avoid that pool. If all the pools you provide to the CO strategy are constrained, it will give you the least constrained pool, which may be the very pool that the rebalance-recommended instance was in.

@ppodevlabs

Are there any plans to support this?

@bwagner5 bwagner5 removed this from the v0.5.0 milestone Apr 20, 2022
@bwagner5
Contributor Author

Yes, we are currently working on this functionality.

@sqerison

Looking forward to seeing this feature implemented. It will save tons of IPs for our cluster.

@aavileli

aavileli commented Jul 12, 2022

I am assuming the node termination handler logic has been removed from Karpenter. For graceful node shutdown when a user terminates a node in the AWS console, or when AWS issues a Spot termination request, is there a need to install NTH with the Queue Processor (https://github.com/aws/aws-node-termination-handler#which-one-should-i-use)?
Adding some information to the documentation would be very helpful.

@BryanStenson-okta
Contributor

Is "spot termination" for this issue restricted to spot instances? Does this issue's scope include termination of non-spot instances (for underlying hardware maintenance by AWS, for example)?

@bwagner5
Contributor Author

Is "spot termination" for this issue restricted to spot instances? Does this issue's scope include termination of non-spot instances (for underlying hardware maintenance by AWS, for example)?

Yes, it does include support for AWS Health events as well, similar to the aws-node-termination-handler.
