[Termination Handler] Explore graceful termination for EC2 Instances #105

Closed
ellistarn opened this issue Nov 3, 2020 · 4 comments

Comments

@ellistarn
Contributor

ellistarn commented Nov 3, 2020

As described in the Design, termination handlers must be layered independently on top of Karpenter's autoscaler. By design, the node termination handler should have no knowledge of autoscaling behavior or configuration, or even of what triggered the scale-down (e.g. manual action, preemption, autoscaling).

Potential requirements/solutions include:

  • Protect instances that are being deleted or scaled down so that PodDisruptionBudgets are respected
  • Build a Karpenter CRD to model lifecycle hooks?
  • Use some sort of CloudProvider model to hook into EC2 lifecycle hooks to protect instances.
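
To make the first bullet concrete, here is a minimal sketch of the budget check a handler would need to honor before a node goes away. It is a pure-Python model, not Karpenter or Kubernetes code: the `Pod` type, the `evictable_now` function, and the budget names are all hypothetical. The only thing borrowed from Kubernetes is the idea behind a PodDisruptionBudget's `status.disruptionsAllowed` counter.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pod:
    name: str
    budget: str  # name of the covering PodDisruptionBudget, "" if none (hypothetical model)


def evictable_now(pods: List[Pod], disruptions_allowed: Dict[str, int]) -> List[str]:
    """Return the pods that can be evicted without exceeding any budget."""
    remaining = dict(disruptions_allowed)  # copy so the caller's mapping is untouched
    evictable = []
    for pod in pods:
        if not pod.budget:
            evictable.append(pod.name)  # no budget covers this pod
        elif remaining.get(pod.budget, 0) > 0:
            remaining[pod.budget] -= 1  # consume one allowed disruption
            evictable.append(pod.name)
    return evictable


pods = [Pod("web-1", "web"), Pod("web-2", "web"), Pod("job-1", "")]
print(evictable_now(pods, {"web": 1}))
```

In a real cluster the Eviction API enforces budgets server-side, so a handler would mostly need to keep the instance alive and retry until the remaining pods can be evicted.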
@ellistarn
Contributor Author

This may be a complete solution with no work on our side: https://github.com/aws/aws-node-termination-handler

@ellistarn
Contributor Author

Also worth investigating this guy: https://github.com/pusher/k8s-spot-termination-handler

@bwagner5
Contributor

bwagner5 commented Nov 3, 2020

I think the aws-node-termination-handler would integrate with no additional work required for Karpenter. NTH does require quite a bit of customer setup (creating the lifecycle hooks, EventBridge rules, and SQS queue), but after the initial setup, it can respond to a lot of events very easily.
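
To give a feel for the EventBridge part of that setup, here is a rough sketch of the event patterns such rules might match. The `event_pattern` helper and the `PATTERNS` list are hypothetical; only the `source` and `detail-type` values are the ones AWS publishes for these events. Creating the rules, the SQS queue, and the lifecycle hook themselves is left out.

```python
import json


def event_pattern(source: str, detail_type: str) -> str:
    """Build an EventBridge event pattern (hypothetical helper) as a JSON string."""
    return json.dumps({"source": [source], "detail-type": [detail_type]})


# One pattern per event family; each rule would target the same SQS queue.
PATTERNS = [
    event_pattern("aws.autoscaling", "EC2 Instance-terminate Lifecycle Action"),
    event_pattern("aws.ec2", "EC2 Spot Instance Interruption Warning"),
    event_pattern("aws.ec2", "EC2 Instance State-change Notification"),
]

for pattern in PATTERNS:
    print(pattern)
```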

Since Karpenter is already managing node groups, I'm curious if some of that setup (at least the lifecycle hooks) could be abstracted away, and then NTH could just plug in to handle them or receive them from Karpenter? The plan for NTH to ease the setup burden is to integrate with ACK, but it's still early and most of the resources we need are not available yet.

Knowledge of the actual event does come in handy when processing events. NTH takes slightly different actions depending on the event. For example, ASG lifecycle terminations have different post-draining actions than an EC2 Status Change event, since in the latter case the lifecycle hook does not need to be completed. Also, EC2 scheduled maintenance event reboots are handled differently in the NTH IMDS processor, since the node can be labeled and automatically brought back into service after the reboot.
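
As a rough illustration of why the event type matters, here is a sketch of event-dependent planning. This is not NTH's actual code: the `plan_actions` function, the step names, and the `SCHEDULED_REBOOT` label are made up for the example. The two detail-type strings are the ones AWS uses on EventBridge.

```python
from typing import List

ASG_LIFECYCLE = "EC2 Instance-terminate Lifecycle Action"
STATE_CHANGE = "EC2 Instance State-change Notification"
SCHEDULED_REBOOT = "scheduled-reboot"  # hypothetical label for an IMDS maintenance event


def plan_actions(event_kind: str) -> List[str]:
    """Return the ordered steps a handler might take for one event (illustrative only)."""
    if event_kind == SCHEDULED_REBOOT:
        # The node survives the reboot, so mark it and bring it back afterwards.
        return ["label-node", "cordon", "drain", "uncordon-after-reboot"]
    steps = ["cordon", "drain"]
    if event_kind == ASG_LIFECYCLE:
        # Only lifecycle terminations have a hook waiting to be completed.
        steps.append("complete-lifecycle-action")
    return steps


for kind in (ASG_LIFECYCLE, STATE_CHANGE, SCHEDULED_REBOOT):
    print(kind, "->", plan_actions(kind))
```

A handler that knew nothing about the event would have to pick one of these plans for every case, which is the trade-off against keeping it fully event-agnostic.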

@ellistarn
Contributor Author

My current stance on this is that we should recommend that users rely on https://github.com/aws/aws-node-termination-handler for at least v0.4. We should explore building a Karpenter-native interruption handler, but this use case (rebalance) should be scoped into the defragmentation design.

gfcroft pushed a commit to gfcroft/karpenter-provider-aws that referenced this issue Nov 25, 2023