
feat: support Karpenter v1beta1 nodeclaims #188

Merged 2 commits into awslabs:main on Oct 30, 2023

Conversation

bwagner5 (Contributor) commented Oct 27, 2023

Issue #, if available:
N/A

Description of changes:

  • Support Karpenter's v1beta1 NodeClaim API.
    • Karpenter creates a NodeClaim before the Node registers itself with the cluster. Supporting NodeClaims in eks-node-viewer gives users more visibility into nodes that are still booting.
  • Added a message printer that formats numbers with commas, making the stats easier to read during large scale-ups.
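For illustration, the comma grouping can be sketched with a stdlib-only helper (`groupThousands` is a hypothetical name; the actual change may use a locale-aware printer instead):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// groupThousands inserts commas into a non-negative integer's decimal
// representation, e.g. 5000 -> "5,000". Hypothetical sketch, not the
// repository's actual implementation.
func groupThousands(n int) string {
	s := strconv.Itoa(n)
	var b strings.Builder
	for i, r := range s {
		// Add a separator whenever a full group of three digits remains.
		if i > 0 && (len(s)-i)%3 == 0 {
			b.WriteByte(',')
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(groupThousands(5000))    // 5,000
	fmt.Println(groupThousands(577))     // 577
	fmt.Println(groupThousands(1234567)) // 1,234,567
}
```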

Testing:

  • Tested in a cluster with NodeClaims.
    • Scaled to 5,000 pods and about 577 nodes, then scaled back down.
  • Tested in a cluster without NodeClaims.
    • The viewer detects the missing API and does not start the NodeClaim controller.
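The skip-the-controller behavior above can be sketched roughly as follows (names are hypothetical; the real code would consult the API server's discovery info for Karpenter's v1beta1 group via a Kubernetes client):

```go
package main

import "fmt"

// startNodeClaimController reports whether the NodeClaim controller should
// start, based on the resource names the API server advertises. This is an
// illustrative sketch: a real implementation would list the resources served
// under Karpenter's v1beta1 API group with a discovery client.
func startNodeClaimController(apiResources []string) bool {
	for _, r := range apiResources {
		if r == "nodeclaims" {
			return true
		}
	}
	return false
}

func main() {
	withKarpenter := []string{"pods", "nodes", "nodeclaims"}
	withoutKarpenter := []string{"pods", "nodes"}
	fmt.Println(startNodeClaimController(withKarpenter))    // true
	fmt.Println(startNodeClaimController(withoutKarpenter)) // false
}
```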

DEMO:

eks-node-viewer-node-claims.mov

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@bwagner5 bwagner5 requested a review from a team as a code owner October 27, 2023 19:21
@bwagner5 bwagner5 requested a review from tzneal October 27, 2023 19:21
@@ -261,6 +306,10 @@ func (n *Node) NotReadyTime() time.Time {
}
n.mu.RUnlock()
if !notReadyTransitionTime.IsZero() {
// if there's a nodeclaim creation ts, use it if the node has never been Ready before
Contributor
This is when the node went not ready. If that's non-zero, we should use it instead of the node claim creation time, right?

Contributor Author

The problem this addresses is when the transient Node (from the NodeClaim) is replaced by the actual Node. The not ready time resets, so this uses the cached NodeClaim creation time to replace the initial not ready transition time so the ticker keeps ticking when the real node registers rather than resetting.

The NodeClaim creation TS is set back to zero when the node goes Ready for the first time, so that if the node goes Ready and then NotReady, the ticker won't use the NodeClaim creation time and will correctly use the nodeReadyTransitionTime.

Contributor
Sorry, I'm still not following :). Assume this situation:

  • Node starts and goes ready
  • 10 minutes later, node goes not ready

The status condition should have a non-zero not ready time, which we should return. That time should be at launch + 10 minutes. Doesn't this change make us return the node claim creation time?

Contributor Author
It's zeroed out in the Ready() func:

	n.mu.RUnlock()
	// when the node goes ready, remove the nodeclaim creation ts, if any
	if ready {
		n.mu.Lock()
		n.nodeclaimCreationTime = time.Time{}
		n.mu.Unlock()
	}
	return ready

@tzneal tzneal (Contributor) left a comment

lgtm

@tzneal tzneal merged commit dade090 into awslabs:main Oct 30, 2023