node webhook combined with karpenter causes new nodes to fail #642

Closed
atamgp opened this issue Sep 19, 2022 · 3 comments
Labels
duplicate This issue or pull request already exists

Comments

@atamgp

atamgp commented Sep 19, 2022

Bug description

We combine the Karpenter autoscaler with Capsule. On a new cluster, at start-up, it is quite normal that no nodes are available for scheduling when Capsule is deployed.
This causes an issue because the node-level webhook is registered immediately, while the Capsule pods are still waiting for the scheduler.
When Karpenter starts a new node (and every node after it), the new node tries to call the webhook, which of course fails.
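
The failure mode above can be modeled in a few lines. This is only a toy sketch of how the Kubernetes API server treats an admission webhook's `failurePolicy` when the webhook backend cannot be reached; it is not Capsule or Karpenter code, and the function name `admission_outcome` is hypothetical.

```python
def admission_outcome(failure_policy: str,
                      webhook_reachable: bool,
                      webhook_allows: bool = True) -> bool:
    """Return True if the API request would be admitted.

    Toy model of admission webhook handling (hypothetical helper):
    - backend reachable: the webhook's own verdict decides;
    - backend unreachable: "Fail" rejects the request, "Ignore" lets it through.
    """
    if failure_policy not in ("Fail", "Ignore"):
        raise ValueError(f"unknown failurePolicy: {failure_policy!r}")
    if not webhook_reachable:
        # No pod is serving the webhook yet, e.g. Capsule is still Pending.
        return failure_policy == "Ignore"
    return webhook_allows


# Node request while the webhook pods are not scheduled yet:
print(admission_outcome("Fail", webhook_reachable=False))    # rejected
print(admission_outcome("Ignore", webhook_reachable=False))  # admitted
```

With `Fail`, requests matched by the webhook are rejected for as long as no Capsule pod is running, which matches the deadlock described here: the pods need a node, and the node's requests need the pods.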

How to reproduce

In a new cluster with just Karpenter deployed on managed nodes (AWS), so no Karpenter-provisioned nodes yet, deploy Capsule.
This registers its webhook and triggers Karpenter to start a new node, causing the issue above.

Expected behavior

Normal startup of nodes

Logs

scripts git:(main) ✗ k -n kube-system logs aws-node-l5gnt aws:difu-infrastructure-testing-tst
{"level":"info","ts":"2022-09-14T12:32:36.635Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
{"level":"info","ts":"2022-09-14T12:32:36.636Z","caller":"entrypoint.sh","msg":"Install CNI binaries.."}
{"level":"info","ts":"2022-09-14T12:32:36.650Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2022-09-14T12:32:36.655Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2022-09-14T12:32:38.667Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:32:40.673Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:32:42.679Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:32:44.686Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:32:46.692Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:32:48.698Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:32:50.704Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:32:52.711Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:32:54.718Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:32:56.725Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:32:58.731Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:33:00.737Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:33:02.744Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:33:04.750Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:33:06.756Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:33:08.763Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:33:10.769Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:33:12.775Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:33:14.781Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-09-14T12:33:16.787Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}

Additional context

  • Capsule version: 0.1.2

Suggested solution

Make the node level webhook optional in the helm chart

@atamgp added the blocked-needs-validation and bug labels Sep 19, 2022
@prometherion
Member

Thanks for opening an issue, @atamgp!

I suspect this could be a duplicate of #597 (comment). May I ask you to install Capsule with the nodes webhook failurePolicy disabled?
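
For readers landing here later, a minimal sketch of one way to relax the policy on an already-installed cluster. It assumes that "disabling" the failurePolicy means switching it from `Fail` to `Ignore` on the nodes webhook. The configuration name `capsule-validating-webhook-configuration` and the webhook name `nodes.capsule.clastix.io` used below are assumptions for illustration, so check them against your cluster. The helper `failure_policy_patch` is hypothetical; it only builds a JSON Patch (RFC 6902) document and does not talk to the cluster.

```python
import json
from typing import Any


def failure_policy_patch(webhook_config: dict[str, Any],
                         name_prefix: str,
                         policy: str = "Ignore") -> list[dict[str, Any]]:
    """Build JSON Patch operations that set failurePolicy on matching webhooks.

    webhook_config is a ValidatingWebhookConfiguration as a dict, for example
    parsed from `kubectl get validatingwebhookconfiguration <name> -o json`.
    """
    if policy not in ("Fail", "Ignore"):
        raise ValueError(f"unknown failurePolicy: {policy!r}")
    operations = []
    for index, webhook in enumerate(webhook_config.get("webhooks", [])):
        matches = webhook.get("name", "").startswith(name_prefix)
        if matches and webhook.get("failurePolicy") != policy:
            operations.append({
                # "add" on an object member also overwrites an existing value.
                "op": "add",
                "path": f"/webhooks/{index}/failurePolicy",
                "value": policy,
            })
    return operations


# Assumed shape and names, for illustration only.
example_config = {
    "kind": "ValidatingWebhookConfiguration",
    "metadata": {"name": "capsule-validating-webhook-configuration"},
    "webhooks": [
        {"name": "namespaces.capsule.clastix.io", "failurePolicy": "Fail"},
        {"name": "nodes.capsule.clastix.io", "failurePolicy": "Fail"},
    ],
}

print(json.dumps(failure_policy_patch(example_config, "nodes.")))
```

The printed document can be passed to `kubectl patch validatingwebhookconfiguration <name> --type=json -p '<patch>'`. If the chart manages this object, a later Helm upgrade will likely restore the chart's value, so setting the policy through the chart's values is the more durable route.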

@atamgp
Author

atamgp commented Sep 20, 2022

The failurePolicy worked, thank you!

@atamgp closed this as completed Sep 20, 2022
@prometherion
Member

Glad to hear that! 🚀

For any issue, please, don't hesitate to ping us here on GitHub or on the Slack channel. 👍🏻

@prometherion added the duplicate label and removed the bug and blocked-needs-validation labels Sep 20, 2022