When using externalgrpc provider, nodes are created and then immediately deleted next loop iteration #5935

Closed
PeterGrace opened this issue Jul 6, 2023 · 4 comments · Fixed by #5936
Labels: area/cluster-autoscaler, kind/bug


PeterGrace commented Jul 6, 2023

Which component are you using?:
Cluster-Autoscaler externalgrpc provider

What version of the component are you using?:
registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.2

Component version:
helm chart version 9.29.x (edited to allow externalgrpc to consume cloudConfig path)

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.2", GitCommit:"7f6f68fdabc4df88cfea2dcf9a19b2b830f1e647", GitTreeState:"clean", BuildDate:"2023-05-17T14:13:27Z", GoVersion:"go1.20.4", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.5+k3s1", GitCommit:"7cefebeaac7dbdd0bfec131ea7a43a45cb125354", GitTreeState:"clean", BuildDate:"2023-05-27T00:05:40Z", GoVersion:"go1.19.9", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:
Lab environment

What did you expect to happen?:
I'm working on a libvirt module for the externalgrpc provider. I expected a newly created node to be given enough time to provision and register with the Kubernetes master (60-90 seconds).

What happened instead?:
After a node is created via the externalgrpc provisioner, the autoscaler deletes it on the next loop iteration, reporting that the Scale-Up has timed out.

How to reproduce it (as minimally and precisely as possible):
Write a gRPC server that creates nodes, then observe the Scale-Up timeout on the next loop iteration.

Anything else we need to know?:
I spent a lot of time in the Kubernetes #sig-autoscaling Slack channel discussing this with @vadasambar. They indicated that this appears to be an issue in the protobuf definition for externalgrpc: PR #5649 moved MaxNodeProvisionTime into NodeGroupAutoscalingOptions, but the externalgrpc protobuf code was not updated to match. They mentioned they will likely submit a PR to fix this within the next day or so.

PeterGrace added the kind/bug label Jul 6, 2023
vadasambar commented

/assign vadasambar


vadasambar commented Jul 6, 2023

WIP PR: #5936. If you are interested, you can give it a try (you might have to rebuild the CA image, which you can do using make dev-release).

vadasambar commented

"I spent a lot of time in the Kubernetes #sig-autoscaling slack channel discussing this with @vadasambar."

Link to the slack thread: https://kubernetes.slack.com/archives/C09R1LV8S/p1688649364267339


vadasambar commented Jul 7, 2023

PR is ready for review. Waiting for reviews (#5936).
