[BUG] hostsystem error for cluster provisioned with vsphere provider #10460

Closed
chrpinedo opened this issue Feb 9, 2024 · 15 comments · Fixed by #10473 or #10599

Comments

@chrpinedo

Rancher Server Setup

  • Rancher version: 2.8.2
  • Installation option (Docker install/Helm Chart): Helm
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): RKE2 3-node cluster v1.26.11+rke2r1
  • Proxy/Cert Details: Ingress with private CA

Information about the Cluster

  • Kubernetes version: RKE2 v1.26.11+rke2r1
  • Cluster Type (Local/Downstream): Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): Rancher provisioned cluster with vsphere provider (provisioned with Rancher v2.7)

User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom) Admin
    • If custom, define the set of permissions:

Describe the bug

After upgrading to v2.8.2, when I edit clusters that were provisioned by Rancher with the vSphere provider, I see the following error for every Machine Pool in the cluster.

pool1: The provided value for hostsystem was not found in the list of expected values. This can happen with clusters provisioned outside of Rancher or when options for the provider have changed. 

The host field is empty, which is valid according to the documentation: "Specific host to create VM on (leave blank for standalone ESXi or for cluster with DRS)". So I think this is a false error.
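The report hinges on a blank, optional field being checked against a list of expected values. A minimal sketch of that distinction follows. The function names and host names are hypothetical, and this is not the dashboard's actual validation code, only an illustration of why a strict membership check would flag an empty host field.

```python
from typing import Iterable, Optional


def strict_check(value: Optional[str], options: Iterable[str]) -> bool:
    """Accept a value only if it appears in the option list (a blank value fails)."""
    return value in set(options)


def lenient_check(value: Optional[str], options: Iterable[str]) -> bool:
    """Accept a blank value as valid, since the field is documented as optional."""
    if value is None or value == "":
        return True
    return value in set(options)


# Hypothetical host names, for illustration only.
hosts = ["esxi-01.example.com", "esxi-02.example.com"]

print(strict_check("", hosts))                      # False: blank host reported as "not found"
print(lenient_check("", hosts))                     # True: blank host accepted
print(lenient_check("esxi-03.example.com", hosts))  # False: unknown host still reported
```

Under this assumption, only the lenient variant matches the documented behaviour of leaving the host blank.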

To Reproduce
Edit a cluster that was previously provisioned with Rancher's vSphere provisioner and whose host field was left blank.

Result

Expected Result
No error

Screenshots
image
image

Additional context

@mueller-tobias

Same issue here. It occurred after the upgrade to Rancher 2.8.2.
The error also occurs when you create a new vSphere cluster via the UI under 2.8.2: after creation you get the same error.

@grnxi

grnxi commented Feb 13, 2024

Same issue here, after upgrading to 2.8.1.

@momesgin momesgin self-assigned this Feb 14, 2024
@vincebrannon

SURE-7622

@kwwii kwwii transferred this issue from rancher/rancher Feb 14, 2024
@kwwii
Contributor

kwwii commented Feb 14, 2024

Moved to the dashboard repo for us to look at

@egrosdou01

+1
Fresh new installation of Rancher 2.8.2 via a Helm chart.

@gaktive
Member

gaktive commented Feb 23, 2024

/backport v2.8.next1

@gaktive
Member

gaktive commented Feb 23, 2024

Adding QA/None since a backport to 2.8.x was filed and QA can test it once it lands there.

@gaktive gaktive added the JIRA label Feb 23, 2024
@gaktive
Member

gaktive commented Feb 23, 2024

Internal reference: SURE-7622

@adrianehlinger-it

adrianehlinger-it commented Mar 7, 2024

Almost the same issue after upgrading to 2.8.2:

The provided value for contentLibrary was not found in the list of expected values. This can happen with clusters provisioned outside of Rancher or when options for the provider have changed.

The only difference: it's contentLibrary and not hostsystem for our cluster. I also had to change the VMware credentials and replace the host with its IP address.

@momesgin
Member

momesgin commented Mar 7, 2024

@adrianehlinger-it Is your cluster configured outside of Rancher, for example with Terraform? Also, when you check the Edit Config page, is the Library Template field (next to the Content Library field) empty?

@adrianehlinger-it

adrianehlinger-it commented Mar 8, 2024

@momesgin

To give you a little more context: I actually ran into two errors.

The first one is: InvalidBodyContent: Post "https://<hostname>:443/sdk": dial tcp: lookup <hostname> on 10.43.0.10:53: read udp 10.42.2.184:58404->10.43.0.10:53: i/o timeout

I fixed this by replacing the hostname of our vSphere server with its actual IP address in the corresponding cloud credentials, since the problem seems to be DNS resolution.
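A quick way to confirm that a failure like this is a name-resolution problem is to try resolving the hostname directly. The sketch below uses Python's standard `socket.getaddrinfo`. The helper name `can_resolve` is hypothetical, and the check is only meaningful when run from the same network context as the failing component (here, presumably inside the Rancher pod, which uses the cluster DNS at 10.43.0.10).

```python
import socket


def can_resolve(hostname: str, port: int = 443) -> bool:
    """Return True if the local resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # Raised when the name cannot be resolved.
        return False


# An IP literal needs no DNS lookup, which is why the workaround sidesteps the lookup failure.
print(can_resolve("127.0.0.1"))
```

Passing the vSphere hostname instead of the IP literal would show whether the resolver in that environment can look it up.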

And the second one is the error message with the content library.

We do not configure the cluster outside of Rancher. Also, we do not have any library templates defined in vSphere, and that option is not selected in the cluster config, as you can see in the screenshot below. The creation method is: Deploying from Template (Data Center).

Screenshot 2024-03-08 at 12 50 40

However, if I edit the cluster as a YAML file, I can see an entry for every pool that says:
"contentLibrary":{"type":"string","default":{"stringValue":"","intValue":0,"boolValue":false,"stringSliceValue":null},"create":true,"update":true,"description":"If you choose to clone from a content library template specify the name of the library"}
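Parsing the quoted entry shows what it contains: a field type, a default and a description, with an empty string as the default. The sketch below only wraps the quoted text in braces so that it is valid JSON. Reading it as field metadata rather than as a configured value is an interpretation, not something the entry states itself.

```python
import json

# The entry quoted above, wrapped in braces so it parses as a JSON object.
raw = (
    '{"contentLibrary":{"type":"string","default":{"stringValue":"","intValue":0,'
    '"boolValue":false,"stringSliceValue":null},"create":true,"update":true,'
    '"description":"If you choose to clone from a content library template '
    'specify the name of the library"}}'
)

entry = json.loads(raw)["contentLibrary"]

print(entry["type"])                          # string
print(repr(entry["default"]["stringValue"]))  # ''
print(entry["description"])
```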

This error only occurred after upgrading to Rancher 2.8.2. I updated the exact same Kubernetes cluster on Rancher 2.7.4 without any problems (just a few hours before upgrading Rancher itself). There was also no error when updating the locally hosted Kubernetes cluster on which Rancher runs.

@momesgin
Member

momesgin commented Mar 8, 2024

@adrianehlinger-it re the DNS part, I'm not sure if I can help, but as far as I can tell it's unrelated to the contentLibrary error.

The part mentioned in the YAML file is simply schema information. You should see a reference to your vSphere configuration a few lines down under machineConfigRef:

machineConfigRef:
        kind: VmwarevsphereConfig
        name: nc-<your-cluster-name>-<pool-name>-<id>

You should be able to find the values of your configuration in YAML format from this page:
https://<your-host>/dashboard/c/local/manager/rke-machine-config.cattle.io.vmwarevsphereconfig

My guess would be that contentLibrary is set to null or "" there, which causes the error to be shown.
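Once the machine config has been exported from that page, a check along these lines could confirm the guess. It is a sketch under assumptions: the helper `blank_fields` and the sample values are hypothetical, and it assumes that `hostsystem` and `contentLibrary` sit at the top level of the VmwarevsphereConfig object.

```python
from typing import Any, Dict, List, Sequence

# Field names taken from the two error messages reported in this thread.
FIELDS_TO_CHECK = ("hostsystem", "contentLibrary")


def blank_fields(config: Dict[str, Any], fields: Sequence[str] = FIELDS_TO_CHECK) -> List[str]:
    """Return the checked fields that are missing, null, or an empty string."""
    return [name for name in fields if config.get(name) in (None, "")]


# Hypothetical sample config; real values come from your own cluster.
sample_config = {
    "kind": "VmwarevsphereConfig",
    "hostsystem": "",
    "contentLibrary": None,
}

print(blank_fields(sample_config))  # ['hostsystem', 'contentLibrary']
```

A non-empty result would be consistent with the error being triggered by a blank value rather than by a value that was genuinely removed from vSphere.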

I'm curious to know what options you see in the Content Library dropdown after selecting 'Deploy from template: Content Library' in the Edit Config page.

@adrianehlinger-it

adrianehlinger-it commented Mar 11, 2024

@momesgin

Yes, I agree with you. The DNS error seems to be unrelated.

For the options I can see, see the screenshot below. The "Library Template" field never stops loading: all I can see is the loading symbol, and no actual value is loaded. That makes sense, since we do not have any content libraries in our vSphere.

Screenshot 2024-03-11 at 10 38 13

Do you have any suggestions for how to fix this error? Or should I simply wait for the bug fix?

@momesgin
Member

@adrianehlinger-it Thank you for sharing the information. Unfortunately, you'll need to wait for the fix to be implemented.

@aalves08
Member

@gaktive @nwmac this issue is tied to SURE-7622... Could that be closed as well?
