VMware VM Fails to load with Terraform #272

Closed
gyalowitzdtlr opened this issue Nov 17, 2020 · 10 comments
Comments

@gyalowitzdtlr

Since the update, our Terraform Kubernetes spin-ups have all been failing to boot. They get stuck on the "start job /dev/dis" error. What is strange is that this issue only cropped up recently; they were working without issue before. Could this be caused by the guest OS issues that have cropped up?

@pothos
Member

pothos commented Nov 18, 2020

I can't quite follow what your setup looks like, but are you deploying new instances that get stuck on first boot? If it happens during the initramfs, this can be an Ignition config issue. Do you rely on fetching remote files in Ignition? That only works with DHCP, not with static IP addresses.
Are you using the latest Stable images?

By the way, the VMware tools on the OEM partition are currently not updated; if you rely on them, you have to copy them over. There is another issue about that where a workaround was posted: #21 (comment)

@gyalowitzdtlr
Author

gyalowitzdtlr commented Nov 21, 2020

Thank you for the response. It is actually starting to look like it could be a few things. I moved the Flatcar image we are using to Alpha, and now it seems like the Ignition userdata is not being used at all. The Ignition file is base64-encoded and supplied through the Terraform Ignition config. So I am basically trying to figure out whether it's:

  1. VMware Tools
  2. The L1TF and MDS mitigations we have in place in vSphere
  3. The Terraform provider now being broken

Reading through these issues, though, it does seem like the L1TF and MDS kernel lockdowns more or less break guestinfo. The Terraform modules that spun up this piece of infrastructure have not been touched at all and worked without issue. The only major change was upgrading VMware to 7.0.1 and implementing SCAv2.

@gyalowitzdtlr
Author

gyalowitzdtlr commented Nov 21, 2020

(Several screenshots were attached to this comment; they are not preserved.)

@pothos
Member

pothos commented Nov 23, 2020

Thanks for the info. Do you also have ignition.config.data.encoding set to base64? Do you need to pass the guestinfo. prefix in Terraform or not? Can you please try whether /usr/share/oem/bin/vmware-rpctool 'info-get guestinfo.ignition.config.data' works?

@gyalowitzdtlr
Author

The info-get result, the encoding being passed in, and how Ignition is being passed in were shown in screenshots (not preserved).

Basically, the Ignition provider renders the Ignition config as JSON. We then base64-encode it and pass it to the VMware VM we are spinning up as the vApp property shown in the last screenshot. Again, what is really confusing is that this was working until recently. Is it possible that the Intel vulnerability mitigations are at the root of this?
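The encoding step described above can be sketched in Python roughly as follows. This is a minimal illustration, not the reporter's actual Terraform code: it assumes the VM reads the two guestinfo keys pothos mentions (guestinfo.ignition.config.data and guestinfo.ignition.config.data.encoding), and the helper name guestinfo_properties and the sample config are hypothetical.

```python
import base64
import json


def guestinfo_properties(ignition_config: dict) -> dict:
    """Build guestinfo key/value pairs for an Ignition config (sketch).

    Assumes the guest reads guestinfo.ignition.config.data plus
    guestinfo.ignition.config.data.encoding, as discussed in this thread.
    """
    # Render the config as compact JSON, then base64-encode the UTF-8 bytes.
    rendered = json.dumps(ignition_config, separators=(",", ":"))
    encoded = base64.b64encode(rendered.encode("utf-8")).decode("ascii")
    return {
        "guestinfo.ignition.config.data": encoded,
        # Tells Ignition the data is base64 rather than plain JSON.
        "guestinfo.ignition.config.data.encoding": "base64",
    }


if __name__ == "__main__":
    # Hypothetical minimal spec-2.x config, for illustration only.
    config = {"ignition": {"version": "2.1.0"}}
    for key, value in guestinfo_properties(config).items():
        print(f"{key} = {value}")
```

If the encoding key is missing or set to something else, Ignition would try to parse the base64 text itself as JSON, which is one reason pothos asks whether it is set.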

@gyalowitzdtlr
Author

Here is a link to the KB article that I followed.
https://kb.vmware.com/s/article/55806

@gyalowitzdtlr
Author

Also, looking at my Ignition config, it appears to be generated with version 2.1.0, and I wonder whether the new Flatcar images or the VMware upgrade are no longer compatible with that version. Would you know?

@gyalowitzdtlr
Author

Also, running a command (shown in a screenshot, not preserved) shows the base64-encoded guestinfo variable being read.
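To confirm that the value read back is intact, one could decode it and check that it parses as an Ignition config. The Python sketch below assumes plain base64 (not gzip+base64); decode_guestinfo_ignition is a hypothetical helper, and the sample in the __main__ block merely stands in for what vmware-rpctool 'info-get guestinfo.ignition.config.data' would print.

```python
import base64
import binascii
import json


def decode_guestinfo_ignition(raw: str) -> dict:
    """Decode a base64 guestinfo value and parse it as an Ignition config (sketch)."""
    try:
        # strip() drops the trailing newline a CLI tool typically prints;
        # validate=True rejects characters outside the base64 alphabet.
        payload = base64.b64decode(raw.strip(), validate=True)
    except binascii.Error as exc:
        raise ValueError(f"value is not valid base64: {exc}") from exc
    try:
        config = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"decoded value is not JSON: {exc}") from exc
    # An Ignition config declares its spec version under ignition.version.
    ignition = config.get("ignition") if isinstance(config, dict) else None
    if not isinstance(ignition, dict) or "version" not in ignition:
        raise ValueError("decoded JSON has no ignition.version field")
    return config


if __name__ == "__main__":
    # Simulated tool output (note the trailing newline), not real data.
    sample = base64.b64encode(b'{"ignition":{"version":"2.1.0"}}').decode("ascii") + "\n"
    cfg = decode_guestinfo_ignition(sample)
    print("ignition spec version:", cfg["ignition"]["version"])
```

A value that decodes cleanly here but is still ignored at boot points away from guestinfo transport and toward the config contents, such as the spec version.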

@gyalowitzdtlr
Author

The good news is that I managed to get Ignition to load again. The Terraform provider for Ignition was pulling from the fork that only supports the 3.x spec, which is a problem for Flatcar, at least from what I read. I also had to seriously tweak the provider versions Terraform pulls, to make sure they are correct and uniform across all modules. Now the only remaining issue is the vmwaretools service failing.
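A quick guard against the spec mismatch described above could read ignition.version from the rendered config before deploying. This Python sketch assumes, as the thread suggests for the Flatcar images in use at the time, that only 2.x configs are accepted; SUPPORTED_MAJOR_VERSIONS, spec_version, and is_supported are hypothetical names, and the supported set should be adjusted to whatever the target Flatcar release actually accepts.

```python
import json

# Assumption taken from this thread (late 2020): the Flatcar image in use
# only accepts Ignition spec 2.x, so 3.x configs from the forked provider fail.
SUPPORTED_MAJOR_VERSIONS = {2}


def spec_version(rendered_json: str) -> tuple:
    """Return the Ignition spec version of a rendered config as a tuple of ints."""
    config = json.loads(rendered_json)
    version = config["ignition"]["version"]
    # Drop any pre-release suffix such as "-experimental" before parsing.
    release = version.split("-", 1)[0]
    return tuple(int(part) for part in release.split("."))


def is_supported(rendered_json: str) -> bool:
    """True if the config's spec major version is in SUPPORTED_MAJOR_VERSIONS."""
    return spec_version(rendered_json)[0] in SUPPORTED_MAJOR_VERSIONS


if __name__ == "__main__":
    for doc in ('{"ignition":{"version":"2.1.0"}}', '{"ignition":{"version":"3.0.0"}}'):
        print(spec_version(doc), "supported" if is_supported(doc) else "NOT supported")
```

Pinning the provider version, as done above, addresses the cause; a check like this only makes the mismatch visible earlier.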

@pothos
Member

pothos commented Nov 25, 2020

For vmtoolsd, make sure it is updated: #21 (comment)

@pothos pothos closed this as completed Jul 2, 2021