This repository has been archived by the owner on May 30, 2023. It is now read-only.

Disable Predictable Network Interface Names under VMware #436

Open
wants to merge 1 commit into
base: flatcar-master

Conversation


@tgelter tgelter commented Jun 22, 2020

Disable Predictable Network Interface Names under VMware

Predictable Network Interface Names are enabled for VMware instances of Flatcar CL, whereas they are disabled for (at least) AWS/Azure instances. This leads to inconsistency of interface naming between instances across clouds (eth0 for AWS/Azure, ens192 for VMware). This commit brings VMware instances in line with public cloud instances and adds some consistency to comments about this configuration in the EC2 & Rackspace grub config files.

How to use

Build the Flatcar VMware image, boot it on VMware ESXi, and note the name of the primary (and probably only) network interface.

Testing done

I am not sure how to build the image at this time, so no testing has been done. If testing is required before a review of this PR, please let me know and I'll look into building the VMware image locally.

@pothos
Contributor

pothos commented Jun 23, 2020

I'm not sure this is a good idea. If it's about the main network interface, then it's better to control the naming directly via https://www.freedesktop.org/software/systemd/man/systemd.link.html instead of impacting other interfaces as well (containers etc).

Also, this is a breaking change. At least our testing uses ens192 hardcoded which was the idea of predictable names ;)

@tgelter
Author

tgelter commented Jun 23, 2020

Thank you for taking the time to look at this PR.

> I'm not sure this is a good idea. If it's about the main network interface, then it's better to control the naming directly via https://www.freedesktop.org/software/systemd/man/systemd.link.html instead of impacting other interfaces as well (containers etc).

I don't believe there's a problem here. Below is a before/after comparison of the interfaces on Flatcar VMs joined to Kubernetes and running containers.
VM without the change proposed in this PR:

$ cat /proc/cmdline 
rootflags=rw mount.usrflags=ro BOOT_IMAGE=/flatcar/vmlinuz-a mount.usr=/dev/mapper/usr verity.usr=PARTUUID=7130c94a-213a-4e5a-8e26-6cce9662f132 rootflags=rw mount.usrflags=ro consoleblank=0 root=LABEL=ROOT console=ttyS0,115200n8 console=tty0 flatcar.first_boot=detected flatcar.randomize_disk_guid=00000000-0000-0000-0000-000000000001 flatcar.oem.id=vmware flatcar.autologin verity.usrhash=775c6092bc6a0c6504a6530347ee748314ba8445a1c1c5be737e07a8c89867b1

$ ls /sys/class/net/
cilium_host  cilium_net  cilium_vxlan  dnsmasq  dummy0  ens192  lo  lxc1355871b822e  lxc527cedb2c2e1  lxc574887c69166  lxc_health  lxca670e91d8c3f  lxcc2679f436924

VM manually booted with the change proposed in this PR:

$ cat /proc/cmdline 
rootflags=rw mount.usrflags=ro BOOT_IMAGE=/flatcar/vmlinuz-a mount.usr=/dev/mapper/usr verity.usr=PARTUUID=7130c94a-213a-4e5a-8e26-6cce9662f132 rootflags=rw mount.usrflags=ro consoleblank=0 root=LABEL=ROOT console=ttyS0,115200n8 console=tty0 flatcar.first_boot=detected flatcar.oem.id=vmware flatcar.autologin net.ifnames=0 verity.usrhash=775c6092bc6a0c6504a6530347ee748314ba8445a1c1c5be737e07a8c89867b1

$ ls /sys/class/net/
cilium_host  cilium_net  cilium_vxlan  dnsmasq  dummy0  eth0  lo  lxc37395750eb50  lxc5b43f0fa672d  lxc6fa65de7c4e0  lxc_health  lxcbf368e168df6  lxcfd31de25437f

I can check to see if the change can be made via Ignition early enough in the boot process, but I believe this is a useful change generally to bring VMware instance naming in line with cloud provider instance naming.

> Also, this is a breaking change. At least our testing uses ens192 hardcoded which was the idea of predictable names ;)

The problem is that when we launch Flatcar instances in multiple clouds, we end up with inconsistency in the primary interface naming:

  • AWS: eth0
  • Azure: eth0
  • VMware: ens192

Also, predictable device naming was designed to solve a problem with physical hardware changing interface naming as hardware is being added/removed from servers. It doesn't buy us much w/ virtual machines, which is why it is not being used for AWS/Azure instances.

@pothos
Contributor

pothos commented Jun 23, 2020

Besides breaking existing configurations, I think it also affects the ability to rename interfaces with networkd, but I would need to check. It's good that this doesn't affect K8s container interfaces in your case, but there are other cases to consider.
You are right that many cloud providers offer only a single virtual NIC, but that is not really the case for VMware, is it? I haven't tried it myself, but NIC passthrough is possible, for example.

In the meantime, I suggest adding a networkd config file to your Ignition config that renames the ens192 interface to eth0.

@tgelter
Author

tgelter commented Jun 23, 2020

@pothos, thanks for the insight you've shared. I'd like to keep this issue open for consideration since I still see value in unifying the interface naming between cloud providers.
That said, I was able to work around this issue in our own provisioning code base with the following Ignition configuration, in case it's useful to anyone else who stumbles upon this PR:

networkd:
  units:
    # Disable Predictable Network Interface Names for VMXNET3 network adapters
    # $ sudo ls -la /sys/bus/pci/devices/*$(sudo lspci | awk '/VMXNET3/ {print $1}')
    # lrwxrwxrwx. 1 root root 0 Jun 23 21:04 /sys/bus/pci/devices/0000:0b:00.0 -> ../../../devices/pci0000:00/0000:00:16.0/0000:0b:00.0
    - name: 00-eth0.link
      contents: |
        [Match]
        Path=pci-0000:0b:00.0
        [Link]
        Name=eth0

Note that this would break if we changed to a different type of network adapter.
Thanks again!

@pothos
Contributor

pothos commented Jun 24, 2020

Yes, let's keep this open, even if the eventual solution is to enable predictable names on the other platforms while still naming all virtual interfaces eth0 or similar.
Thanks for pasting this here.

@tgelter
Author

tgelter commented Jun 25, 2020

@pothos, is there any other information I can provide to help the conversation along?

@pothos
Contributor

pothos commented Jun 26, 2020

Currently, GRUB is used to apply this setting on AWS and similar platforms, which is not ideal because the GRUB configuration is not part of the A/B partition update mechanism.
I would like to replace these GRUB parameters with networkd settings like the one in https://github.com/flatcar-linux/init/blob/flatcar-master/systemd/network/98-virtio.link (or a similar rule). The yy-azure-sriov.network file shows how to match on an OEM ID.
This would let us change the setting easily without worrying that older systems are stuck on an old value.
The decision on renaming the default VMware interface is still outstanding, and I haven't investigated it further yet.

That said, I think it's not a good idea to rely on the presence of an eth0 interface. On Packet, for example, there are bonded interfaces, and others may want to run WireGuard, etc. I recommend having a reliable way of finding the correct interface based on the route. We also have tooling to get the cloud provider's metadata through the coreos-metadata service, which uses afterburn.
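
For anyone looking for a starting point, one way to find the interface based on the route could look like the sketch below. It assumes iproute2's `ip` is available on the host and that the interface carrying the default route is the one you want. `default_iface` is a hypothetical helper name, not existing Flatcar tooling.

```shell
#!/bin/sh
# default_iface: read `ip route show default`-style lines on stdin and print
# the interface that follows "dev" on the first default route found.
# Hypothetical helper for illustration only.
default_iface() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++) {
            if ($i == "dev") { print $(i + 1); exit }
        }
    }'
}

# On a live system (assumes iproute2 is installed):
#   ip -4 route show default | default_iface
```

Because the helper only parses text, it does not care whether the interface ends up being called eth0, ens192, or a bond. Hosts with several default routes or policy routing would need a more careful selection than "first match".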

@pothos
Contributor

pothos commented Mar 9, 2022

Seeing this PR again, and following up on my last comment: the linked 98-virtio.link unit is a good starting point for setting the VMware and AWS interfaces to follow NamePolicy=kernel database onboard while using AlternativeNamesPolicy=database onboard slot path for the predictable name. This way it's not a breaking change. (We added AlternativeNamesPolicy in many places some time ago after a renaming happened in GCE, and it worked well, even when a hardcoded AlternativeName=... was specified in addition.)
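
A rough, untested sketch of what such a unit might look like for VMware is shown below. The file name, the `Driver=vmxnet3` match, and the `KernelCommandLine=` match on the OEM ID are assumptions for illustration; only the two policy lines come from the comment above.

```ini
# 98-vmware.link (hypothetical file name)
[Match]
# Assumption: the VMXNET3 adapter is bound to the vmxnet3 kernel driver.
Driver=vmxnet3
# Assumption: limit the rule to VMware images via the kernel command line,
# which contains flatcar.oem.id=vmware in the output shown earlier.
KernelCommandLine=flatcar.oem.id=vmware

[Link]
# Policies as proposed above. slot and path are left out here, so the
# slot-based name (ens192) is not applied as the primary name.
NamePolicy=kernel database onboard
# The predictable name stays available as an alternative name.
AlternativeNamesPolicy=database onboard slot path
```

Since systemd applies only the first matching .link file in lexical order, a unit like this would need to sort before 99-default.link to take effect.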
