Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make 80-ec2.network match primary eni only #1738

Merged
merged 1 commit into from
Mar 28, 2024

Conversation

M00nF1sh
Copy link
Member

@M00nF1sh M00nF1sh commented Mar 28, 2024

Issue #, if available:
N/A

Description of changes:
Override the default 80-ec2.network(generated by amazon-ec2-net-utils) with drop-in config to make systemd.network manage primary ENI only.

After this change, the sequence is as follows:

  1. during initial boot, the 80-ec2.network by amazon-ec2-net-utils will match all ENIs(only primary ENI at this moment), thus systemd.network will configure them properly with DHCP.

  2. the nodeadm-run unit will execute, which will generate a 10-eks_primary_eni_only.conf in administrative drop-in folder /etc/systemd/network/80-ec2.network.d that matches the MAC of primary ENI(obtained from ec2-metadata) only and reload the config.

  3. after kubelet init and vpc-cni runs, vpc-cni will launch new secondary ENIs as pod scheduled, and those secondary ENIs will be setup by CNI and systemd.network treats them as unmanaged.

This is a temporary fix for EKS AL2023 AMI, where the 80-ec2.network generated by amazon-ec2-net-utils will cause systemd.network managing all ENIs on host, and caused multiple issues including:

  1. systemd.network races against vpc-cni to configure secondary enis and might cause routing rules/routes from vpc-cni been flushed.
  2. routes for those secondary enis obtained from dhcp will appear in main route table, which is a drift from our AL2 behavior.

To address this issue temporarily, we override the 80-ec2.network after boot to make it match against primary ENI only.

  • the top-level mac from ec2 metadata is guaranteed to primary eni's mac and guaranteed to be available at boot according to this code comment and internal code links.

TODOs after this PR: there are limitations on current solutions as well, and we should figure long term solution for this:

  1. the altNames for ENIs(a new feature in AL2023) were setup by amazon-ec2-net-utils via udev rules, but it's disabled by eks.

Alternative solutions consider:

  1. we considered to use a static file override to make 80-ec2.network match IFINDEX: 2. However, after communications with ec2 networking, the primary ENI may not always be IFINDEX: 2.
  2. completely remove amazon-ec2-net-utils and simply have the default 80-ec2.network into /usr/lib/systemd/network/80-ec2.network for initial setup. This should work as we have almost disabled all functionality of amazon-ec2-net-utils except this file. However, to keep things simple, i kept it as it is for now, and we can work on a long term solution later. (e.g. make changes to amazon-ec2-net-utils to let us optionally disable secondary eni management).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Testing Done
Build an AMI in us-west-2: ami-0caada624a580a108 and launched a node group with it.

After boot:

  1. the primary ENI are configured correctly.
sudo networkctl list
IDX LINK  TYPE     OPERATIONAL SETUP
  1 lo    loopback carrier     unmanaged
  2 ens32 ether    routable    configured
  1. the /etc/systemd/network/80-ec2.network contains correct content:
[Match]
Driver=ena ixgbevf vif
PermanentMACAddress=0e:3f:17:b2:ab:85
....
  1. logs from journalctl -u nodeadm-run
Mar 27 23:59:33 ip-192-168-80-181.us-west-2.compute.internal nodeadm[14386]: {"level":"info","ts":1711583973.9371097,"caller":"init/init.go:115","msg":"Setting up system aspect..","name":"networking"}
Mar 27 23:59:33 ip-192-168-80-181.us-west-2.compute.internal nodeadm[14386]: {"level":"info","ts":1711583973.9371312,"caller":"system/networking.go:77","msg":"writing eks network configuration"}
Mar 27 23:59:33 ip-192-168-80-181.us-west-2.compute.internal nodeadm[14386]: {"level":"info","ts":1711583973.9455516,"caller":"init/init.go:119","msg":"Set up system aspect","name":"networking"}
  1. logs from journalctl -u systemd-networkd
Mar 27 23:59:29 localhost systemd[1]: Starting systemd-networkd.service - Network Configuration...
Mar 27 23:59:29 localhost systemd-networkd[13987]: lo: Link UP
Mar 27 23:59:29 localhost systemd-networkd[13987]: lo: Gained carrier
Mar 27 23:59:29 localhost systemd-networkd[13987]: Enumeration completed
Mar 27 23:59:29 localhost systemd[1]: Started systemd-networkd.service - Network Configuration.
Mar 27 23:59:29 localhost systemd-networkd[13987]: ens32: Configuring with /usr/lib/systemd/network/80-ec2.network.
Mar 27 23:59:29 localhost systemd-networkd[13987]: ens32: Link UP
Mar 27 23:59:29 localhost systemd-networkd[13987]: ens32: Gained carrier
Mar 27 23:59:29 localhost systemd-networkd[13987]: ens32: DHCPv4 address 192.168.80.181/19, gateway 192.168.64.1 acquired from 192.168.64.1
Mar 27 23:59:30 localhost systemd-networkd[13987]: ens32: Gained IPv6LL
Mar 27 23:59:33 ip-192-168-80-181.us-west-2.compute.internal systemd-networkd[13987]: ens32: Reconfiguring with /etc/systemd/network/80-ec2.network.
Mar 27 23:59:33 ip-192-168-80-181.us-west-2.compute.internal systemd-networkd[13987]: ens32: DHCP lease lost
Mar 27 23:59:34 ip-192-168-80-181.us-west-2.compute.internal systemd-networkd[13987]: ens32: DHCPv6 lease lost
Mar 27 23:59:34 ip-192-168-80-181.us-west-2.compute.internal systemd-networkd[13987]: ens32: DHCPv4 address 192.168.80.181/19, gateway 192.168.64.1 acquired from 192.168.64.1

After launching enough pods to node:

  1. output from sudo networkctl list
IDX LINK           TYPE     OPERATIONAL SETUP
  1 lo             loopback carrier     unmanaged
  2 ens32          ether    routable    configured
  3 eniad135498da8 ether    degraded    unmanaged
  .........other veth pairs ...........
 27 eni753930387e9 ether    degraded    unmanaged
 28 ens33          ether    routable    unmanaged
  1. output from ip route show table main
default via 192.168.64.1 dev ens32 proto dhcp src 192.168.80.181 metric 1024
192.168.0.2 via 192.168.64.1 dev ens32 proto dhcp src 192.168.80.181 metric 1024
192.168.64.0/19 dev ens32 proto kernel scope link src 192.168.80.181 metric 1024
192.168.64.1 dev ens32 proto dhcp scope link src 192.168.80.181 metric 1024
192.168.64.83 dev eniad11af0c6ea scope link
192.168.64.119 dev eni0240836f91e scope link
192.168.64.125 dev eni7058b07fc9b scope link
................
  1. output from ip route show table 2
ip route show table 2
default via 192.168.64.1 dev ens33
192.168.64.1 dev ens33 scope link

After reboot

  1. output from journalctl -u systemd-networkd
Mar 28 00:54:39 ip-192-168-80-181.us-west-2.compute.internal nodeadm[14691]: {"level":"info","ts":1711587279.0701544,"caller":"init/init.go:115","msg":"Setting up system aspect..","name":"networking"}
Mar 28 00:54:39 ip-192-168-80-181.us-west-2.compute.internal nodeadm[14691]: {"level":"info","ts":1711587279.0701854,"caller":"system/networking.go:69","msg":"eks network configuration already exists, skipping configuration"}
Mar 28 00:54:39 ip-192-168-80-181.us-west-2.compute.internal nodeadm[14691]: {"level":"info","ts":1711587279.0701997,"caller":"init/init.go:119","msg":"Set up system aspect","name":"networking"}
  1. output from sudo networkctl list
[ec2-user@ip-192-168-80-181 ~]$ sudo networkctl list
IDX LINK           TYPE     OPERATIONAL SETUP
  1 lo             loopback carrier     unmanaged
  2 ens32          ether    routable    configured
  3 ens33          ether    routable    unmanaged
  4 eni22521b8ebbc ether    degraded    unmanaged
  5 enic0d6e878b13 ether    degraded    unmanaged
  6 enif535699d665 ether    degraded    unmanaged
....

See this guide for recommended testing for PRs. Some tests may not apply. Completing tests and providing additional validation steps are not required, but it is recommended and may reduce review time and time to merge.

@M00nF1sh M00nF1sh changed the title make 80-ec2.network match primary eni only [WIP] make 80-ec2.network match primary eni only Mar 28, 2024
@M00nF1sh
Copy link
Member Author

/ci
+workflow:os_distros al2023

@M00nF1sh M00nF1sh changed the title [WIP] make 80-ec2.network match primary eni only make 80-ec2.network match primary eni only Mar 28, 2024
@Issacwww
Copy link
Member

/ci
+workflow:os_distros al2023

Copy link
Contributor

@Issacwww roger that! I've dispatched a workflow. 👍

Copy link
Contributor

@Issacwww the workflow that you requested has completed. 🎉

AMI variantBuildTest
1.23 / al2023success ✅success ✅
1.24 / al2023success ✅success ✅
1.25 / al2023success ✅success ✅
1.26 / al2023success ✅success ✅
1.27 / al2023success ✅success ✅
1.28 / al2023success ✅success ✅
1.29 / al2023success ✅success ✅

@@ -0,0 +1,24 @@
[Match]
Driver=ena ixgbevf vif
PermanentMACAddress={{.PermanentMACAddress}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want the snapshot of the entire config to add PermanentMACAddress, or does it makes more sense to use a 80-ec2.network.d/*.conf drop-in?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, ideally we just patch in the MAC instead of maintaining a copy of the full network unit here

Copy link
Member Author

@M00nF1sh M00nF1sh Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ndbaker1 @cartermckinnon
the drop-in might be better as it will keep the original ec2's settings(even if they update this file in the future)
my original thought were to use the existence check on the network file to have the ability to override this behavior with a custom file. But cx can also override this anyway with a higher priority file names as well.
let me do the change to do the drop-in instead

@M00nF1sh
Copy link
Member Author

/ci
+workflow:os_distros al2023

Copy link
Contributor

@M00nF1sh roger that! I've dispatched a workflow. 👍

@M00nF1sh M00nF1sh changed the title make 80-ec2.network match primary eni only [DONT MERGE YET] make 80-ec2.network match primary eni only Mar 28, 2024
@M00nF1sh
Copy link
Member Author

don't merge this yet,having some issues with drop-in, debugging

@M00nF1sh M00nF1sh marked this pull request as draft March 28, 2024 21:55
@M00nF1sh
Copy link
Member Author

fixed :D it's due to the util.WriteFileWithDir simply create directory with same perm of file, and drop-in directories requires 'x' in order to list files.
retesting

@M00nF1sh
Copy link
Member Author

/ci
+workflow:os_distros al2023

Copy link
Contributor

@M00nF1sh roger that! I've dispatched a workflow. 👍

Copy link
Contributor

@M00nF1sh the workflow that you requested has completed. 🎉

AMI variantBuildTest
1.23 / al2023success ✅success ✅
1.24 / al2023success ✅success ✅
1.25 / al2023success ✅success ✅
1.26 / al2023success ✅success ✅
1.27 / al2023success ✅success ✅
1.28 / al2023success ✅success ✅
1.29 / al2023success ✅success ✅

@M00nF1sh M00nF1sh changed the title [DONT MERGE YET] make 80-ec2.network match primary eni only make 80-ec2.network match primary eni only Mar 28, 2024
@M00nF1sh M00nF1sh marked this pull request as ready for review March 28, 2024 22:50
@M00nF1sh
Copy link
Member Author

/ci
+workflow:os_distros al2023

Copy link
Contributor

@M00nF1sh roger that! I've dispatched a workflow. 👍

Copy link
Contributor

@M00nF1sh the workflow that you requested has completed. 🎉

AMI variantBuildTest
1.23 / al2023success ✅success ✅
1.24 / al2023success ✅success ✅
1.25 / al2023success ✅success ✅
1.26 / al2023success ✅success ✅
1.27 / al2023success ✅success ✅
1.28 / al2023success ✅success ✅
1.29 / al2023success ✅success ✅

Copy link
Member

@cartermckinnon cartermckinnon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@cartermckinnon cartermckinnon merged commit d0acbc4 into awslabs:main Mar 28, 2024
10 checks passed
Copy link
Contributor

@M00nF1sh the workflow that you requested has completed. 🎉

AMI variantBuildTest
1.23 / al2023success ✅success ✅
1.24 / al2023success ✅success ✅
1.25 / al2023success ✅success ✅
1.26 / al2023success ✅success ✅
1.27 / al2023success ✅success ✅
1.28 / al2023success ✅success ✅
1.29 / al2023success ✅success ✅

@rajeevpnair
Copy link

I am facing the same issue in 1.30, is this fix added to 1.30 ?

@cartermckinnon
Copy link
Member

@rajeevpnair yep, if you’re still seeing this on AL2023, it’s a different issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants