[EKS] [request]: Simplify CNI custom networking #867
Comments
Is there any news on the above point? In a real-world cluster with both public and private subnets, an ENIConfig per AZ isn't enough; one is needed per subnet. To do this today, you need to apply a dynamic label to nodes in order to use multi-AZ ASGs. |
Want to get some feedback on what we are thinking here:

- For subnets where you want pods to run, tag the subnet with a designated tag key.
- The VPC CNI plugin will periodically make a DescribeSubnets API call, filtering by the VPC ID of the cluster as well as by subnets having that tag key.
- When a new node is launched, and if no ENIConfig is found, the CNI plugin will query the map from the previous step and look up the subnets based on the availability zone of the worker node.
- The subnet field in ENIConfig will be made optional.

If you are OK with having security groups copied from the primary ENI to secondary ENIs, then ENIConfig is no longer required at all in this proposal. But if you do care about different security groups as well, you can still specify them in the ENIConfig and use a node label or annotation to point to that ENIConfig like today. The upside is that there is no AZ dependency, and a single ENIConfig could potentially be used for all nodes if only security groups need to be specified. Further, security groups for pods also works with custom networking, so you can leverage that feature to specify even more fine-grained security groups if needed.

Open Questions:
Please let us know any feedback on this idea, or feel free to suggest any other ideas you feel would help simplify your workflow using custom networking today. |
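For readers skimming the proposal above, here is a minimal sketch of what the security-groups-only case could look like. This is not a finalized design; the ENIConfig name, security group ID, and node label key below are placeholders/assumptions (the label key shown is today's default, not part of the proposal).

```sh
# Hypothetical sketch of the proposal: one ENIConfig that only pins security
# groups, with the subnet omitted so the plugin could discover a tagged subnet
# in the worker node's availability zone.
cat <<'EOF' | kubectl apply -f -
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: pod-security-groups          # placeholder name
spec:
  securityGroups:
    - sg-0123456789abcdef0           # placeholder security group ID
  # subnet intentionally omitted -- it would become optional under the proposal
EOF

# Nodes could then opt in by pointing at that single ENIConfig, e.g. via the
# existing label mechanism:
kubectl label node my-node k8s.amazonaws.com/eniConfig=pod-security-groups
```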
@mikestef9 here are a couple of feedback points after implementing this with the current options:

- I'm interested in why the current docs and future plans are AZ-based instead of subnet-based, which would match the reference architecture. Our requirements involve linking a separate secondary subnet to each of our public and private subnets. Currently we need to dynamically label our nodes (with the node's primary subnet) to achieve this, but it would work better if this could be achieved via subnet tags, without us having to add any node-specific logic.
- An extension of the above pattern to use both the …
- I'm also interested in whether it would be possible to have custom networking enabled only for nodes with the label set, or to not lose the primary ENI if the custom networking refers back to the node's primary subnet.
- Finally, it would be good if the max pods value could be set dynamically, as the required inputs for the calculation are present here. |
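For context, the dynamic-labeling workaround described in the comment above typically looks something like the sketch below in self-managed node user data. The cluster name and the ENIConfig naming convention are assumptions for illustration, not an official recipe.

```sh
# Illustrative sketch: each node discovers its own primary subnet at boot and
# labels itself, so the CNI plugin can resolve a per-subnet ENIConfig even when
# nodes come from a multi-AZ ASG.
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
imds() { curl -s -H "X-aws-ec2-metadata-token: $TOKEN" "http://169.254.169.254/latest/meta-data/$1"; }

MAC=$(imds mac)
SUBNET_ID=$(imds "network/interfaces/macs/${MAC}/subnet-id")

# Assumed convention: one ENIConfig per node subnet, named after the node's
# primary subnet ID but pointing at the matching secondary (pod) subnet.
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args "--node-labels=k8s.amazonaws.com/eniConfig=${SUBNET_ID}"
```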
I like the option:
This seems to be the most flexible. This option could be integrated with either the |
Are you also considering adding a feature flag for toggling CNI custom networking on/off? This is the trickiest part of the project I'm working on, as the Terraform EKS module has no way of exposing such functionality (because there is no exposure to the configuration of the |
This is an absolute must. Configuring VPC CNI is challenging and introduces a lot of overhead for operators. |
A lot of this complexity is encapsulated in https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/examples/vpc-cni-custom-networking/main.tf

Summary of the "extra complexity" AWS end users must manage when applying custom networking to worker nodes:
This "extra complexity" negatively impacts the reliability of systems utilizing custom networking. For example, the continued functionality of pods and scaling behavior depends on the user-managed ENIConfig resources to be available and correctly configured. If we play the "chaos engineering" role for a minute and take away 1 ENIConfig, it will totally break the network functionality of new nodes spun up. Additionally, custom networking is not supported on Windows worker nodes. However, it requires "cluster level" configuration changes to VPC CNI (setting env variables in the aws-node daemonset). This seemingly precludes "safe" custom networking use for mixed-OS clusters workload use cases. As custom networking provides advantages with regard to:
The aws-ia examples and the documentation available at https://aws.github.io/aws-eks-best-practices/networking/index/ have greatly helped with this process; however, there is still a lot of complexity to manage in order to use custom networking. To make this process easier for future devs, I would love to see the following:
|
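For readers following along, the user-managed pieces described in the comment above roughly correspond to steps like these. The ENIConfig name, subnet ID, and security group ID are placeholders.

```sh
# Rough sketch of the configuration an operator currently manages by hand.
# 1. Enable custom networking and tell the plugin how to map nodes to ENIConfigs.
kubectl set env daemonset aws-node -n kube-system \
  AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true \
  ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone

# 2. Create one ENIConfig per availability zone, named after the zone so the
#    label above resolves to it.
cat <<'EOF' | kubectl apply -f -
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-east-1a
spec:
  subnet: subnet-0123456789abcdef0
  securityGroups:
    - sg-0123456789abcdef0
EOF

# 3. Recompute max pods for the instance type and recycle existing nodes so
#    new ENIs are attached from the ENIConfig subnets.
```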
@cazlo not that this answers your main concerns, but it might help you out. You shouldn't need the …

I think IP prefix mode should be the default behaviour for the VPC CNI, which would solve a lot of the configuration issues out of the box. Custom networking could also default to a tagging strategy like I suggested above. Then, if node ENI IPs are no longer a constraint, node bootstrap shouldn't need to care about the networking specifics, which is useful because, by definition, bootstrap can't see the K8s components until it's configured and connected.

On your other points (I'm completely ignoring Windows here), you are constrained by the terraform-aws-eks module, which underpins the blueprints module, and by the EKS managed addons. If you're not using managed node groups, you should be able to get around most of this by using self-managed addons. |
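On the prefix-mode point: today it is opt-in rather than the default, e.g.:

```sh
# Prefix delegation currently has to be enabled explicitly; once on, the plugin
# assigns /28 IPv4 prefixes to ENIs instead of individual secondary IPs.
kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true
```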
Setting ENI_CONFIG_LABEL_DEF via the EKS addon isn't even supported; only AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG is. |
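To illustrate the limitation above, here is roughly how the custom networking toggle can be passed through the managed add-on's configuration values (cluster name is a placeholder); per the comment, ENI_CONFIG_LABEL_DEF is not accepted there and still has to be set on the aws-node daemonset directly.

```sh
# The managed add-on accepts the custom networking flag via configuration values.
aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name vpc-cni \
  --configuration-values '{"env":{"AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG":"true"}}'
```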
I am following this document, https://repost.aws/knowledge-center/eks-custom-subnet-for-pod, to have pods use IPs from the ENIConfig subnet rather than the node subnet. Below are my specs:

I followed the same steps in the document.
But when I set the following env variable
Can you please share some information? Am I missing something, or is any extra configuration needed? |
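Not an official troubleshooting guide, but a few generic checks that often narrow this kind of issue down:

```sh
# 1. Is custom networking actually enabled on the running daemonset?
kubectl -n kube-system describe daemonset aws-node | grep -E 'CUSTOM_NETWORK|ENI_CONFIG'

# 2. Do the ENIConfig objects exist, and does each node carry a matching
#    label/annotation?
kubectl get eniconfigs
kubectl get nodes --show-labels | grep -i eniconfig

# 3. Were nodes replaced after enabling custom networking? Only ENIs attached
#    after the change allocate pod IPs from the ENIConfig subnet.
```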
We appreciate your feedback on how we can simplify the default experience for IP address management in VPC CNI. Starting with VPC CNI v1.18, we support automatic subnet discovery and dynamic address allocation based on IP address utilization across available subnets. To learn more, here is the blog post: https://aws.amazon.com/blogs/containers/amazon-vpc-cni-introduces-enhanced-subnet-discovery/

For use cases that do not require running pods on a different subnet with separate security groups, we believe the new feature (also enabled by default) provides a simpler experience. Check it out and let us know how we can improve the default experience further. |
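For anyone trying the new behaviour, the linked blog post describes tagging the additional pod subnets so the plugin can discover them. A sketch follows; the subnet ID is a placeholder and the exact tag semantics are described in the post.

```sh
# Enhanced subnet discovery: tag the secondary (pod) subnets so VPC CNI v1.18+
# can find them automatically, without an ENIConfig per subnet.
aws ec2 create-tags \
  --resources subnet-0123456789abcdef0 \
  --tags Key=kubernetes.io/role/cni,Value=1
```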
Community Note
Tell us about your request
Simplify and remove certain steps required to use custom networking with VPC CNI plugin.
Which service(s) is this request for?
EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Custom networking is a feature that allows you to run pods using subnets and security groups separate from those of the worker nodes; however, multiple setup steps are required:
Max pods must be manually calculated and passed to the kubelet for worker nodes. This should be automated, which would also allow custom networking to work with Managed Node Groups.

Many of these steps should be simplified and/or automated.
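For context on the manual calculation mentioned above, here is a rough sketch of the arithmetic with custom networking enabled, where the primary ENI no longer serves pod IPs (the m5.large numbers are just an example).

```sh
# max pods with custom networking: the primary ENI is reserved for the node,
# so one ENI's worth of addresses is unavailable to pods.
#   max_pods = (ENIs - 1) * (IPv4 addresses per ENI - 1) + 2
# Example: m5.large supports 3 ENIs with 10 IPv4 addresses each.
echo $(( (3 - 1) * (10 - 1) + 2 ))   # 20
```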
Additionally, documentation is limited. We should add more content like this to the EKS docs.