
[EKS] [request]: Simplify CNI custom networking #867

Open
mikestef9 opened this issue Apr 28, 2020 · 11 comments
Labels: EKS Networking, EKS, Proposed

@mikestef9
Contributor

mikestef9 commented Apr 28, 2020

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
Simplify and remove certain steps required to use custom networking with VPC CNI plugin.

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Custom networking is a feature that allows you to run pods using subnets and security groups separate from those of the worker nodes; however, multiple setup steps are required:

  • Setting up secondary VPC CIDR blocks can be time-consuming and requires a long string of EC2 API calls. There should be an automated command to set this up.
  • Max pods must be manually calculated and passed to the kubelet for worker nodes. This should be automated, which would also allow custom networking to work with Managed Node Groups.
    • Managed node groups now automatically calculate recommended max pods based on VPC CNI settings.
  • ENIConfigs must be created for each availability zone. There should be an option to auto-discover these subnets based on tags.

Many of these steps should be simplified and/or automated.
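
For context, the manual flow being simplified looks roughly like this (all IDs, CIDRs, and names below are placeholders):

    # Associate a secondary CIDR block with the cluster VPC
    aws ec2 associate-vpc-cidr-block --vpc-id vpc-0123456789abcdef0 \
      --cidr-block 100.64.0.0/16

    # Create a pod subnet from the new CIDR in each availability zone
    aws ec2 create-subnet --vpc-id vpc-0123456789abcdef0 \
      --availability-zone us-west-2a --cidr-block 100.64.0.0/19

    # Create an ENIConfig per availability zone
    cat <<EOF | kubectl apply -f -
    apiVersion: crd.k8s.amazonaws.com/v1alpha1
    kind: ENIConfig
    metadata:
      name: us-west-2a
    spec:
      subnet: subnet-0123456789abcdef0
      securityGroups:
        - sg-0123456789abcdef0
    EOF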

Additionally, documentation is limited. More content like this should be added to the EKS docs.

@stevehipwell

  • ENIConfigs must be created for each availability zone. There should be an option to auto discover these subnets based on tags.

Is there any news on the above point? In a real-world cluster with both public and private subnets, an ENIConfig per AZ isn't enough; one is needed per subnet. To do this currently, you need to use a dynamic label in order to use multi-AZ ASGs.
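
For reference, the dynamic-label workaround looks something like this (cluster name and ENIConfig name are placeholders; ENI_CONFIG_LABEL_DEF and the k8s.amazonaws.com/eniConfig label come from the VPC CNI docs):

    # Tell the CNI which node label selects the ENIConfig
    kubectl set env daemonset aws-node -n kube-system \
      ENI_CONFIG_LABEL_DEF=k8s.amazonaws.com/eniConfig

    # In the node group's user data, label each node at boot with the
    # ENIConfig matching its subnet (the dynamic part)
    /etc/eks/bootstrap.sh my-cluster \
      --kubelet-extra-args '--node-labels=k8s.amazonaws.com/eniConfig=my-eniconfig-name'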

@mikestef9
Contributor Author

Want to get some feedback on what we are thinking here.

For subnets where you want pods to run, tag the subnet with the key vpc-cni:pod-subnet and the value shared. Set AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG to True.

The VPC CNI plugin will periodically make a DescribeSubnets API call, filtering by the VPC ID of the cluster as well as by subnets having the tag key vpc-cni:pod-subnet. The plugin will loop through each subnet returned and create/update a map of availability zone to subnets.

When a new node is launched, and AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG is set to True, behavior will initially remain the same, with CNI plugin looking for an ENIConfig. If found it will use that configuration to provision additional ENIs on the node.

If no ENIConfig is found, CNI plugin will query the map from the previous step, and lookup all the subnets based on the availability zone of the worker node.

The subnet field in ENIConfig will be made optional. If you are OK with having security groups copied from the primary ENI to secondary ENIs, then under this proposal an ENIConfig is no longer required at all. But if you do care about different security groups as well, you can still specify them in the ENIConfig and use a node label or annotation to point to that ENIConfig, like today. The upside is that there is no AZ dependency, so a single ENIConfig could potentially be used for all nodes if only security groups need to be specified. Further, security groups for pods also works with custom networking, so you can leverage that feature to specify even more fine-grained security groups if needed.
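
From the user's side, the proposed flow would reduce to something like the following sketch (IDs are placeholders; the tag key is the proposed vpc-cni:pod-subnet):

    # Tag each pod subnet so the plugin can discover it
    aws ec2 create-tags --resources subnet-0123456789abcdef0 \
      --tags Key=vpc-cni:pod-subnet,Value=shared

    # The plugin's periodic discovery, expressed as the equivalent CLI call
    aws ec2 describe-subnets \
      --filters Name=vpc-id,Values=vpc-0123456789abcdef0 \
                Name=tag-key,Values=vpc-cni:pod-subnet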

Open Questions:
How to pick which subnet to use if multiple are found in the AZ of the worker node? Some initial ideas below:

  • Random
  • Look at the availableIpAddressCount of each subnet and, for each node, choose the one with the most free IPs.
  • Have the value of the tag on the subnet be an integer instead of shared, and use that value as a priority sorting mechanism.
  • Something else?

Please let us know any feedback on this idea, or feel free to suggest any other ideas you feel would help simplify your workflow using custom networking today.

@stevehipwell

@mikestef9 here are a couple of feedback points after implementing this with the current options.

I'm interested in why the current docs and future plans are AZ-based instead of subnet-based, which would match the reference architecture. Our requirements involve linking a separate secondary subnet to each of our public and private subnets. Currently we need to dynamically label our nodes (with the node's primary subnet) to achieve this, but it would work better if this could be achieved via subnet tags, without us having to add any node-specific logic: an extension of the above pattern that uses the vpc-cni:pod-subnet=shared tag to enable the logic and vpc-cni:pod-subnet-for-worker-subnet=subnet-xxxxx to link a secondary subnet to the worker's primary subnet.

I'm also interested in whether it would be possible to have custom networking enabled but only for the nodes with the label set; or to not lose the primary ENI when the custom networking refers back to the node's primary subnet.

Finally, it would be good if the max pods value could be set dynamically, as the required inputs for the calculation are all present here.
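
For reference, the calculation in question is the standard EKS max pods formula; with custom networking the primary ENI is not used for pods, so as a sketch:

    # without custom networking
    maxPods = enis * (ipsPerEni - 1) + 2

    # with custom networking (primary ENI excluded)
    maxPods = (enis - 1) * (ipsPerEni - 1) + 2

    # example: m5.large, 3 ENIs with 10 IPs each
    (3 - 1) * (10 - 1) + 2 = 20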

@jwenz723

jwenz723 commented Feb 9, 2021

I like the option:

Have the value of the tag on the subnet be an integer instead of shared, and use that value as a priority sorting mechanism.

This seems to be the most flexible.

This option could be integrated with either the Random option or the availableIpAddressCount option by saying that if more than one subnet has the same numerical rank (i.e. if subnet-A and subnet-B both have the value vpc-cni:pod-subnet=1), then the secondary strategy (Random or availableIpAddressCount) is used as a tiebreaker.
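
A minimal Go sketch of that combined strategy, assuming a priority parsed from the tag value and availableIpAddressCount from DescribeSubnets (types and names here are illustrative, not the plugin's actual internals):

    package main

    import (
        "fmt"
        "sort"
    )

    // subnet holds just the fields the selection strategy needs.
    type subnet struct {
        id           string
        priority     int // parsed from the vpc-cni:pod-subnet tag value
        availableIPs int // the subnet's availableIpAddressCount
    }

    // pickSubnet returns the highest-priority subnet (lowest tag value),
    // breaking ties by the number of free IP addresses.
    func pickSubnet(subnets []subnet) *subnet {
        if len(subnets) == 0 {
            return nil
        }
        sort.Slice(subnets, func(i, j int) bool {
            if subnets[i].priority != subnets[j].priority {
                return subnets[i].priority < subnets[j].priority
            }
            return subnets[i].availableIPs > subnets[j].availableIPs
        })
        return &subnets[0]
    }

    func main() {
        candidates := []subnet{
            {id: "subnet-a", priority: 1, availableIPs: 100},
            {id: "subnet-b", priority: 1, availableIPs: 250},
            {id: "subnet-c", priority: 2, availableIPs: 500},
        }
        // subnet-b wins: same priority as subnet-a, more free IPs
        fmt.Println(pickSubnet(candidates).id)
    }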

@yoanisgil

Are you also considering adding a feature flag to toggle CNI custom networking on/off? This is the trickiest thing to do in the project I'm working on, as the Terraform EKS module has no way of exposing such functionality (because EKS does not expose the configuration of the aws-node daemonset).

@1mamute

1mamute commented Aug 18, 2021

This is an absolute must. Configuring VPC CNI is challenging and introduces a lot of overhead for operators.
If you deployed the VPC CNI via the EKS add-on and want to tweak a setting, you need to patch aws-node and restart the daemonset manually.
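
For anyone hitting this, the manual steps being described look like (env var name from the VPC CNI docs):

    kubectl set env daemonset aws-node -n kube-system \
      AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true
    kubectl rollout restart daemonset aws-node -n kube-system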

@cazlo

cazlo commented Dec 4, 2022

A lot of this complexity is encapsulated in https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/examples/vpc-cni-custom-networking/main.tf

The "extra complexity" AWS end users must manage when applying custom networking to worker nodes is summarized there.

This "extra complexity" negatively impacts the reliability of systems utilizing custom networking. For example, the continued functionality of pods and scaling behavior depends on the user-managed ENIConfig resources to be available and correctly configured. If we play the "chaos engineering" role for a minute and take away 1 ENIConfig, it will totally break the network functionality of new nodes spun up.

Additionally, custom networking is not supported on Windows worker nodes, yet it requires "cluster level" configuration changes to VPC CNI (setting env variables in the aws-node daemonset). This seemingly precludes "safe" custom networking use for mixed-OS cluster workload use cases.

Custom networking provides advantages with regard to:

  • pod startup time (through use of prefix delegation)
  • enhanced security controls (isolation of the running pods' network)
  • IPv4 capacity planning in situations with limited IP space available

The aws-ia examples and the documentation available at https://aws.github.io/aws-eks-best-practices/networking/index/ have greatly helped with this process; however, there is still much complexity to manage in order to use custom networking.

To make this process easier for future devs, I would love to see the following:

  • A "1-click" button for enabling custom networking. The user is responsible for inputting configuration data such as which subnets and node groups to apply this to, and AWS manages the rest.
  • The EKS bootstrap script automatically detects custom networking and sets up the node appropriately without user input.
  • Windows support.
  • Merging the content of the open-source, AWS-provided "EKS Best Practices" documentation into the EKS user documentation.

@stevehipwell

@cazlo not that this answers your main concerns, but it might help you out: you shouldn't need the kubectl binary on your machine to use the kubectl Terraform provider; you'd only need it if you have to run arbitrary commands via either a provisioner or a shell resource.

I think IP prefix mode should be the default behaviour for the VPC CNI, which would solve a lot of the configuration issues out of the box. Custom networking could also default to using a tagging strategy like I suggested above. Then, if node ENI IPs are no longer a constraint, node bootstrap shouldn't care about the networking specifics; this is useful because, by definition, bootstrap can't see the K8s components until it's configured and connected.
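
For reference, prefix mode today is an opt-in env var on the VPC CNI, e.g.:

    kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true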

On your other points (I'm completely ignoring Windows here), you are constrained by the terraform-aws-eks module, which underpins the blueprints module, and by the EKS managed addons. If you're not using managed node groups, you should be able to get around most of this by using self-managed addons.

@autarchprinceps

Setting ENI_CONFIG_LABEL_DEF via the EKS addon isn't even supported, only AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG.
I'm not sure under what circumstances an addon throws away manual changes to envs (a simple version update thankfully doesn't seem to), but if you offer the option to set envs via the addon, you should at least support all of those featured in the official docs.aws.amazon.com user guides.
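
For what it's worth, the supported env can be set through the managed add-on's configuration values, e.g. (cluster name is a placeholder; the add-on's schema determines which env keys are accepted):

    aws eks update-addon --cluster-name my-cluster --addon-name vpc-cni \
      --configuration-values '{"env":{"AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG":"true"}}'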

@sjastis added the EKS Networking label Aug 26, 2023
@jayasuryakumar-dh

I am following this document https://repost.aws/knowledge-center/eks-custom-subnet-for-pod to have pods use IPs from the ENIConfig subnet rather than the node subnet.

Below are my specs:
EKS cluster version: 1.24
amazon-k8s-cni-init:v1.15.0-eksbuild.2
amazon-k8s-cni:v1.15.0-eksbuild.2

I followed the same steps as in the document.

  1. Set the env variable: kubectl set env daemonset aws-node -n kube-system AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true. The aws-node pods restarted and are running.
  2. Created ENIConfig objects with the same names as the AZs (eu-west-1a, eu-west-1b, eu-west-1c) and without security groups.
  3. I wanted to automatically label new nodes with the matching ENIConfig object.

But when I set the following env variable, kubectl set env daemonset aws-node -n kube-system ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone, the aws-node pods restart and fail with the following error:

Warning  Unhealthy  23s   kubelet            Readiness probe failed: {"level":"info","ts":"2023-10-05T13:40:15.581Z","caller":"/root/sdk/go1.20.4/src/runtime/proc.go:250","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning  Unhealthy  13s   kubelet            Readiness probe failed: {"level":"info","ts":"2023-10-05T13:40:25.587Z","caller":"/root/sdk/go1.20.4/src/runtime/proc.go:250","msg":"timeout: failed to connect service \":50051\" within 5s"}
Warning  Unhealthy  3s    kubelet            Readiness probe failed: {"level":"info","ts":"2023-10-05T13:40:35.605Z","caller":"/root/sdk/go1.20.4/src/runtime/proc.go:250","msg":"timeout: failed to connect service \":50051\" within 5s"}

Can you please share some information? Am I missing something, or is any extra configuration needed?

@sjastis

sjastis commented Apr 5, 2024

Appreciate your feedback on how we can simplify the default experience for IP address management in VPC CNI. Starting with VPC CNI v1.18, we support automatic subnet discovery and dynamic address allocation based on IP address utilization across available subnets. To learn more, here is the blog post: https://aws.amazon.com/blogs/containers/amazon-vpc-cni-introduces-enhanced-subnet-discovery/

For use cases that do not require running pods on a different subnet with separate security groups, we believe the new feature (also enabled by default) provides a much simpler experience. Check it out and let us know how we can improve the default experience further.
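
Per the linked blog post, enhanced subnet discovery works off a subnet tag, so enabling it for additional subnets is roughly (subnet ID is a placeholder):

    # Tag secondary subnets so VPC CNI v1.18+ discovers them automatically
    aws ec2 create-tags --resources subnet-0123456789abcdef0 \
      --tags Key=kubernetes.io/role/cni,Value=1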
