[EKS] [request]: API flag to initialize completely bare EKS cluster #923

Closed
sc250024 opened this issue May 29, 2020 · 40 comments
Labels
EKS Add-Ons EKS Amazon Elastic Kubernetes Service Proposed Community submitted issue

Comments

@sc250024

sc250024 commented May 29, 2020

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request

Essentially, I'm looking for an extra option in the AWS API where EKS is initialized with a completely bare cluster (i.e. no coredns, aws-node, or kube-proxy deployments / daemonsets). Only the EKS control plane is provided.

Which service(s) is this request for?

EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Kubernetes lifecycle management is a problem which many tools are solving / attempting to solve. With Kubernetes objects, there's no easy way to "inherit" an object that already exists, and apply changes over it. If an object exists, and you want to change it without completely deleting / reinstalling it, you either have to (AFAIK):

  • Run a kubectl edit or kubectl patch with the in-place objects to change what you want.
  • Have the original manifest which was applied previously, and run a kubectl apply with the new options.

In fact, the Kubernetes documentation here talks about the various methods: https://kubernetes.io/docs/concepts/cluster-administration/manage-deployment/#in-place-updates-of-resources

With Helm charts, this problem is more pronounced. If I want to apply a Helm chart and someone has already applied a Kubernetes YAML manifest manually with similar names, Helm will fail with errors because those objects already exist.

For my company, we want to provision / de-provision EKS clusters with as much automation as possible, but what we find is that there are certain manual steps which must be performed with EKS. To name a few:

  • Kube-Proxy ConfigMap metrics
    • In order to get Prometheus to successfully scrape the kube-proxy process, we have to update the listen address in the ConfigMap like so:
# Edit kube-proxy ConfigMap to allow metrics scraping.
# Replace `metricsBindAddress: 127.0.0.1:10249` with `metricsBindAddress: 0.0.0.0:10249`
$ kubectl edit --namespace kube-system configmap/kube-proxy-config

# Afterwards, restart all `kube-proxy` pods
$ kubectl delete pods --namespace kube-system --selector='k8s-app=kube-proxy'
  • CoreDNS
    • The CoreDNS component that comes with EKS does not have a proportional autoscaler, whereas the CoreDNS Helm chart does. After provisioning the cluster, we then (1) delete the default CoreDNS deployment and all associated resources, and then (2) apply the Helm chart.
  • AWS VPC CNI
    • Same as CoreDNS. After provisioning the cluster, we then (1) delete the default AWS VPC CNI, and then (2) apply the AWS VPC CNI Helm chart.
  • Kube-proxy
    • Same as above.
  • AWS Auth ConfigMap
    • The ConfigMap object already exists, so we have to take special care to update it ourselves.
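For the kube-proxy metrics step, the interactive `kubectl edit` can be replaced with a non-interactive pipeline, which suits the automated provisioning described above. This is a minimal sketch, not a tested procedure: it assumes the kube-proxy-config ConfigMap contains the literal line `metricsBindAddress: 127.0.0.1:10249`, that the DaemonSet is `kube-system/kube-proxy`, and kubectl 1.15 or newer for `kubectl rollout restart`. The function names and the `KUBECTL` override variable are hypothetical.

```shell
# Rewrite the kube-proxy metrics listen address on stdin so Prometheus can
# reach it from other hosts (127.0.0.1 -> 0.0.0.0, port 10249 unchanged).
patch_metrics_bind_address() {
  sed 's/metricsBindAddress: 127\.0\.0\.1:10249/metricsBindAddress: 0.0.0.0:10249/'
}

# Hypothetical pipeline wrapper: fetch the ConfigMap, patch it, write it back,
# then roll the kube-proxy pods so they load the new config.
# KUBECTL may be overridden (e.g. KUBECTL=echo for a dry run); defaults to kubectl.
enable_kube_proxy_metrics() {
  "${KUBECTL:-kubectl}" get --namespace kube-system configmap/kube-proxy-config --output yaml \
    | patch_metrics_bind_address \
    | "${KUBECTL:-kubectl}" replace --filename -
  "${KUBECTL:-kubectl}" rollout restart --namespace kube-system daemonset/kube-proxy
}
```

Compared with deleting the pods by selector, `kubectl rollout restart` replaces them according to the DaemonSet's update strategy rather than all at once.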

All of these (and similar) problems would be solved by simply having a flag to initialize a cluster which is completely empty, and let whatever tools we use internally to build up the cluster as we see fit. This is more of a functionality for power / advanced users, but the use case definitely exists.

Are you currently working around this issue?

We are, but we are either performing these actions manually or as part of a pipeline. For CoreDNS / AWS VPC CNI / Kube-Proxy, we essentially must store a Kubernetes YAML manifest in our Git repositories which we can point to when running kubectl delete.

@sc250024 sc250024 added the Proposed Community submitted issue label May 29, 2020
@mikestef9 mikestef9 added the EKS Amazon Elastic Kubernetes Service label May 29, 2020
@TBBle

TBBle commented Jun 15, 2020

To improve the workaround, as of Helm 3.2.0 it should be possible to use kubectl label and kubectl annotate to mark the existing resource objects, and then adopt them into a Helm release. See "Release Note" on the pull request for details.

For some reason, that didn't actually make it into the release notes, or the docs. Future work will automate this further in Helm, so they might be waiting to document it with that.
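Helm 3.2.0 and later adopt an existing object into a release when it carries the label `app.kubernetes.io/managed-by=Helm` plus the annotations `meta.helm.sh/release-name` and `meta.helm.sh/release-namespace` matching the release being installed. A minimal sketch of marking one object for adoption follows; the function name and the `KUBECTL` override are hypothetical.

```shell
# Mark an existing object so that `helm install` / `helm upgrade` (Helm >= 3.2.0)
# adopts it into the named release instead of failing because it already exists.
# Usage: adopt_into_helm <kind/name> <release-name> <release-namespace>
# KUBECTL may be overridden (e.g. KUBECTL=echo for a dry run); defaults to kubectl.
adopt_into_helm() {
  "${KUBECTL:-kubectl}" label --overwrite --namespace "$3" "$1" \
    app.kubernetes.io/managed-by=Helm
  "${KUBECTL:-kubectl}" annotate --overwrite --namespace "$3" "$1" \
    "meta.helm.sh/release-name=$2" \
    "meta.helm.sh/release-namespace=$3"
}
```

For example, run `adopt_into_helm deployment/coredns coredns kube-system` for each conflicting object before `helm install coredns coredns/coredns --namespace kube-system`. The release name and namespace must match what is passed to `helm install`, or Helm refuses to adopt the object; `--namespace` is simply ignored for cluster-scoped kinds such as ClusterRoles.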

@stevehipwell

@sc250024 do you have an example of your workflow for capturing CoreDNS and kube-proxy as Helm charts? We already do this for the aws-vpc-cni (we use the remote yaml referenced in the upgrade guide to delete this).

@sc250024
Author

sc250024 commented Jan 7, 2021

> @sc250024 do you have an example of your workflow for capturing CoreDNS and kube-proxy as Helm charts? We already do this for the aws-vpc-cni (we use the remote yaml referenced in the upgrade guide to delete this).

Actually we don't do that currently; we're using the coredns and kube-proxy installations that come with the cluster by default. For CoreDNS specifically, we'd like to use the Helm chart since it includes an autoscaler that adds Pods as the cluster size itself scales.

In general, we automate a lot of our provisioning, and right now, we have to do a lot of hacks to either apply something over an existing resource, or patch an existing resource. It's really just running kubectl commands through Terraform.

@stevehipwell

@sc250024 it sounds like we've got very similar requirements. Currently we have automated kube-proxy and CoreDNS version patching via Terraform, and when we bootstrap a cluster we remove the installed aws-vpc-cni and replace it with the Helm chart. My highest priority would be to delete the default CoreDNS and capture that with a Helm chart.

@TBBle

TBBle commented Jan 7, 2021

I was curious, and had a play with the CoreDNS Helm chart to see how close I could get to generating the existing AWS deployment of CoreDNS. It's not far off, but it highlights a few differences:

  • AWS might be running a patched CoreDNS deployment that needs to look at the node list, based on ClusterRole differences.
  • AWS has done some hardening in their deployment (read-only root filesystem with /tmp on an emptyDir volume, and all capabilities dropped except NET_BIND_SERVICE), which CoreDNS doesn't have in their chart and which can't be fully expressed in the values.yaml.
  • CoreDNS Helm chart Prometheus metrics support might not be functional, looks like it's missing a containerPort.
  • CoreDNS Helm chart distinguishes Live and Ready; AWS's deployment does not. (Although maybe AWS's version of CoreDNS suffers from coredns/coredns#4099, "Ready plugin continues to answer OK during lameduck period on existing connections".)
  • CoreDNS Helm chart specifies "criticalness" using annotations that (at least in one case) haven't been honoured since k8s 1.16.

(There's more details and less-impactful differences in the comments on the YAML)

So you could use the values.yaml attached (updating REGION and DNS_CLUSTER_IP as is done for the AWS-applied YAML), then annotate/label the conflicting objects for adoption, delete the kube-dns Service (because you need to steal its ClusterIP), and helm install should adopt the existing objects and take over as the cluster DNS service.
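The REGION and DNS_CLUSTER_IP substitution can be done with sed, and the ClusterIP to reuse can be read from the existing kube-dns Service before it is deleted. A minimal sketch; the function names and the `KUBECTL` override are hypothetical, and the region and IP in the test are only illustrative.

```shell
# Fill in the two placeholders of aws.coredns.values.yaml read from stdin.
# $1 = AWS region (e.g. us-west-2), $2 = ClusterIP for the cluster DNS Service.
render_coredns_values() {
  sed -e "s/REGION/$1/g" -e "s/DNS_CLUSTER_IP/$2/g"
}

# Hypothetical helper: print the ClusterIP currently held by the kube-dns Service,
# so the Helm-managed Service can claim the same address after the switch.
# KUBECTL may be overridden (e.g. KUBECTL=echo for a dry run); defaults to kubectl.
current_dns_cluster_ip() {
  "${KUBECTL:-kubectl}" get --namespace kube-system service/kube-dns \
    --output 'jsonpath={.spec.clusterIP}'
}
```

Usage could be `render_coredns_values us-west-2 "$(current_dns_cluster_ip)" < aws.coredns.values.yaml > values.rendered.yaml` (the output file name is arbitrary); the IP has to be read before the kube-dns Service is deleted.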

Of course, deleting the kube-dns Service isn't great, as you have a period of DNS outage, but I'm unaware of a good way to transition cleanly without that. That should be the only object you need to delete by hand before installing CoreDNS, though, so the outage period can be measured in tens of seconds, assuming everything else works. (Unless there are other things with immutable fields... The Deployment might be one, actually.)

Helpfully, every object in the AWS yaml is labelled with eks.amazonaws.com/component: kube-dns, so it's easy to hunt-down leftover objects after the adoption: things with that label but lacking the app.kubernetes.io/managed-by: "Helm" label are orphaned leftovers. Things with both labels were adopted by Helm and are part of the chart now.
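That orphan check maps directly onto a Kubernetes label selector, because `key!=value` also matches objects that lack the key entirely. A sketch, assuming the listed kinds cover what the AWS YAML creates; the function names and the `KUBECTL` override are hypothetical.

```shell
# Label selector for objects from the AWS kube-dns YAML that Helm has not adopted.
# In Kubernetes label selectors, `key!=value` also matches objects lacking `key`.
kube_dns_orphan_selector() {
  printf '%s\n' 'eks.amazonaws.com/component=kube-dns,app.kubernetes.io/managed-by!=Helm'
}

# List the leftovers; --namespace is ignored for the cluster-scoped kinds.
# KUBECTL may be overridden (e.g. KUBECTL=echo for a dry run); defaults to kubectl.
list_kube_dns_orphans() {
  "${KUBECTL:-kubectl}" get --namespace kube-system \
    deployments,services,configmaps,serviceaccounts,clusterroles,clusterrolebindings \
    --selector "$(kube_dns_orphan_selector)"
}
```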

A couple of the above issues (and other things called out in the text) are possibly bug-reports or feature-requests to be raised with CoreDNS.

Note that these are not recommended settings. They are mirroring the existing AWS YAML as closely as possible, including possible feature regressions, e.g., rollback to CoreDNS 1.7.0, disabling lameduck and ttl in the service setup.

On the other hand, some are important, like limiting the Deployment to 64-bit Linux hosts, and EC2 (i.e. not Fargate). Unless you want CoreDNS on Fargate of course. Then it's a regression. ^_^

A values.yaml describing the differences
# Contrasting AWS CoreDNS 1.7.0 install from https://docs.aws.amazon.com/eks/latest/userguide/coredns.html
#  curl -o dns.yaml https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2020-10-29/dns.yaml
# VS the current CoreDNS 1.8.0 Helm chart
#  helm repo add coredns https://coredns.github.io/helm
#  helm repo update
#  helm template coredns coredns/coredns --namespace kube-system --values aws.coredns.values.yaml
# (This file is aws.coredns.values.yaml)

## Differences I could not capture:

# AWS's Service is named kube-dns, CoreDNS creates one named coredns

# The ClusterRole and ClusterRoleBinding in AWS's YAML are default roles named system:coredns,
# with auto-reconciliation disabled, see
# https://kubernetes.io/docs/reference/access-authn-authz/rbac/#default-roles-and-role-bindings
# and had the following extra rule, I'm not sure why.
#  - apiGroups:
#    - ""
#    resources:
#    - nodes
#    verbs:
#    - get
#
# This might be something that AWS have patched into their CoreDNS binary's kubernetes plugin,
# i.e. similar to the one proposed at https://github.com/coredns/coredns/issues/3077
# which was eventually punted as a different plugin and abandoned.
#
# CoreDNS Helm chart names its ClusterRole/Binding simply 'coredns' (i.e. fullnameOverride) and they are labelled as
#  kubernetes.io/cluster-service: true
# instead.

# The Prometheus metrics have a separate Service in the Helm chart, but are scraped
# from the main Service in the AWS YAML.
# That said, the CoreDNS chart doesn't seem to have a containerPort exposed for them. Bug in the Helm chart?

# AWS's Pod has the following that CoreDNS Helm chart doesn't support
#        securityContext:
#          allowPrivilegeEscalation: false
#          capabilities:
#            add:
#            - NET_BIND_SERVICE
#            drop:
#            - all
#          readOnlyRootFilesystem: true

# CoreDNS Helm chart has the following annotations (old name for priorityClassName and tolerations respectively)
# when isClusterService is set.
# Goodness, these are old, and someone should fix the CoreDNS chart, as they are no longer effective in current k8s.
#        scheduler.alpha.kubernetes.io/critical-pod: ''
#        scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'

# AWS Pod mounts the config-volume read-only.

# Helm chart distinguishes readiness probe from health probe. (More-modern approach)

# Helm chart specifies a maxSurge (25%) for the Deployment's rollingUpdate.

# Various minor differences:
# - Labels and annotations
# - The container port names are different
# - Generated Helm chart doesn't have namespace metadata, because Helm takes care of that.

fullnameOverride: coredns

serviceAccount:
  create: true

priorityClassName: system-cluster-critical

replicaCount: 2

image:
  repository: 602401143452.dkr.ecr.REGION.amazonaws.com/eks/coredns
  tag: v1.7.0-eksbuild.1

podAnnotations:
  eks.amazonaws.com/compute-type: ec2

service:
  clusterIP: DNS_CLUSTER_IP

extraVolumes:
- name: tmp
  emptyDir: {}

extraVolumeMounts:
- name: tmp
  mountPath: /tmp

terminationGracePeriodSeconds: 0

resources:
  limits:
    cpu: null
    memory: 170Mi
  requests:
    memory: 70Mi

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: "beta.kubernetes.io/os"
          operator: In
          values:
          - linux
        - key: "beta.kubernetes.io/arch"
          operator: In
          values:
          - amd64
          - arm64
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: k8s-app
            operator: In
            values:
            - coredns
        topologyKey: kubernetes.io/hostname
      weight: 100

tolerations:
- key: node-role.kubernetes.io/master
  effect: NoSchedule
- key: "CriticalAddonsOnly"
  operator: "Exists"

prometheus:
  service:
    enabled: true

# Because of the way Helm works, you cannot override parts of this
# array, so the whole thing is copied out of the coredns/coredns
# defaults (without comments), and the differences with AWS noted.
servers:
- zones:
  - zone: .
  port: 53
  plugins:
  - name: errors
  - name: health
    # AWS doesn't have this
    #configBlock: |-
    #  lameduck 5s
  # AWS doesn't use this plugin at all, but it's needed elsewhere in the chart
  - name: ready
  - name: kubernetes
    parameters: cluster.local in-addr.arpa ip6.arpa
    configBlock: |-
      pods insecure
      fallthrough in-addr.arpa ip6.arpa
    # AWS doesn't have this line in the configBlock:
    #  ttl 30
  - name: prometheus
    # parameters: 0.0.0.0:9153
    # AWS uses the below, I guess that means we're IPv6-ready? *cough*
    parameters: :9153
  - name: forward
    parameters: . /etc/resolv.conf
  - name: cache
    parameters: 30
  - name: loop
  - name: reload
  - name: loadbalance

@stevehipwell

@TBBle that's a great summary of the differences. I think the next step would be to open up a PR on the CoreDNS chart to close the gap and allow all of the AWS settings to be set correctly.

As this issue is about providing a bare EKS cluster, the potential downtime is probably not an issue; until a bare cluster is an option, we remove the unwanted add-ons before the cluster has any nodes to run them on.

@TBBle

TBBle commented Jan 7, 2021

I should point out that I haven't tested this. It was done using helm template and comparing the YAML. There's definitely opportunities to improve the CoreDNS Helm chart, but I don't think there was anything (except maybe the Prometheus metrics issue) that would make those improvements a blocker for doing the switch today, if I happened to be setting up an EKS cluster.

That said, I probably would not try and replicate some of the AWS differences, like fullnameOverride or the servers block changes, as they were just illustrative.

One thing to keep in mind is that perhaps it's important that the Service be named kube-dns? I didn't enforce that in my illustration. The k8s docs suggest that name might be relied upon by pieces of the system...

So it might be worth proposing that the CoreDNS Helm chart specifically be able to override the Service name separately from the existing fullname used to name the objects.

Or just install the chart as helm install kube-dns coredns/coredns and use fullnameOverride: kube-dns in the values.yaml. I suspect that won't adopt anything, after you delete the existing Service. But it's a little ugly. -_-
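That last option could look roughly like the following. A hedged sketch, untested against a live cluster: it assumes the CoreDNS Helm repository was added as `coredns` (as in the values.yaml header) and that the existing kube-dns Service has already been deleted; the function name and the `HELM` override are hypothetical.

```shell
# Install the CoreDNS chart as a release named kube-dns, overriding fullnameOverride so
# the objects it creates (including the Service) are named kube-dns.
# $1 = path to the values file (e.g. a rendered aws.coredns.values.yaml).
# HELM may be overridden (e.g. HELM=echo for a dry run); defaults to helm.
install_coredns_as_kube_dns() {
  "${HELM:-helm}" install kube-dns coredns/coredns \
    --namespace kube-system \
    --values "$1" \
    --set fullnameOverride=kube-dns
}
```

Values given with `--set` take precedence over `--values`, so the override wins even if the values file still sets `fullnameOverride: coredns`.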

@sc250024
Author

sc250024 commented Jan 7, 2021

> @TBBle that's a great summary of the differences. I think the next step would be to open up a PR on the CoreDNS chart to close the gap and allow all of the AWS settings to be set correctly.
>
> As this issue is about providing a bare EKS cluster, the potential downtime is probably not an issue; until a bare cluster is an option, we remove the unwanted add-ons before the cluster has any nodes to run them on.

@stevehipwell said what I was going to say, which is that the main point was to raise the question for AWS about whether or not they can support this feature. But to @TBBle , appreciate the help with the Helm chart values 😊

To me, it's either one of two things:

  • AWS manages kube-proxy, coredns, aws-node, and any other "core" cluster components completely. This means that autoscaling (where appropriate) happens automatically, and components are upgraded automatically when there's a cluster upgrade.

  • AWS allows people to use an "empty cluster" flag, and allows us to manage everything ourselves with no interruption from them.

Right now, it's in an awkward in-between state in my opinion. They're trying to provide the base cluster components (which makes sense), but stumble a bit with the upgrade path when the control plane is upgraded.

@TBBle

TBBle commented Jan 7, 2021

The AWS-managed add-ons approach shipped last month, albeit not many add-ons yet, just aws-node. #252 (comment)

That same ticket did confirm that "bare cluster" is also on the roadmap. I suspect it'll come implicitly once the remaining existing YAML add-ons are all migrated to EKS Add-ons, i.e., #1159.

@tabern
Contributor

tabern commented Jan 14, 2021

Hi all,

This feature is in our development plans and I've added it to our public roadmap. We envision that in time, all EKS clusters will use managed add-ons and we will not boot components into clusters that are not managed by EKS and you cannot control via the EKS APIs. Our 3 core add-ons (VPC CNI, coredns, kube-proxy) will still be enabled by default, but you can optionally elect to have them not be installed when you create the cluster.

@sc250024
Author

> Hi all,
>
> This feature is in our development plans and I've added it to our public roadmap. We envision that in time, all EKS clusters will use managed add-ons and we will not boot components into clusters that are not managed by EKS and you cannot control via the EKS APIs. Our 3 core add-ons (VPC CNI, coredns, kube-proxy) will still be enabled by default, but you can optionally elect to have them not be installed when you create the cluster.

Much appreciated @tabern. Thank you!

@stevehipwell

@tabern where are we with this after today's announcement?

@shixuyue

shixuyue commented Jun 1, 2021

I have a hacky workaround:
Edit the eks:addon-manager Role in the kube-system namespace to remove its update and patch permissions on ConfigMaps.

@stevehipwell

@shixuyue what exactly are you doing to manage kube-proxy and coredns?

@shixuyue

shixuyue commented Jun 7, 2021

@stevehipwell I don't have special needs for kube-proxy, but I need to add a Consul forwarder to CoreDNS so it can resolve Consul endpoints from another "cluster" (it's not k8s).

@stevehipwell

I see that the docs now contain a method for removing add-ons, but I don't think it is possible to do this without removing the default config. This could be useful if there were valid Helm charts for coredns and kube-proxy in aws/eks-charts (or instructions for using the official coredns Helm chart).

@shixuyue

shixuyue commented Jun 7, 2021

Oh, yeah, my hacky workaround works for me. Each time we want to update the plugins, we need to re-enable the permissions that we disabled, and once the update is done, disable them again. That way the add-on manager doesn't have permission to revert the Corefile ConfigMap to the default. It's not ideal, but it's easy and simple; good as a temporary workaround.

@Hokwang

Hokwang commented May 4, 2022

@tabern any updates?

@cdobbyn

cdobbyn commented May 26, 2022

@tabern if the aws-vpc-cni could match the output from the Helm-deployed aws-vpc-cni, a lot of people would no longer be seeking this. It requires annotations and labels to be updated (so that Helm will accept ownership). While I do believe allowing customers to choose their own adventure for add-ons is a good long-term goal, this would be a nice quick win for a lot of people here.

Today using terraform we must either split the automation into two steps with a manual intervention in between or get into some pretty ugly custom workflow. My group is specifically just trying to configure custom networking as a component of all new cluster builds.

@mathewmoon

@cdobbyn While I agree that updating labels and annotations creates a quick workaround, there are already ways to hack around this problem. IMO the topic of this thread should stay focused on the issue that the API should support a bare cluster. Making changes to make workarounds more convenient, I think, just obscures the real objective, which is making EKS non-opinionated about what services are installed and how.

@cdobbyn

cdobbyn commented Jun 4, 2022

@mathewmoon I agree with the goal. EKS clusters should as an advanced option allow us to deploy them bare. I suspect they deploy them with some basics for newcomers.

My comment was simply to offer a comment on a quick-win in case detaching these components is more complicated than we know. Re-reading it I recognise it appears as though I wish to alter the course of this issue (I do not).

@msolimans

> Hi all,
>
> This feature is in our development plans and I've added it to our public roadmap. We envision that in time, all EKS clusters will use managed add-ons and we will not boot components into clusters that are not managed by EKS and you cannot control via the EKS APIs. Our 3 core add-ons (VPC CNI, coredns, kube-proxy) will still be enabled by default, but you can optionally elect to have them not be installed when you create the cluster.

omg it's almost a year, when are we expecting this to be released?

@aburan28

Is there any update on this?

@jordansimmons25

@tabern Has there been any progress on this?

@ghost

ghost commented Feb 6, 2024

Although I try to avoid "me too" comments, yes, I was also impacted by this today while recreating a cluster from scratch. We manage CoreDNS ourselves and install Cilium; we have not enabled any add-ons, but there they are: aws-node and friends.

It's funny because the AWS Console shows the option to install those add-ons, as if they were not already installed. So it seems the legacy and new add-on way are clashing.

@tabern
Contributor

tabern commented Feb 9, 2024

Hi everyone, it's been a few years. That was a really long nap!
I wanted to let you know we're working on shipping this.

@sc250024
Author

sc250024 commented Feb 9, 2024

> Hi everyone, it's been a few years. That was a really long nap! I wanted to let you know we're working on shipping this.

@tabern Thank you for continuing work on this. Cannot wait to see the result!

@TarekAS

TarekAS commented Feb 9, 2024

Extended Support money has been oiling some gears in the EKS team 👀 We'll be properly bootstrapping clusters in no time boys!

@tabern
Contributor

tabern commented Feb 9, 2024

> Extended Support money has been oiling some gears in the EKS team 👀 We'll be properly bootstrapping clusters in no time boys!

Buckle up @TarekAS !

@sriramranganathan

You can now create Amazon EKS clusters without the default networking add-ons including Amazon VPC CNI, CoreDNS, and kube-proxy. Please check out - https://aws.amazon.com/about-aws/whats-new/2024/06/amazon-eks-cluster-creation-flexibility-networking-add-ons/

To create clusters without the default networking add-ons, use the bootstrapSelfManagedAddons attribute in the CreateCluster API - https://docs.aws.amazon.com/eks/latest/APIReference/API_CreateCluster.html#AmazonEKS-CreateCluster-request-bootstrapSelfManagedAddons
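The same attribute should be reachable from the AWS CLI. A minimal sketch, assuming an AWS CLI version new enough to include bootstrapSelfManagedAddons, and assuming it follows the CLI's usual mapping of boolean API members to `--x` / `--no-x` flag pairs so that `--no-bootstrap-self-managed-addons` sets it to false. The function name, the `AWS` override, and the role ARN and subnet IDs used in testing are placeholders.

```shell
# Create an EKS control plane without the default self-managed VPC CNI, CoreDNS
# and kube-proxy add-ons (CreateCluster with bootstrapSelfManagedAddons=false).
# $1 = cluster name, $2 = cluster IAM role ARN, $3 = comma-separated subnet IDs.
# AWS may be overridden (e.g. AWS=echo for a dry run); defaults to the aws CLI.
create_bare_eks_cluster() {
  "${AWS:-aws}" eks create-cluster \
    --name "$1" \
    --role-arn "$2" \
    --resources-vpc-config "subnetIds=$3" \
    --no-bootstrap-self-managed-addons
}
```

Nodes that join a cluster created this way stay NotReady until a CNI of your choice (for example the AWS VPC CNI Helm chart or Cilium) is installed, and there is no cluster DNS until CoreDNS is installed.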

@bryantbiggs
Member

xref hashicorp/terraform-provider-aws#38156

Projects
Status: Shipped