This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

General discussion about alternative node provisioners and maybe alternative host OSes #1359

Closed
mumoshu opened this issue Jun 12, 2018 · 13 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@mumoshu
Contributor

mumoshu commented Jun 12, 2018

I want to talk with you all about supporting alternative node provisioners other than coreos-cloud-init, and maybe alternative host OSes other than Container Linux, in kube-aws.

Why

coreos-cloud-init has been deprecated for a year or more. Before it is completely dropped from Container Linux, we kube-aws users should have a solid way to keep provisioning our k8s nodes in a maintainable manner.

What

Extract parts of the userdata/cloud-config-* files into declarative specs of the node roles, so that we can render various node provisioning configs from them: coreos-flavored cloud-config and Ignition/ct for Container Linux, vanilla cloud-config for Ubuntu and Amazon Linux (hopefully the one used for AWS EKS), and so on.

Seems like a big change? Not really, especially for the declarative node role specs part. kube-aws already has a little-known mechanism for implementing kube-aws plugins. A plugin is able to:

  • install arbitrary files and systemd units via the spec.configuration.node.roles configuration
  • add arbitrary kubelet and apiserver flags via spec.configuration.kubernetes

See #1136 (comment) for a working example.
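To make that concrete, here is a rough sketch of what such a plugin's configuration could look like. The exact field names under spec.configuration are illustrative only; the working example in #1136 is the authoritative reference.

```yaml
# Illustrative sketch only - field names may not match the real plugin schema;
# see #1136 for a working example.
metadata:
  name: example-plugin
  version: 0.0.1
spec:
  configuration:
    node:
      roles:
        worker:
          # arbitrary files installed on every worker node
          files:
            - path: /opt/bin/example.sh
              permissions: 0755
              content: |
                #!/bin/bash
                echo "provisioned by a kube-aws plugin"
          # arbitrary systemd units
          systemd:
            units:
              - name: example.service
                enable: true
                content: |
                  [Unit]
                  Description=Example unit installed by a kube-aws plugin
                  [Service]
                  ExecStart=/opt/bin/example.sh
                  [Install]
                  WantedBy=multi-user.target
    kubernetes:
      # arbitrary kubelet and apiserver flags
      kubelet:
        flags:
          - name: node-labels
            value: "example.com/plugin=enabled"
      apiserver:
        flags:
          - name: audit-log-maxage
            value: "30"
```

If node roles are described at this level, the rendering backend - coreos-flavored cloud-config, Container Linux Config/Ignition, or vanilla cloud-config - becomes an implementation detail of kube-aws rather than something baked into the templates.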

Alternatives

How about simply dropping the existing coreos-flavored cloud-config templates and migrating to Ignition, probably via a ct-based template? Doing so would split the kube-aws user base in two, and early adopters of the Ignition-based kube-aws would receive almost no business benefit until it matures.

So I'd want to start by adding optional support for Ignition-based templates, and gradually encourage people to try it and migrate. Sounds good?

Let me reflect on where we are today. Even though Container Linux is an awesome piece of software and operating system, we aren't actually leveraging it fully.

For example, we have long disabled neat Container Linux features like auto-updates for extra operational confidence (#1241). I believe we don't have a strong opinion on using rkt, although it did allow us to get rid of early-docker.service two years ago. Why do we need to stick ONLY to Container Linux when we don't fully utilize it? I do understand that the minimalism of Container Linux is a great fit for a host OS in the container era. But does that justify kube-aws being tied to Container Linux today?

Call for feedback

Will you keep using coreos-cloud-init until it is finally removed? Are you planning to use Ignition, or considering alternative host OSes like me?

Would you be happy with the change?

Any input is welcome. Thanks!

Relevant topics

  • Transforming features like kiam support into kube-aws plugins would make the transition easier. kube-aws plugins are, by their nature, unaffected by the choice of node provisioner. Extracting plugins out of the current cloud-config would make it easy to translate it to other config formats.

Context

FYI, here are some old but relevant issues:

Migrating from coreos-cloud-init to Ignition: #728
General scripts/ assets that are deployed to nodes: #580
The kube-aws plugin system: #509 #751 #791

@cknowles
Contributor

@mumoshu as per our discussion:

  1. Which part of this could we break off first?
  2. How will different options share parameters? Some sort of global variable set a bit like Helm?
  3. Do you think we should update a set of templates in a bucket and render later, or always render on upload? The second option seems less fragile but also less dynamic. Best to keep it simple for now and use the second option?
  4. There were some previous discussions, like Best method to permanently modify kube-dns configuration? #1089, where users update their clusters outside of kube-aws update and do not expect their changes to be wiped (but they are, on controller cycles). However, we also have users who rely on kube-aws update to reset to known good values and provide an update mechanism for a given cluster. Do we need to encapsulate any of that concern in this changeset? We were considering whether we needed to support optional and mandatory options in kube-aws, the latter of which are always wiped. However, the added complexity could confuse users and create more cluster problems and support issues than the existing update issues. Should we support generic ways to customise config? Or do we just rely on separate files being easier for users to adjust and to keep track of what they changed and why?

@davidmccormick
Contributor

davidmccormick commented Jun 12, 2018

Hi

I successfully migrated a custom Kubernetes implementation from cloud-config to Ignition without any major headaches, by converting my cloud-config configuration to their YAML-based Container Linux Config format and then using their transpiler tool (ct) to convert that to Ignition JSON. There are a few syntax differences between the legacy cloud-config and Container Linux Config YAML formats, but a conversion is possible without having to rewrite the config (I used sed). It's not perfect but worth consideration.
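To give a rough idea of the kind of mapping involved (illustrative only; check the Container Linux Config docs and ct --help for the exact schema and flags):

```yaml
# Legacy coreos-cloud-init style (what we migrated away from):
#
#   #cloud-config
#   write_files:
#     - path: /etc/example.conf
#       permissions: "0644"
#       content: |
#         foo=bar
#   coreos:
#     units:
#       - name: example.service
#         command: start
#         content: |
#           [Service]
#           ExecStart=/usr/bin/echo example
#
# Roughly equivalent Container Linux Config YAML (illustrative):
storage:
  files:
    - path: /etc/example.conf
      filesystem: root
      mode: 0644
      contents:
        inline: |
          foo=bar
systemd:
  units:
    - name: example.service
      enabled: true
      contents: |
        [Service]
        ExecStart=/usr/bin/echo example
        [Install]
        WantedBy=multi-user.target

# Then transpile to Ignition JSON with the Config Transpiler, e.g.:
#   ct < container-linux-config.yaml > ignition.json
```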

Regarding supporting more host OSes:

  • Should we be opinionated and offer the best solution, or be broad, which will lead to more resistance to pushing out new benefits/features since we would need to support multiple bases?
  • What will it give kube-aws users functionality-wise? (If it runs Docker and supports our features, should it matter what the OS is?)
  • Will it dilute the effort and time we have across multiple base OSes that all perform the same function?

Personally, I wouldn't choose to support more than one OS because of the work that would be involved in supporting more than one option - I'd rather spend the time delivering features and the latest Kubernetes versions. I think it would be fine to change the OS if we think another one will be better than CoreOS, but then we would need to manage a migration to it rather than trying to support them in parallel.

@mumoshu
Contributor Author

mumoshu commented Jun 12, 2018

I had discussions with @c-knowles and @davidmccormick in Slack.

It covered a lot of ground, so I can't summarize it very well. TL;DR: direct support for alternative OSes in kube-aws won't justify the cost, given the huge cons regarding testing and user support.

If anyone was looking forward to alternative host OSes, it won't happen in kube-aws itself. But I think it is still technically possible to make kube-aws flexible enough to accept k8s worker nodes provisioned with another tool. That would support some use cases.

Regarding migration to Ignition, we seem to have two directions to achieve it: translate the coreos-flavored cloud-config to Ignition config (JSON), or add to kube-aws an alternative userdata template that renders a Container Linux Config.

The former leaves us stuck with cloud-config as the primary configuration syntax even though the resulting config is Ignition. The latter doesn't, but has the downside of forcing us to maintain two userdata templates.

However, I believe we can alleviate the burden of maintaining two userdata templates by extracting almost everything hard-coded in the current cloud-config templates (cloud-config-worker, cloud-config-controller, and cloud-config-etcd) into kube-aws plugins. So I'm inclined to go with the latter for now.
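To illustrate where that could lead (a hypothetical sketch only, not what kube-aws renders today): once nearly everything role-specific lives in plugins, the alternative Container Linux Config output would be little more than a thin skeleton plus plugin-contributed content.

```yaml
# Hypothetical sketch of the alternative template's rendered output; the file
# contents and units shown here would be contributed by kube-aws plugins rather
# than hard-coded in the template. Paths and values are illustrative.
storage:
  files:
    # <- files contributed by kube-aws plugins are rendered here
    - path: /etc/kubernetes/kubelet-env
      filesystem: root
      mode: 0644
      contents:
        inline: |
          KUBELET_IMAGE_TAG=v1.10.4
systemd:
  units:
    # <- units contributed by kube-aws plugins are rendered here
    - name: kubelet.service
      enabled: true
```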

I'd appreciate any feedback on that too. Thank you very much for all your support 👍

@mumoshu
Contributor Author

mumoshu commented Jun 12, 2018

If we start to extract things into kube-aws plugins, we'd need documentation for the plugin system. Ah, this is probably the third time I've said that :)

@mumoshu
Contributor Author

mumoshu commented Jun 12, 2018

extracting almost everything hard-coded in the current cloud-config templates (cloud-config-worker, cloud-config-controller, and cloud-config-etcd) into kube-aws plugins

Btw, this allows anyone to write an additional userdata template that renders vanilla cloud-config, which can then be consumed by Ubuntu, Amazon Linux, or other host OSes. Please feel free to use kube-aws as a library. I'm open to any ideas to make kube-aws more flexible.
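For example (an illustrative sketch only, not something kube-aws ships today), such a template could emit plain cloud-init config that both Ubuntu and Amazon Linux understand, with the actual file contents and units coming from the extracted plugins:

```yaml
#cloud-config
# Illustrative only: vanilla cloud-init (no coreos: section) rendered by a
# hypothetical alternative userdata template. The kubelet.env contents and the
# kubelet.service unit itself would come from kube-aws plugins, not be hard-coded.
write_files:
  - path: /etc/kubernetes/kubelet.env
    permissions: "0644"
    content: |
      KUBELET_VERSION=v1.10.4
runcmd:
  - [systemctl, daemon-reload]
  - [systemctl, enable, --now, kubelet.service]
```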

@mumoshu
Contributor Author

mumoshu commented Jun 14, 2018

FYI: there is ongoing work to build a successor to Container Linux.

@whereisaaron
Contributor

Personally, I don't see any reason for kube-aws to support more than one OS. The node OS should by design have little or no impact on the k8s user/admin. 'Sufficient yet minimal' is its prime criterion, I think, which makes CoreOS, and hopefully Red Hat CoreOS, a good fit.

@jorge07
Contributor

jorge07 commented Jun 18, 2018

I'm not very happy with CoreOS. I've been fighting A LOT with bugs related to cloud-init, systemd, Docker, etc., and sometimes the fix was overly complex for no good reason. I have other clusters in GCloud and it has never happened there.
+1 for supporting other OSes.

@whereisaaron
Contributor

@jorge07 the Red Hat acquisition is probably good news, then. Google and Red Hat have the resources and processes to do a lot more pre-release testing than I expect CoreOS could. I imagine Red Hat will do a better job of heading off upstream bugs, as they do with RHEL. I'm not actually saying it must be CoreOS; rather, just pick the best one for kube-aws and support only that one.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 24, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 24, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
