[Roadmap] Support node specific configuration #2367

Open
fabriziopandini opened this issue Dec 18, 2020 · 11 comments
Labels: kind/api-change, kind/feature, lifecycle/frozen

Comments

@fabriziopandini
Member

Kubeadm today does not provide first-class support for node-specific configuration.

There are ways to achieve this, like using patches, or, for the kubelet only, by providing node-specific extra args, but in the end the UX is not ideal and we are not covering all the use cases.

As a consequence, I'm proposing that the discussions for the next kubeadm roadmap also include a definition of proper support for node-specific configuration in the kubeadm API, possibly in line with the outcome of the ongoing discussion in kubernetes/enhancements#1439.

This should cover the core components managed by kubeadm (API server, controller-manager, scheduler, etcd, kubelet) but also addons where this makes sense.
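
As a concrete picture of the patches workaround referenced above (a sketch only; the directory path, file name, and field values are illustrative assumptions, not taken from this issue): kubeadm accepts a directory of patch files, named after the target component they modify, via --patches or the patches.directory field of the config API.

# /etc/kubernetes/patches/kubeletconfiguration+strategic.yaml
# File name follows kubeadm's "target[suffix][+patchtype]" convention; the target
# here is "kubeletconfiguration" and "strategic" is the default patch type.
# Node-specific kubelet overrides; values are examples only.
maxPods: 200
resolvConf: /run/systemd/resolve/resolv.conf

The directory then has to be passed on every relevant command, e.g. kubeadm join --patches /etc/kubernetes/patches and again kubeadm upgrade node --patches /etc/kubernetes/patches, which is the repetition discussed later in this thread.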

@neolit123 neolit123 added this to the v1.21 milestone Jan 8, 2021
@neolit123 neolit123 added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 8, 2021
@neolit123 neolit123 modified the milestones: v1.21, Next Feb 3, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 4, 2021
@fabriziopandini
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 5, 2021
@k8s-triage-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 3, 2021
@fabriziopandini
Member Author

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 5, 2021
@neolit123 neolit123 added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Nov 8, 2023
@neolit123
Member

We are already at v1beta4, and I just don't think v1 will have instance-specific configuration that introduces new structures like a NodeConfiguration stored in a ConfigMap for a given node and managed by kubeadm.

We actually also haven't seen demand for that from users. It seems they are fine with the patches and with keeping the Init/JoinConfiguration YAML around. The only difficult aspect has been the kubelet instance configuration, and we are finally adding that with

Maybe we can consider this for a vNext (e.g. v2). /shrug

@mohag

mohag commented Nov 29, 2024

My problem with the current patch support is that the parameter needs to be passed again on upgrades... Keeping the path to the patches in the kubeadm-config ConfigMap or in an annotation on the node would help... (An annotation might make the most sense...)

We often have the problem that people forget to apply the extra parameter when upgrading...

Edit: I'm referring to the kubelet config patches above; they are the most commonly used ones.

I also see that the setting can now be given in the InitConfiguration and JoinConfiguration, which I can't remember being the case previously... (I know that there were warnings when passing --config to kubeadm upgrade last time I did that...) (I see the --patches parameter seems to be needed for upgrades instead of the config, looking at the note in the docs)

The non-kubelet patches might not have such an obvious place to save the path...

I see the note about the kubeletExtraArgs setting (we were mostly messing with /etc/default/kubelet instead, but often had to edit the kubeadm-generated file in the case of upgrades (migrating from Docker to containerd, removing deprecated/removed flags), which is also not ideal...) (Using Ansible for much of it)
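
For reference, the Init/JoinConfiguration field mentioned above looks roughly like this in the v1beta3 config API (a sketch; the node name and directory path are illustrative, and the usual discovery fields are omitted):

apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
nodeRegistration:
  name: worker-1                         # illustrative node name
patches:
  directory: /etc/kubernetes/patches     # every patch file in this directory is applied on this node
# (the discovery/bootstrap-token fields required for a real join are omitted from this sketch)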

@neolit123
Member

neolit123 commented Nov 29, 2024

My problem with the current patch support is that the parameter needs to be passed again on upgrades... Keeping the path to the patches in the kubeadm-config ConfigMap or in an annotation on the node would help... (An annotation might make the most sense...)

One benefit of the patches approach is that it gives the user control to not apply a certain patch (node config) to components when it is not needed on upgrade to the next version. If during init/join a certain node config or the patches for it are written to the cluster, that node config becomes pinned, and to fix that the user must go and edit a ConfigMap before upgrade if it is no longer needed. To me it seems the UX is not necessarily better for this.

We often have the problem that people forget to apply the extra parameter when upgrading...

That is a user mistake, I would say.
v1beta4 includes an UpgradeConfiguration and the "patches" field, so that will make the story a bit better - e.g. all kubeadm commands should accept --config and you can have the patches field in there.
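
A rough sketch of what that could look like, assuming (as in the v1beta4 documentation) that the patches field sits under the per-command sections such as node; the path is illustrative:

# passed via e.g. "kubeadm upgrade node --config upgrade-config.yaml"
apiVersion: kubeadm.k8s.io/v1beta4
kind: UpgradeConfiguration
node:
  patches:
    directory: /etc/kubernetes/patches   # same directory that was used at init/join time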

I see the note about the kubeletExtraArgs setting (we were mostly messing with /etc/default/kubelet instead, but often had to edit the kubeadm-generated file in the case of upgrades (migrating from Docker to containerd, removing deprecated/removed flags), which is also not ideal...) (Using Ansible for much of it)

Ideally all user-provided kubelet flags should be gone (as they are deprecated in the kubelet), but we will keep kubeletExtraArgs, of course.
We added #3042 for that purpose; the first flag to migrate is the CRI socket.
But this effort will take a while.

@mohag

mohag commented Nov 29, 2024

My problem with the current patch support is that the parameter needs to be passed again on upgrades... Keeping the path to the patches in the kubeadm-config ConfigMap or in an annotation on the node would help... (An annotation might make the most sense...)

One benefit of the patches approach is that it gives the user control to not apply a certain patch (node config) to components when it is not needed on upgrade to the next version. If during init/join a certain node config or the patches for it are written to the cluster, that node config becomes pinned, and to fix that the user must go and edit a ConfigMap before upgrade if it is no longer needed. To me it seems the UX is not necessarily better for this.

The settings that we have in the patches seem unlikely to change significantly between versions: cgroupDriver (I think all have been migrated to systemd by now at least...), maxPods, and resolvConf (we have had clusters with a mix of node operating systems, mostly for testing, and then that setting needs to be set per node). (criSocket would also make sense, but there is the kubeadm.alpha.kubernetes.io/cri-socket annotation for that as well, and the interaction between the two methods is unclear.) (We currently still use kubelet parameters for the socket and then set the annotation to match.)

(The non-kubelet configs are another story (I think you replied to the pre-edit version of my comment, before I remembered that patches are supported for more than just the kubelet (we only use it for the kubelet)).) (Kubelets have a 1:1 mapping to Node objects; the control-plane components do not necessarily (I do not think kubeadm supports control-plane nodes without a kubelet, though).)

I do not think that keeping the contents of the patch on the cluster makes sense, but the path to it might (the UpgradeConfiguration would help as well though)

We often have the problem that people forget to apply the extra parameter when upgrading...

That is a user mistake, I would say. v1beta4 includes an UpgradeConfiguration and the "patches" field, so that will make the story a bit better - e.g. all kubeadm commands should accept --config and you can have the patches field in there.

I see the note about the kubeletExtraArgs setting (we were mostly messing with /etc/default/kubelet instead, but often had to edit the kubeadm-generated file in the case of upgrades (migrating from Docker to containerd, removing deprecated/removed flags), which is also not ideal...) (Using Ansible for much of it)

Ideally all user-provided kubelet flags should be gone (as they are deprecated in the kubelet), but we will keep kubeletExtraArgs, of course. We added #3042 for that purpose; the first flag to migrate is the CRI socket. But this effort will take a while.

Currently, the kubelet flag that we go to a lot of effort to add is --allowed-unsafe-sysctls on EKS nodes (not relevant for kubeadm), through a cloud-init script. (Managed providers often give very little control over the kubelet configuration (which is often generated by a script running after cloud-init), so some way to keep such an option would be useful - it might even be a flag giving a path to a patch that the kubelet applies to its own config for cases like this.)

In the kubeadm cases, the playbook mainly removes the settings added previously (some of these clusters have been upgraded from ~1.15, so the parameters changed quite a bit over time) (I would prefer it if that config were updated by kubeadm on upgrades) (The runtime settings are updated elsewhere if the runtime changes)

- name: Config cleanups for newer version
  block:
  - name: remove resolv-conf parameter (it is set from the config instead)
    ansible.builtin.replace:
      path: /var/lib/kubelet/kubeadm-flags.env
      regexp: ' ?--resolv-conf=/[^" ]*'
      replace: ''
    notify: restart kubelet
  - name: remove pod-infra-container-image parameter (If set to the default value)
    ansible.builtin.replace:
      path: /var/lib/kubelet/kubeadm-flags.env
      regexp: ' ?--pod-infra-container-image=(k8s.gcr.io|registry.k8s.io)/pause:[^ "]*'
      replace: ''
    notify: restart kubelet
  - name: remove cgroup-driver parameter (it is set from the config instead)
    ansible.builtin.replace:
      path: /var/lib/kubelet/kubeadm-flags.env
      regexp: ' ?--cgroup-driver=\w+'
      replace: ''
    notify: restart kubelet

@neolit123
Member

neolit123 commented Nov 29, 2024

(criSocket would also make sense, but there is the kubeadm.alpha.kubernetes.io/cri-socket annotation for that as well, and the interaction between the two methods is unclear.) (We currently still use kubelet parameters for the socket and then set the annotation to match.)

The link in my previous post is the feature which adds a feature gate to not use the annotation and instead use a /var/lib/kubelet/instance-config.yaml (which is technically a patch).

I do not think that keeping the contents of the patch on the cluster makes sense, but the path to it might (the UpgradeConfiguration would help as well though)

I think it does help, as it adds consistency with --config.

Currently, the kubelet flag that we go to a lot of effort to add is --allowed-unsafe-sysctls on EKS nodes (not relevant for kubeadm), through a cloud-init script. (Managed providers often give very little control over the kubelet configuration (which is often generated by a script running after cloud-init), so some way to keep such an option would be useful - it might even be a flag giving a path to a patch that the kubelet applies to its own config for cases like this.)

The kubelet exposes it as a field as well:
https://github.com/kubernetes/kubernetes/blob/810e9e212ec5372d16b655f57b9231d8654a2179/staging/src/k8s.io/kubelet/config/v1beta1/types.go#L699
so you can use a kubeadm patch with a kubeletconfiguration target instead of passing kubeletExtraArgs for the nodes that need it.
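
As a hypothetical example of that suggestion, a kubeletconfiguration patch file could set the field directly instead of relying on the flag (the sysctl names below are placeholders, not taken from this issue):

# /etc/kubernetes/patches/kubeletconfiguration+strategic.yaml
# Sets the KubeletConfiguration field backing --allowed-unsafe-sysctls on this node only.
allowedUnsafeSysctls:
  - "net.core.somaxconn"             # placeholder sysctls; list whatever the workload actually needs
  - "net.ipv4.tcp_keepalive_time"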

In the kubeadm cases, the playbook mainly removes the settings added previously (some of these clusters have been upgraded from ~1.15, so the parameters changed quite a bit over time) (I would prefer it if that config were updated by kubeadm on upgrades) (The runtime settings are updated elsewhere if the runtime changes)

We do perform cleanup, but only for the flags kubeadm maintains (i.e. not for --allowed-unsafe-sysctls). If we missed doing cleanup on a managed flag that is no longer needed, that's a kubeadm bug.

@mohag

mohag commented Dec 3, 2024

The link in my previous post is the feature which adds a feature gate to not use the annotation and instead use a /var/lib/kubelet/instance-config.yaml (which is technically a patch).

That config, if enabled by default or by a command-line parameter, would work for our EKS issue as well... (where kubeadm is not involved and they generate the config from their node bootstrap script (called from the end of a cloud-init script), so changing settings in the config is hard there)

I think it does help, as it adds consistency with --config.

Correct 🎉

Currently, the kubelet flag that we go to a lot of effort to add is --allowed-unsafe-sysctls on EKS nodes (not relevant for kubeadm), through a cloud-init script. (Managed providers often give very little control over the kubelet configuration (which is often generated by a script running after cloud-init), so some way to keep such an option would be useful - it might even be a flag giving a path to a patch that the kubelet applies to its own config for cases like this.)

The kubelet exposes it as a field as well: https://github.com/kubernetes/kubernetes/blob/810e9e212ec5372d16b655f57b9231d8654a2179/staging/src/k8s.io/kubelet/config/v1beta1/types.go#L699 so you can use a kubeadm patch with a kubeletconfiguration target instead of passing kubeletExtraArgs for the nodes that need it.

I do know it is in the config, but we can't control the config that AWS generates with EKS... (for the kubeadm cluster that would be useful) (we currently only set that for a specific customer's application where we need to tweak network buffers and max connections) (whether the items are actually safe and missing from the default allowlist might be the real question)

It might be possible to add an instance-config, but AWS might use that for their own purposes... (an instance-config.d directory might be another way...)

We do perform cleanup, but only for the flags kubeadm maintains (i.e. not for --allowed-unsafe-sysctls). If we missed doing cleanup on a managed flag that is no longer needed, that's a kubeadm bug.

With kubeadm, those tend to go into /etc/default/kubelet. (It is a lot easier if just one tool messes with a config.) (We don't currently have extra settings on any of the kubeadm-deployed clusters.)

I suspect that in our case, we mainly messed with it when switching existing servers from Docker to containerd. (I'm not sure if setting the annotation / patches and running kubeadm upgrade node might have done that as well (or at least moved the setting to the config))

@neolit123
Member

neolit123 commented Dec 3, 2024

The link in my previous post is the feature which adds a feature gate to not use the annotation and instead use a /var/lib/kubelet/instance-config.yaml (which is technically a patch).

That config, if enabled by default or by a command-line parameter, would work for our EKS issue as well... (where kubeadm is not involved and they generate the config from their node bootstrap script (called from the end of a cloud-init script), so changing settings in the config is hard there)

Note: the kubelet has another way of doing instance config without kubeadm:
https://github.com/kubernetes/enhancements/blob/3911f017579157f35a297c58b66063e1216ba487/keps/sig-node/3983-drop-in-configuration/README.md?plain=1
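
That KEP gives the kubelet a --config-dir flag for drop-in configuration fragments, so an override can live next to the main config without kubeadm (or a managed provider's bootstrap script) having to regenerate it. A minimal sketch, assuming a drop-in directory of /etc/kubernetes/kubelet.conf.d and illustrative values:

# /etc/kubernetes/kubelet.conf.d/90-node-overrides.conf
# Drop-in files are merged over the main kubelet config in lexical order;
# each file is a (partial) KubeletConfiguration document.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 200
allowedUnsafeSysctls:
  - "net.core.somaxconn"   # placeholder value

The kubelet is then started with --config-dir /etc/kubernetes/kubelet.conf.d in addition to --config.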

We do perform cleanup, but only for the flags kubeadm maintains (i.e. not for --allowed-unsafe-sysctls). If we missed doing cleanup on a managed flag that is no longer needed, that's a kubeadm bug.

With kubeadm, those tend to go into /etc/default/kubelet. (It is a lot easier if just one tool messes with a config.) (We don't currently have extra settings on any of the kubeadm-deployed clusters.)

kubeadm is only going to clean up kubeadm-flags.env, as /etc/default/kubelet is user-owned.

I suspect that in our case, we mainly messed with it when switching existing servers from Docker to containerd. (I'm not sure if setting the annotation / patches and running kubeadm upgrade node might have done that as well (or at least moved the setting to the config))

The migration guide had instructions to modify the annotation and kubeadm-flags.env, FWIW:
https://kubernetes.io/docs/tasks/administer-cluster/migrating-from-dockershim/migrate-dockershim-dockerd/
