Make kubeadm start controlling kubelet Config via a versioned file #822

luxas · 2018-05-16T14:39:03Z

Problem statement

Currently, kubeadm (the CLI) assumes a running (or actually, crashlooping) kubelet with the right configuration for the right kubeadm version. This is achieved by packaging a kubelet dropin file together with the kubeadm Debian package. So we have two different kubeadm entities with different responsibilities: the kubeadm CLI binary and the kubeadm deb package.

This solution has a number of problems:

- There is no way for kubeadm the CLI to give the kubelet instructions on how to run.
  - Instead, we have to tell the user to modify the dropin manually, which is time-consuming and error-prone.
- There is no way for the cluster administrator to enforce a base kubelet configuration policy for all nodes, e.g. setting the DNS domain to cluster.global (a silly example, but anyway).
- Upgrading is particularly difficult, as the kubeadm deb package bundles both the kubeadm CLI and the kubelet dropin. Why is this a problem when upgrading? Consider the following scenario:
  - You have kubeadm v1.11.0 locally, together with k8s at v1.11.0 and kubelets at v1.11.0.
  - You want to upgrade to v1.12.
  - Between v1.11.x and v1.12.x, the kubelet dropin changed, as it had to. Let's say we set the --cadvisor-port=0 flag in v1.11 to be secure, but Kubernetes evolved to remove the flag in v1.12 and always turn the port off. In other words, if we set --cadvisor-port=0 on a v1.12 kubelet, the kubelet would crash.
    - This means that if we use the v1.12.x dropin for a v1.11.x kubelet we're insecure, and if we use a v1.11.x dropin for a v1.12.x kubelet it'll crash.
  - First, the user needs kubeadm v1.12.0 to upgrade k8s to v1.12.0, but the user can't run apt-get upgrade, as that would download a newer kubelet than the API server, which is not supported (TODO reference).
  - But we can't run apt-get install kubeadm to upgrade only the kubeadm deb package either, as that would install a newer dropin than the current kubelet supports.
  - Current solution:
    - Download only kubeadm the CLI, as a standalone binary, using curl.
    - kubeadm upgrade apply v1.12.0 => both kubeadm and k8s are now at v1.12.0.
      - Now we can upgrade the kubelet:
    - kubectl cordon $NODE && kubectl drain $NODE && apt-get upgrade
      - This upgrades both the kubelet and its dropin file to v1.12.
- Configuring any component with lots of knobs via CLI flags is far from optimal. A versioned file that supports different API versions, conversions between them and similar is far better. Hence, the community is striving towards adopting ComponentConfig for all components (reference). kubeadm already implements ComponentConfig with its --config option, but at the time of writing our API group is still alpha.
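To illustrate why a versioned file beats a pile of flags: every file declares its own apiVersion and kind, so a reader can check that header first and then pick the matching Go type (or convert between versions). Below is a minimal Go sketch under stated assumptions: KubeletConfigV1beta1 is a hypothetical, heavily trimmed stand-in for the kubelet's real type, JSON replaces YAML so that only the standard library is needed (YAML 1.2 parsers also accept JSON), and the real Kubernetes machinery (apimachinery schemes and conversion functions) is not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TypeMeta is the apiVersion/kind header every versioned config file carries.
type TypeMeta struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
}

// KubeletConfigV1beta1 is a hypothetical, trimmed stand-in for the kubelet's
// v1beta1 KubeletConfiguration; the real type has many more fields.
type KubeletConfigV1beta1 struct {
	TypeMeta
	ClusterDomain string   `json:"clusterDomain,omitempty"`
	ClusterDNS    []string `json:"clusterDNS,omitempty"`
}

// decodeKubeletConfig reads the type header first and only then decodes the
// body into the Go type that matches it, rejecting versions it doesn't know.
func decodeKubeletConfig(data []byte) (*KubeletConfigV1beta1, error) {
	var tm TypeMeta
	if err := json.Unmarshal(data, &tm); err != nil {
		return nil, fmt.Errorf("reading type header: %w", err)
	}
	if tm.Kind != "KubeletConfiguration" {
		return nil, fmt.Errorf("unexpected kind %q", tm.Kind)
	}
	switch tm.APIVersion {
	case "kubelet.config.k8s.io/v1beta1":
		var cfg KubeletConfigV1beta1
		if err := json.Unmarshal(data, &cfg); err != nil {
			return nil, err
		}
		return &cfg, nil
	default:
		// A real decoder would convert other known versions here.
		return nil, fmt.Errorf("unsupported apiVersion %q", tm.APIVersion)
	}
}

func main() {
	doc := []byte(`{"apiVersion":"kubelet.config.k8s.io/v1beta1","kind":"KubeletConfiguration",
		"clusterDomain":"cluster.local","clusterDNS":["10.96.0.10"]}`)
	cfg, err := decodeKubeletConfig(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.ClusterDomain, cfg.ClusterDNS) // cluster.local [10.96.0.10]
}
```

The point of the design is that an unknown apiVersion fails loudly instead of silently misinterpreting a flag, which is exactly the failure mode of the --cadvisor-port example above.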

Reference kubelet dropin created by the kubeadm deb package for v1.10.x

[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true"
Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
Environment="KUBELET_DNS_ARGS=--cluster-dns=10.96.0.10 --cluster-domain=cluster.local"
Environment="KUBELET_AUTHZ_ARGS=--authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt"
Environment="KUBELET_CADVISOR_ARGS=--cadvisor-port=0"
Environment="KUBELET_CERTIFICATE_ARGS=--rotate-certificates=true --cert-dir=/var/lib/kubelet/pki"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS

Proposal

Let kubeadm the CLI pass configuration down to the kubelet via a well-versioned interface. With this interface, neither kubeadm the CLI nor the user has to rely on the dropin to configure the kubelet; both will use the kubelet's ComponentConfiguration instead. kubeadm will embed the v1beta1 ComponentConfiguration (aka KubeletConfiguration) of the kubelet inside of kubeadm's own Configuration file.
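A minimal sketch of what the embedding could look like, written in Go since that is what kubeadm is written in. The type and function names (MasterConfiguration.KubeletConfiguration, setSecureDefaults, boolPtr) are hypothetical simplifications, not kubeadm's actual API types. The kubelet keys (readOnlyPort, authentication.anonymous.enabled, authentication.webhook.enabled, authentication.x509.clientCAFile, authorization.mode) are meant to follow the kubelet's v1beta1 KubeletConfiguration, but only a handful are modeled. The defaulting mirrors the "enforce a few security-related options" step of the init flow, and JSON stands in for YAML to stay within the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, trimmed mirrors of a few kubelet v1beta1 fields.
type KubeletAuthentication struct {
	X509 struct {
		ClientCAFile string `json:"clientCAFile,omitempty"`
	} `json:"x509"`
	Webhook struct {
		Enabled *bool `json:"enabled,omitempty"`
	} `json:"webhook"`
	Anonymous struct {
		Enabled *bool `json:"enabled,omitempty"`
	} `json:"anonymous"`
}

type KubeletAuthorization struct {
	Mode string `json:"mode,omitempty"`
}

type KubeletConfiguration struct {
	APIVersion     string                `json:"apiVersion"`
	Kind           string                `json:"kind"`
	ReadOnlyPort   int32                 `json:"readOnlyPort"` // no omitempty: 0 is written explicitly
	ClusterDomain  string                `json:"clusterDomain,omitempty"`
	Authentication KubeletAuthentication `json:"authentication"`
	Authorization  KubeletAuthorization  `json:"authorization"`
}

// MasterConfiguration is a hypothetical kubeadm config embedding the kubelet's
// ComponentConfig; the "kubeletConfiguration" key is an assumption.
type MasterConfiguration struct {
	KubeletConfiguration *KubeletConfiguration `json:"kubeletConfiguration,omitempty"`
}

func boolPtr(b bool) *bool { return &b }

// setSecureDefaults fills in missing values and *forces* the security options,
// overriding whatever the user supplied for those particular fields.
func setSecureDefaults(cfg *MasterConfiguration) {
	if cfg.KubeletConfiguration == nil {
		cfg.KubeletConfiguration = &KubeletConfiguration{}
	}
	kc := cfg.KubeletConfiguration
	kc.APIVersion = "kubelet.config.k8s.io/v1beta1"
	kc.Kind = "KubeletConfiguration"
	kc.ReadOnlyPort = 0 // disable the unauthenticated read-only port
	kc.Authentication.Anonymous.Enabled = boolPtr(false)
	kc.Authentication.Webhook.Enabled = boolPtr(true)
	kc.Authorization.Mode = "Webhook"
	if kc.Authentication.X509.ClientCAFile == "" {
		kc.Authentication.X509.ClientCAFile = "/etc/kubernetes/pki/ca.crt"
	}
	if kc.ClusterDomain == "" { // plain default: the user may override it
		kc.ClusterDomain = "cluster.local"
	}
}

func main() {
	cfg := &MasterConfiguration{KubeletConfiguration: &KubeletConfiguration{
		ReadOnlyPort: 10255, ClusterDomain: "cluster.global",
	}}
	setSecureDefaults(cfg)
	out, err := json.MarshalIndent(cfg, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The sketch separates two kinds of defaulting: values the user may override (clusterDomain) and values kubeadm enforces regardless (the read-only port and the authn/authz settings).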

Reference kubelet dropin created by the kubeadm deb package for v1.11.x

[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# TBD
EnvironmentFile=/etc/kubernetes/kubelet-cri.env
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_CONTAINER_RUNTIME_ARGS $KUBELET_RESOLVER_ARGS $KUBELET_EXTRA_ARGS
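The environment file above is still TBD; the PR merged later in this thread settled on /var/lib/kubelet/kubeadm-flags.env with a single KUBELET_KUBEADM_ARGS variable. A hedged Go sketch of how the v1.10 dropin flags could be split between the config file and that env file. The configBacked table is a partial, illustrative mapping from flags to KubeletConfiguration keys, and splitFlags and buildKubeadmFlagsEnv are hypothetical helpers, not kubeadm functions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// configBacked maps some v1.10 dropin flags to the KubeletConfiguration key
// that replaces them. Partial and illustrative, not exhaustive.
var configBacked = map[string]string{
	"pod-manifest-path":  "staticPodPath",
	"cluster-dns":        "clusterDNS",
	"cluster-domain":     "clusterDomain",
	"authorization-mode": "authorization.mode",
	"client-ca-file":     "authentication.x509.clientCAFile",
}

// splitFlags separates "--name=value" flags into config-file fields and flags
// that must still be passed on the kubelet command line.
func splitFlags(flags []string) (toConfig, remaining map[string]string) {
	toConfig, remaining = map[string]string{}, map[string]string{}
	for _, f := range flags {
		kv := strings.SplitN(strings.TrimPrefix(f, "--"), "=", 2)
		name, value := kv[0], ""
		if len(kv) == 2 {
			value = kv[1]
		}
		if field, ok := configBacked[name]; ok {
			toConfig[field] = value
		} else {
			remaining[name] = value
		}
	}
	return toConfig, remaining
}

// buildKubeadmFlagsEnv renders the KUBELET_KUBEADM_ARGS line; sorting the flag
// names keeps the file byte-for-byte stable across runs.
func buildKubeadmFlagsEnv(flags map[string]string) string {
	names := make([]string, 0, len(flags))
	for name := range flags {
		names = append(names, name)
	}
	sort.Strings(names)
	args := make([]string, 0, len(names))
	for _, name := range names {
		args = append(args, fmt.Sprintf("--%s=%s", name, flags[name]))
	}
	return "KUBELET_KUBEADM_ARGS=" + strings.Join(args, " ") + "\n"
}

func main() {
	// A subset of the flags from the v1.10 dropin above.
	v110 := []string{
		"--network-plugin=cni", "--cni-conf-dir=/etc/cni/net.d", "--cni-bin-dir=/opt/cni/bin",
		"--cluster-dns=10.96.0.10", "--cluster-domain=cluster.local",
	}
	toConfig, remaining := splitFlags(v110)
	fmt.Println("config.yaml fields:", toConfig)
	fmt.Print(buildKubeadmFlagsEnv(remaining))
}
```

The second line printed is KUBELET_KUBEADM_ARGS=--cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-plugin=cni, the same content as the kubeadm-flags.env example in the PR description.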

kubeadm init flow

1. kubeadm executes systemctl start kubelet if it's not already running. The kubelet starts crashlooping, as it has neither its configuration nor a KubeConfig file for talking to the API server.
2. If --config is specified, kubeadm reads the file and unmarshals it into an internal struct. If not, it sets a couple of well-known defaults.
3. In its defaulting, kubeadm enforces a few security-related options for the KubeletConfiguration (e.g. securing the kubelet API endpoint, disabling the read-only and cAdvisor ports, etc.).
4. kubeadm the CLI generates certificates, kubeconfig files and static Pod manifests as usual.
5. kubeadm marshals the KubeletConfiguration and writes the bytes to disk in the file /var/lib/kubelet/config.yaml.
   - This only happens if kubelet --version is >= v1.11.0, when this feature was introduced.
6. The kubelet can now read its configuration and has a KubeConfig file for talking to the API server (/etc/kubernetes/kubelet.conf), so it now starts cleanly instead of crashlooping in systemd.
7. The static Pods are started, the API server comes up, the full kubeadm MasterConfiguration object is stored in the kubeadm-config ConfigMap in the cluster, etc.
8. kubeadm uploads the marshalled KubeletConfiguration bytes to a ConfigMap in the cluster called kubelet-config-1.X, where X is the minor version of the currently running API server.
   - This only happens if the API server version is >= v1.11.0, when this feature was introduced.
9. kubeadm creates an RBAC rule that allows Bootstrap Tokens and all nodes in the cluster to read this ConfigMap.
10. kubeadm continues with the rest of the init flow as usual.
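The two version gates in the init flow (write config.yaml only for kubelets >= v1.11.0, upload kubelet-config-1.X only for API servers >= v1.11.0) and the ConfigMap naming scheme can be sketched in Go as follows. The helper names are hypothetical; the parser assumes the "Kubernetes vX.Y.Z" format printed by kubelet --version (or a bare "vX.Y.Z") and ignores patch and pre-release parts.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajorMinor extracts major and minor from "Kubernetes v1.11.0",
// "v1.11.0" or "v1.11.0-beta.1"; patch/pre-release parts are ignored.
func parseMajorMinor(s string) (major, minor int, err error) {
	fields := strings.Fields(s)
	if len(fields) == 0 {
		return 0, 0, fmt.Errorf("empty version string")
	}
	v := strings.TrimPrefix(fields[len(fields)-1], "v")
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", s)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// atLeast reports whether version s is >= major.minor.
func atLeast(s string, major, minor int) (bool, error) {
	ma, mi, err := parseMajorMinor(s)
	if err != nil {
		return false, err
	}
	return ma > major || (ma == major && mi >= minor), nil
}

// kubeletConfigMapName builds the "kubelet-config-<major>.<minor>" name.
func kubeletConfigMapName(s string) (string, error) {
	ma, mi, err := parseMajorMinor(s)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("kubelet-config-%d.%d", ma, mi), nil
}

func main() {
	kubeletVersion := "Kubernetes v1.11.0" // e.g. captured from `kubelet --version`
	apiServerVersion := "v1.11.2"
	if ok, err := atLeast(kubeletVersion, 1, 11); err == nil && ok {
		fmt.Println("write /var/lib/kubelet/config.yaml")
	}
	if ok, err := atLeast(apiServerVersion, 1, 11); err == nil && ok {
		name, _ := kubeletConfigMapName(apiServerVersion)
		fmt.Println("upload ConfigMap", name) // upload ConfigMap kubelet-config-1.11
	}
}
```

Keying the ConfigMap on the minor version only means that all v1.11.x kubelets share one desired configuration, while v1.12 kubelets can later get a different one side by side.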

kubeadm join flow

1. kubeadm executes systemctl start kubelet if it's not already running. The kubelet starts crashlooping, as it has neither its configuration nor a KubeConfig file for talking to the API server.
2. kubeadm contacts the API server and performs the discovery flow (detailed in another proposal). The net outcome of this process is a KubeConfig client to use for the TLS bootstrap. This KubeConfig object is written to /etc/kubernetes/bootstrap-kubelet.conf.
3. In the usual flow, the TLS bootstrap authenticates using a Bootstrap Token. Thanks to the RBAC rule created during kubeadm init, kubeadm uses the Bootstrap Token to download the desired KubeletConfiguration to /var/lib/kubelet/config.yaml.
   - This only happens if kubelet --version is >= v1.11.0, when this feature was introduced.
4. The kubelet can now read its configuration and has a bootstrap KubeConfig file for talking to the API server (/etc/kubernetes/bootstrap-kubelet.conf), so it now starts cleanly instead of crashlooping in systemd.
   - The kubelet performs the TLS bootstrap, which yields a unique client certificate credential stored in /etc/kubernetes/kubelet.conf.
5. The kubelet can now function normally.
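The download in the join flow above amounts to an authenticated GET against the API server, using the Bootstrap Token as a bearer token. A hedged Go sketch that only builds the request and parses the response. It assumes the ConfigMap lives in the kube-system namespace and stores the serialized KubeletConfiguration under a data key named "kubelet", and it omits the TLS setup (trusting the cluster CA obtained during discovery) that a real client needs. The API server address and token in main are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// newConfigMapRequest builds a GET for a ConfigMap, assumed to live in
// kube-system, authenticated with a Bootstrap Token as a bearer token.
func newConfigMapRequest(apiServer, name, token string) (*http.Request, error) {
	endpoint := fmt.Sprintf("%s/api/v1/namespaces/kube-system/configmaps/%s", apiServer, name)
	req, err := http.NewRequest(http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/json")
	return req, nil
}

// configMap is the small subset of the ConfigMap object this sketch needs.
type configMap struct {
	Data map[string]string `json:"data"`
}

// kubeletConfigFromConfigMap extracts the serialized kubelet config stored
// under key (assumed to be "kubelet") from a ConfigMap JSON response body.
func kubeletConfigFromConfigMap(body []byte, key string) ([]byte, error) {
	var cm configMap
	if err := json.Unmarshal(body, &cm); err != nil {
		return nil, err
	}
	v, ok := cm.Data[key]
	if !ok {
		return nil, fmt.Errorf("ConfigMap has no %q key", key)
	}
	return []byte(v), nil
}

func main() {
	// Placeholder address and a token in the "<id>.<secret>" format.
	req, err := newConfigMapRequest("https://10.0.0.1:6443", "kubelet-config-1.11", "abcdef.0123456789abcdef")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// A real client would send req with an http.Client trusting the cluster CA,
	// then write the extracted bytes to /var/lib/kubelet/config.yaml.
	body := []byte(`{"data":{"kubelet":"apiVersion: kubelet.config.k8s.io/v1beta1\nkind: KubeletConfiguration\n"}}`)
	cfg, err := kubeletConfigFromConfigMap(body, "kubelet")
	if err != nil {
		panic(err)
	}
	fmt.Print(string(cfg))
}
```

This is why the RBAC rule created during init matters: without read access for Bootstrap Tokens, this GET would be rejected before the node has its own credentials.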

v1.10 -> v1.11 upgrades

1. The user created a cluster with kubeadm, the kubelet and the API server at v1.10.2, and now wants to upgrade to v1.11.0. The v1.10.2 kubeadm kubelet dropin exists locally.
2. The v1.11.0 dropin differs from the v1.10.2 one, as it now references --config=/var/lib/kubelet/config.yaml.
   - Worth noting: it would differ anyway, as we would have changed the dropin regardless, e.g. to disable the read-only port, enable Token authentication to the kubelet API, etc.
3. We need to follow the same flow as before: download the kubeadm CLI independently and upgrade the API server first, and after that cordon, drain, and apt-get upgrade, in order to upgrade the kubelet and get the new v1.11.0 dropin at the right time.
4. What does kubeadm upgrade apply v1.11.0 do?
   - It first executes a set of preflight checks, etc., and then upgrades the control plane components in order.
   - It updates the kubeadm-config ConfigMap.
   - It writes a new kubelet-config-1.11 ConfigMap with the desired configuration for v1.11.x kubelets.
   - It also writes these v1.11-specific marshalled KubeletConfiguration bytes to disk in the /var/lib/kubelet/config.yaml file, so the kubelet can pick up the desired configuration once it is upgraded.
   - At this point the master node has been upgraded successfully.
5. In order to upgrade the nodes, the following should be done:
   - Cordon/drain the node from the master as usual.
   - Run kubeadm alpha phase kubelet write-config-to-disk to fetch the desired v1.11 configuration to disk.
   - Run apt-get upgrade, which upgrades the kubeadm CLI binary and the kubelet itself.
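The ordering above (control plane first, then each node) follows from the rule in the problem statement that a kubelet must not be newer than the API server. A hypothetical Go helper that encodes that check before a node upgrade; the function names are made up, the step strings simply echo the commands in this section, and the two-minor-version lag limit is an assumption taken from the Kubernetes version skew policy rather than from this issue.

```go
package main

import "fmt"

// version is a minimal major.minor pair.
type version struct{ Major, Minor int }

// kubeletSkewAllowed reports whether a kubelet at version kubelet may run
// against an API server at version api: never newer, and at most maxLag
// minor versions older.
func kubeletSkewAllowed(api, kubelet version, maxLag int) bool {
	if kubelet.Major != api.Major {
		return false
	}
	return kubelet.Minor <= api.Minor && api.Minor-kubelet.Minor <= maxLag
}

// planNodeUpgrade returns the next steps for moving a node's kubelet to
// target, refusing to proceed until the control plane has caught up.
func planNodeUpgrade(api, target version) []string {
	if !kubeletSkewAllowed(api, target, 2) {
		return []string{"upgrade the control plane first (kubeadm upgrade apply)"}
	}
	return []string{
		"kubectl cordon $NODE",
		"kubectl drain $NODE",
		"kubeadm alpha phase kubelet write-config-to-disk",
		"apt-get upgrade",
	}
}

func main() {
	for _, api := range []version{{1, 10}, {1, 11}} {
		fmt.Printf("API server v%d.%d, target kubelet v1.11:\n", api.Major, api.Minor)
		for _, step := range planNodeUpgrade(api, version{1, 11}) {
			fmt.Println("  " + step)
		}
	}
}
```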

v1.11 -> upgrades

- As the kubeadm-kubelet dropin file no longer changes, we can skip downloading the standalone kubeadm binary with curl in the master upgrade flow, and instead tell the user to run apt-get install kubeadm=1.11.0-00.
- We don't have to worry about the kubeadm deb messing things up for the kubelet deb anymore.

Open questions

- Should we embed the KubeletConfiguration in the kubeadm-config ConfigMap as well, or set it to nil before uploading, as it's already stored in the kubelet-config-1.X ConfigMap?
- What should the command to upgrade the kubelet be? kubeadm upgrade node config?

A PR implementing this is available for review: kubernetes/kubernetes#63887


timothysc · 2018-05-16T15:49:43Z

I have comments written down for today, which I'll discuss during the SIG call.

tpepper · 2018-05-16T17:13:40Z

Is this going to go through https://github.com/kubernetes/features/blob/master/EXCEPTIONS.md, since we're past feature freeze? I.e., content for this release was supposed to be defined already, with implementations solidifying by the upcoming code freeze. See https://github.com/kubernetes/sig-release/blob/master/releases/release-1.11/release-1.11.md

luxas · 2018-05-16T17:20:49Z

@tpepper I'd argue this goes under kubernetes/enhancements#356. This is about that configuration. If you strongly think an exception has to be made, I can handle the communications.
(Most of the text in the comment here is background information, not changes, fwiw.)

tpepper · 2018-05-16T19:22:35Z

@luxas I'm OK with that assessment, though that feature should then be updated, as it's currently in the v1.9 milestone.

luxas · 2018-05-16T19:31:57Z

Yeah, sorry about that. Now fixed.

mtaufen · 2018-05-18T18:07:33Z

> Hence, the community is striving towards adopting ComponentConfig for all components (TODO reference).

Reference:
https://docs.google.com/document/d/1FdaEJUEh091qf5B98HM6_8MS764iXrxxigNIdwHYW9c/edit

mtaufen · 2018-05-18T18:13:24Z

> kubeadm will embed the v1beta1 ComponentConfiguration (aka KubeletConfiguration) of the kubelet inside of kubeadm's own Configuration file

Or just a path to a file that contains the Kubelet's component config yaml, which might make the beta to GA transition (or any scenario where multiple versions are available) easier to handle?

@justinsb

mtaufen · 2018-05-18T18:15:15Z

> The kubelet can now read its configuration and has a KubeConfig file for talking to the API server (/etc/kubernetes/kubelet.conf), so it can now start cleanly vs. crashlooping in systemd.

Why wouldn't we just write the config prior to the first attempt to start the Kubelet?

mtaufen · 2018-05-18T18:16:50Z

> kubeadm creates an RBAC rule that allows Bootstrap Tokens and all nodes in the cluster to read this ConfigMap.

I assume kubeadm has its own identity and is already authorized to create/read this ConfigMap?

mtaufen · 2018-05-18T18:22:18Z

> It updates the kubeadm-config ConfigMap.

Why isn't the name of this ConfigMap versioned too?

luxas · 2018-05-22T18:11:35Z

> Or just a path to a file that contains the Kubelet's component config yaml, which might make the beta to GA transition (or any scenario where multiple versions are available) easier to handle?

This is still TBD, but in v1alpha2 this is the approach we're going to take. At the moment we need to enforce certain (security and other config) parameters in the kubelet ComponentConfig, which makes it easier for us right now to have them co-located (or the kubelet CC embedded, if you like).

> Why wouldn't we just write the config prior to the first attempt to start the Kubelet?

The user might/should start it before running kubeadm init. If it's not running, kubeadm will try to restart it using systemctl. But at that point neither the config nor the kubeconfig for the API server is there, so it crashloops for a short time while the certificates are generated.

> I assume kubeadm has its own identity and is already authorized to create/read this ConfigMap?

At node join time, the Bootstrap Token is the identity that kubeadm created for the user to use, which means we'll use this Bootstrap Token to download the config for the kubelet that is about to join the cluster.

> Why isn't the name of this ConfigMap versioned too?

It's a singleton, which describes the cluster's desired state (eventually; our config isn't that good yet 😄). In contrast, there are kubelets of all flavors (versions, configs, roles, etc.).

@mtaufen

Automatic merge from submit-queue (batch tested with PRs 63914, 63887, 64116, 64026, 62933). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

kubeadm: Write kubelet config file to disk and persist in-cluster

**What this PR does / why we need it**:

In order to make configuration flow from the cluster level to the node level, we need a way for kubeadm to tell the kubelet what config to use. As of v1.10 (I think) the kubelet can read `--config` using the kubelet Beta ComponentConfiguration API, so now we have an interface to talk to the kubelet properly.

This PR:

- Writes the kubelet ComponentConfig to `/var/lib/kubelet/config.yaml` on init and join
- Writes an environment file, `/var/lib/kubelet/kubeadm-flags.env`, to source in the kubelet systemd dropin. This file contains runtime flags that should be passed to the kubelet.
- Uploads a ConfigMap with the name `kubelet-config-1.X`
- Patches the node object so that it starts using the ConfigMap with updates using Dynamic Kubelet Configuration, **only if the feature gate is set** (currently alpha and off by default, not intended to be switched on in v1.11)
- Updates the phase commands to reflect this new flow

The kubelet dropin file I used now looks like this:

```
# v1.11.x dropin as-is at HEAD
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
---
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
EnvironmentFile-=/var/lib/kubelet/kubeadm-flags.env
# Should default to 0 in v1.11: #63881, and hence not be here in the real v1.11 manifest
Environment="KUBELET_CADVISOR_ARGS=--cadvisor-port=0"
# Should be configurable via the config file: #63878, and hence be configured using the file in v1.11
Environment="KUBELET_CERTIFICATE_ARGS=--rotate-certificates=true"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS
---
# v1.11.x dropin end goal
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
---
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
EnvironmentFile-=/var/lib/kubelet/kubeadm-flags.env
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
---
# Environment file dynamically created at runtime by "kubeadm init"
# /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS=--cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-plugin=cni
```

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes/kubeadm#822
Fixes kubernetes/kubeadm#571

**Special notes for your reviewer**:

**Release note**:

```release-note
"kubeadm init" now writes a structured and versioned kubelet ComponentConfiguration file to `/var/lib/kubelet/config.yaml` and an environment file with runtime flags (you can source this file in the systemd kubelet dropin) to `/var/lib/kubelet/kubeadm-flags.env`.
```

@kubernetes/sig-cluster-lifecycle-pr-reviews @mtaufen

mtaufen · 2018-05-23T01:21:24Z

> The user might/should start it before running kubeadm init. If it's not running, kubeadm will try to restart it using systemctl. But at that point neither the config or kubeconfig for the API server is there so it's crashlooping for a small amount of time while generating the certificates.

Yeah, true, there are always a few crashes while we wait for certs. I guess it depends on how long kubeadm takes and how much extra noise ends up in the Kubelet logs as a result.

luxas · 2018-05-23T07:07:41Z

@mtaufen for the moment I think that's fine. However, we've started thinking about whether to actually embed the kubelet CC in our API types, so that's back in the discussion again. Opened a follow-up for this in: #851

@liztio

Automatic merge from submit-queue (batch tested with PRs 64322, 64210, 64458, 64232, 64370). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

kubeadm: Move .NodeName and .CRISocket to a common sub-struct

**What this PR does / why we need it**:

Regroups some common fields for `kubeadm init` and `kubeadm join` only used for the initial node registration. Lets the user specify ExtraArgs to the kubelet. Now also runs the dynamic env file creation for `kubeadm join`.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes/kubeadm#847
Follows-up kubernetes#63887
Related to kubernetes/kubeadm#822

**Special notes for your reviewer**:

WIP, but please review so we can finalize the direction of the PR

**Release note**:

```release-note
[action required] `.NodeName` and `.CRISocket` in the `MasterConfiguration` and `NodeConfiguration` v1alpha1 API objects are now `.NodeRegistration.Name` and `.NodeRegistration.CRISocket` respectively in the v1alpha2 API. The `.NoTaintMaster` field has been removed in the v1alpha2 API.
```

@kubernetes/sig-cluster-lifecycle-pr-reviews @liztio

luxas · 2018-06-05T17:22:53Z

Reopening this momentarily to track the merge of the final PRs related to this

Automatic merge from submit-queue (batch tested with PRs 64009, 64780, 64354, 64727, 63650). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

kubeadm: Update the dropin for the kubelet in v1.11

**What this PR does / why we need it**:

One of the final pieces of kubernetes/kubeadm#851, kubernetes/kubeadm#847 and kubernetes/kubeadm#822

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
(partially) Fixes kubernetes/kubeadm#822

**Special notes for your reviewer**:

Please check whether this release note makes sense to you.

**Release note**:

```release-note
[action required] The structure of the kubelet dropin in the kubeadm deb package has changed significantly. Instead of hard-coding the parameters for the kubelet in the dropin, a structured configuration file for the kubelet is used, and is expected to be present in `/var/lib/kubelet/config.yaml`. For runtime-detected, instance-specific configuration values, an environment file with dynamically-generated flags at `kubeadm init` or `kubeadm join` run time is used. Finally, if the user wants to override something specific for the kubelet that can't be done via the kubeadm Configuration file (which is preferred), they might add flags to the `KUBELET_EXTRA_ARGS` environment variable in either `/etc/default/kubelet` or `/etc/sysconfig/kubelet`, depending on the system you're running on.
```

@kubernetes/sig-cluster-lifecycle-pr-reviews

luxas assigned timothysc and luxas May 16, 2018

luxas added this to the v1.11 milestone May 16, 2018

luxas added kind/enhancement, priority/critical-urgent, area/releasing, area/upgrades, area/UX, kind/refactor and kind/feature labels May 16, 2018

neolit123 mentioned this issue May 17, 2018

document "controlling kubelet Config via a versioned file" at website #833

Closed

luxas mentioned this issue May 21, 2018

kubeadm: Write kubelet config file to disk and persist in-cluster kubernetes/kubernetes#63887

Merged

k8s-github-robot closed this as completed in kubernetes/kubernetes#63887 May 23, 2018

luxas mentioned this issue May 23, 2018

Create a KEP for the kubeadm-kubelet integration #851

Closed

luxas mentioned this issue May 23, 2018

kubeadm: Move .NodeName and .CRISocket to a common sub-struct kubernetes/kubernetes#64210

Merged

neolit123 mentioned this issue May 30, 2018

remove the docker cgroup driver detection #874

Closed


luxas mentioned this issue May 31, 2018

kubeadm uses deprecated kubelet flags #878

Closed

luxas reopened this Jun 5, 2018

luxas mentioned this issue Jun 5, 2018

kubeadm: Update the dropin for the kubelet in v1.11 kubernetes/kubernetes#64780

Merged

k8s-github-robot closed this as completed in kubernetes/kubernetes#64780 Jun 6, 2018

starnop mentioned this issue Sep 30, 2018

docs: update the doc of deploy kubernetes with PouchContainer AliyunContainerService/pouch#2294

Merged
