Make kubeadm start controlling kubelet Config via a versioned file #822

Closed
luxas opened this issue May 16, 2018 · 14 comments · Fixed by kubernetes/kubernetes#63887 or kubernetes/kubernetes#64780
Assignees
Labels
area/releasing area/upgrades area/UX kind/feature Categorizes issue or PR as related to a new feature. kind/refactor priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.
Milestone

Comments

@luxas
Member

luxas commented May 16, 2018

Problem statement

Currently kubeadm (the CLI) assumes a running (or actually, crashlooping) kubelet, with the right configuration for the right kubeadm version. This is currently achieved by packaging a kubelet dropin file together with the kubeadm debian package. So we have two different kubeadm entities with different responsibilities: the kubeadm CLI binary and the kubeadm deb package.

This solution has a number of problems:

  • There is no way for kubeadm the CLI to give the kubelet instructions on how to run
    • Instead, we have to tell the user to modify the dropin manually, which is time-consuming and error-prone
  • There is no way for the cluster administrator to enforce a base kubelet configuration policy for all nodes, e.g. setting the DNS domain to cluster.global (a silly example, but anyway)
  • Upgrading is particularly difficult, as the kubeadm deb package bundles both the kubeadm CLI and the kubelet dropin. Why is this a problem when upgrading? Consider the following scenario:
    • You have kubeadm v1.11.0 locally, together with k8s at v1.11.0 and kubelets of v1.11.0
    • You want to upgrade to v1.12.
    • Between v1.11.x and v1.12.x, the kubelet dropin changed, as it had to. Let's say we set the --cadvisor-port=0 flag to be secure in v1.11, but Kubernetes evolved to remove the flag in v1.12 and always turn the port off. In other words, if we tried to set --cadvisor-port=0 on a v1.12 kubelet, it would crash.
      • This means, if we use the v1.12.x dropin for a v1.11.x kubelet we're insecure, and if we're using a v1.11.x dropin for a v1.12.x kubelet it'll crash
    • First thing, the user needs kubeadm v1.12.0 to upgrade k8s to v1.12.0, but the user can't do apt-get upgrade as that would download a newer kubelet than the API server, which is not supported (TODO reference).
    • But we can't do apt-get install kubeadm to only upgrade the kubeadm deb package either, as that would include a newer manifest than supported.
    • Current solution:
      • Download kubeadm the CLI only as a standalone binary using curl.
      • kubeadm upgrade apply v1.12.0 => Both kubeadm and k8s are now at v1.12.0
        • Now we can upgrade the kubelet
      • kubectl cordon $NODE && kubectl drain $NODE && apt-get upgrade
        • Will upgrade both the kubelet and its dropin file to v1.12.
  • Configuring any component with lots of knobs using CLI flags is far from optimal. A versioned file that supports multiple API versions, conversions between them, and the like is far better. Hence, the community is striving towards adopting ComponentConfig for all components (reference). kubeadm already implements ComponentConfig with its --config option, but our API group is still alpha at the time of writing.
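
The skew constraint behind the upgrade scenario above (the kubelet must not be newer than the API server) can be sketched as a small check. This is only an illustration: the function names are hypothetical, and comparing at the major.minor level is an assumption, not something the issue text spells out.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'v1.11.0' or '1.11.0' into (1, 11, 0)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def kubelet_not_newer_than_apiserver(kubelet: str, apiserver: str) -> bool:
    """Return True when the kubelet's major.minor does not exceed the API server's.

    Assumption: skew is judged on major.minor only; patch levels are ignored.
    """
    return parse_version(kubelet)[:2] <= parse_version(apiserver)[:2]
```

With this check, a plain apt-get upgrade that pulls a v1.12.0 kubelet onto a cluster whose API server is still at v1.11.0 would be flagged, while upgrading the control plane first and the kubelet second would pass at every step.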

Reference kubelet dropin created by the kubeadm deb package for v1.10.x

[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true"
Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
Environment="KUBELET_DNS_ARGS=--cluster-dns=10.96.0.10 --cluster-domain=cluster.local"
Environment="KUBELET_AUTHZ_ARGS=--authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt"
Environment="KUBELET_CADVISOR_ARGS=--cadvisor-port=0"
Environment="KUBELET_CERTIFICATE_ARGS=--rotate-certificates=true --cert-dir=/var/lib/kubelet/pki"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS

Proposal

Let kubeadm the CLI flow configuration down to the kubelet via a well-versioned interface. With this interface, neither kubeadm the CLI nor the user needs to rely on the dropin to configure the kubelet. Both will use the kubelet's ComponentConfiguration instead. kubeadm will embed the v1beta1 ComponentConfiguration (aka KubeletConfiguration) of the kubelet inside of kubeadm's own Configuration file.
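
To make the idea concrete, here is a rough sketch of how some flags from the v1.10.x dropin could be expressed as a KubeletConfiguration object. The function name is hypothetical and the selection of fields is illustrative, not the configuration kubeadm actually writes. Flags such as --network-plugin, --cni-conf-dir, --cni-bin-dir and --cert-dir are left out on purpose, as they are assumed here to remain command-line flags. JSON is used only to stay within the standard library; JSON output is also valid YAML.

```python
import json


def kubelet_config_from_v110_flags() -> dict:
    """Build a KubeletConfiguration dict mirroring some v1.10.x dropin flags."""
    return {
        "apiVersion": "kubelet.config.k8s.io/v1beta1",
        "kind": "KubeletConfiguration",
        # --pod-manifest-path=/etc/kubernetes/manifests
        "staticPodPath": "/etc/kubernetes/manifests",
        # --cluster-dns=10.96.0.10
        "clusterDNS": ["10.96.0.10"],
        # --cluster-domain=cluster.local
        "clusterDomain": "cluster.local",
        # --authorization-mode=Webhook
        "authorization": {"mode": "Webhook"},
        # --client-ca-file=/etc/kubernetes/pki/ca.crt
        "authentication": {"x509": {"clientCAFile": "/etc/kubernetes/pki/ca.crt"}},
        # --rotate-certificates=true
        "rotateCertificates": True,
    }


def render_kubelet_config() -> str:
    """Serialize the config; the result could be written to config.yaml."""
    return json.dumps(kubelet_config_from_v110_flags(), indent=2)
```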

Reference kubelet dropin created by the kubeadm deb package for v1.11.x

[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
EnvironmentFile=/etc/kubernetes/kubelet-cri.env #TBD
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_CONTAINER_RUNTIME_ARGS $KUBELET_RESOLVER_ARGS $KUBELET_EXTRA_ARGS

kubeadm init flow

  • kubeadm executes systemctl start kubelet if it's not already running. The kubelet starts crashlooping as it hasn't got the configuration or a KubeConfig file to talk to the API server.
  • kubeadm reads --config and unmarshals it to a struct internally if specified. If not specified, it sets a couple of well-known defaults.
  • kubeadm enforces a few security-related options for the KubeletConfiguration (e.g. securing the kubelet API endpoint, disabling the readonly and cAdvisor ports, etc.) in its defaulting
  • kubeadm the CLI generates certificates, kubeconfig files and static Pod manifests as usual.
  • kubeadm marshals the KubeletConfiguration and writes the bytes down to disk in the file /var/lib/kubelet/config.yaml.
    • This only happens if kubelet --version is >= v1.11.0, when this feature was introduced.
  • The kubelet can now read its configuration and has a KubeConfig file for talking to the API server (/etc/kubernetes/kubelet.conf), so it can now start cleanly vs. crashlooping in systemd.
  • The static Pods are started, the API server comes up, the full kubeadm MasterConfiguration object is stored in the kubeadm-config ConfigMap in the cluster, etc.
  • kubeadm uploads the marshalled KubeletConfiguration bytes to a ConfigMap in the cluster called kubelet-config-1.X, where X is the minor version of the API Server running currently.
    • This only happens if the API Server version is >= v1.11.0, when this feature was introduced.
  • kubeadm creates an RBAC rule that allows Bootstrap Tokens and all nodes in the cluster to read this ConfigMap.
  • kubeadm continues with the rest of the init flow as usual.
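
The ConfigMap naming and the v1.11.0 version gate from the flow above can be sketched as follows. The helper names are hypothetical, and pre-release suffixes (e.g. -beta.1) are simply ignored here, which is a simplification.

```python
import re
from typing import Tuple

# First release with the kubelet config file flow, per the proposal.
MIN_SUPPORTED = (1, 11)


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like 'v1.11.2' or '1.11.2'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return (int(match.group(1)), int(match.group(2)))


def kubelet_configmap_name(apiserver_version: str) -> str:
    """Name of the ConfigMap holding the kubelet config for this minor version."""
    major, minor = major_minor(apiserver_version)
    return f"kubelet-config-{major}.{minor}"


def supports_kubelet_config(version: str) -> bool:
    """True if the given kubelet or API server version is >= v1.11."""
    return major_minor(version) >= MIN_SUPPORTED
```

For example, an API server at v1.11.2 maps to kubelet-config-1.11, and a v1.10.2 kubelet is skipped by the gate.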

kubeadm join flow

  • kubeadm executes systemctl start kubelet if it's not already running. The kubelet starts crashlooping as it hasn't got the configuration or a KubeConfig file to talk to the API server.
  • kubeadm contacts the API server in some way and does the discovery flow (detailed in another proposal). The net outcome of this process is a KubeConfig client to use for the TLS bootstrap. This KubeConfig object is written down to /etc/kubernetes/bootstrap-kubelet.conf.
  • In the usual flow the authentication method for the TLS bootstrap is using a Bootstrap Token. Thanks to creating that RBAC rule earlier, kubeadm uses the Bootstrap Token to download the desired KubeletConfiguration to /var/lib/kubelet/config.yaml.
    • This only happens if kubelet --version is >= v1.11.0, when this feature was introduced.
  • The kubelet can now read its configuration and has a bootstrap KubeConfig file for talking to the API server (/etc/kubernetes/bootstrap-kubelet.conf), so it can now start cleanly vs. crashlooping in systemd.
    • The kubelet performs the TLS bootstrap, which yields a unique client certificate credential stored in /etc/kubernetes/kubelet.conf.
  • The kubelet can now function normally.
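
Both flows hinge on the same condition: the kubelet crashloops until its config file and some kubeconfig are on disk. A simplified model of that condition, assuming only the file paths named in this proposal matter (the real kubelet checks far more), might look like this. The function name and the root parameter are hypothetical; root exists only so the check can be tried against a scratch directory.

```python
from pathlib import Path

# Paths from the proposal, relative to a filesystem root.
CONFIG_FILE = "var/lib/kubelet/config.yaml"
KUBECONFIG_FILES = (
    "etc/kubernetes/kubelet.conf",            # present after init or TLS bootstrap
    "etc/kubernetes/bootstrap-kubelet.conf",  # present during join
)


def kubelet_can_start(root: Path) -> bool:
    """Return True once the config file and at least one kubeconfig exist."""
    has_config = (root / CONFIG_FILE).is_file()
    has_kubeconfig = any((root / name).is_file() for name in KUBECONFIG_FILES)
    return has_config and has_kubeconfig
```

Calling kubelet_can_start(Path("/")) on a node before kubeadm join has run would return False under this model; it flips to True once kubeadm has written both files.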

v1.10 -> v1.11 upgrades

  • The user created a kubeadm cluster with kubeadm, kubelet and the API server at v1.10.2, but now wants to upgrade to v1.11.0. The v1.10.2 kubeadm kubelet dropin exists locally.
  • The dropin for v1.11.0 differs from the v1.10.2 one because it now references --config=/var/lib/kubelet/config.yaml
    • Worth noting is that it would differ in any case, as we would have changed the dropin anyway, e.g. disabling the readonly port, enabling Token authentication to the kubelet API, etc.
  • We need to do the same flow as earlier, to download the kubeadm CLI independently and upgrade the API server first, and after that cordon, drain, and apt-get upgrade in order to upgrade the kubelet and get the new v1.11.0 dropin at the right time.
  • What does kubeadm upgrade apply v1.11.0 do?
    • It first executes a set of preflight checks, etc. and then upgrades the control plane components in order
    • It updates the kubeadm-config ConfigMap.
    • It writes a new kubelet-config-1.11 ConfigMap with the desired configuration for v1.11.x kubelets.
    • It also writes these v1.11-specific marshalled KubeletConfiguration bytes down to disk in the /var/lib/kubelet/config.yaml file, so the kubelet later can pick up the desired configuration when upgraded.
    • Then the master node is upgraded successfully.
  • In order to upgrade the nodes, the following should be done:
    • Cordon/drain from the master node as usual.
    • kubeadm alpha phase kubelet write-config-to-disk to fetch the desired v1.11 configuration down to disk.
    • apt-get upgrade which upgrades the kubeadm CLI binary and the kubelet itself

v1.11 -> upgrades

  • As the kubeadm-kubelet dropin file doesn't change anymore, we can skip the curl-ing down of the standalone kubeadm binary in the master upgrade flow, and instead tell the user to run apt-get install kubeadm=1.11.0-00
  • We don't have to worry about the kubeadm deb messing things up for the kubelet deb anymore.

Open questions

  • Should we embed the KubeletConfiguration in the kubeadm-config ConfigMap as well, or set it to nil before uploading, as it's already stored in the kubelet-config-1.X ConfigMap?
  • What should the command to upgrade the kubelet be? kubeadm upgrade node config?

The PR implementing this is available at kubernetes/kubernetes#63887

@luxas luxas added this to the v1.11 milestone May 16, 2018
@luxas luxas added kind/enhancement priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. area/releasing area/upgrades area/UX kind/refactor kind/feature Categorizes issue or PR as related to a new feature. labels May 16, 2018
@timothysc
Member

I have comments written down for today which I'll discuss during the sig call.

@tpepper
Member

tpepper commented May 16, 2018

Is this going to go through https://github.com/kubernetes/features/blob/master/EXCEPTIONS.md since we're past feature freeze? Ie: content for this release was supposed to be defined, with implementations solidifying by the code freeze coming up. See https://github.com/kubernetes/sig-release/blob/master/releases/release-1.11/release-1.11.md

@luxas
Member Author

luxas commented May 16, 2018

@tpepper I'd argue this goes under kubernetes/enhancements#356. This is about that configuration. If you strongly think an exception has to be made, I can handle the communications.
(Most of the text in the comment here is background information, not changes fwiw)

@tpepper
Member

tpepper commented May 16, 2018

@luxas I'm ok with that assessment though that feature then should be updated as it's currently v1.9 milestone

@luxas
Member Author

luxas commented May 16, 2018

Yeah, sorry about that. Now fixed.

@mtaufen

mtaufen commented May 18, 2018

Hence, the community is striving towards adopting ComponentConfig for all components (TODO reference).

Reference:
https://docs.google.com/document/d/1FdaEJUEh091qf5B98HM6_8MS764iXrxxigNIdwHYW9c/edit

@mtaufen

mtaufen commented May 18, 2018

kubeadm will embed the v1beta1 ComponentConfiguration (aka KubeletConfiguration) of the kubelet inside of kubeadm's own Configuration file

Or just a path to a file that contains the Kubelet's component config yaml, which might make the beta to GA transition (or any scenario where multiple versions are available) easier to handle?

@justinsb

@mtaufen

mtaufen commented May 18, 2018

The kubelet can now read its configuration and has a KubeConfig file for talking to the API server (/etc/kubernetes/kubelet.conf), so it can now start cleanly vs. crashlooping in systemd.

Why wouldn't we just write the config prior to the first attempt to start the Kubelet?

@mtaufen

mtaufen commented May 18, 2018

kubeadm creates an RBAC rule that allows Bootstrap Tokens and all nodes in the cluster to read this ConfigMap.

I assume kubeadm has its own identity and is already authorized to create/read this ConfigMap?

@mtaufen

mtaufen commented May 18, 2018

It updates the kubeadm-config ConfigMap.

Why isn't the name of this ConfigMap versioned too?

@luxas
Member Author

luxas commented May 22, 2018

Or just a path to a file that contains the Kubelet's component config yaml, which might make the beta to GA transition (or any scenario where multiple versions are available) easier to handle?

This is TBD still, but in v1alpha2 this is the approach we're gonna take. We need to enforce certain (security+other config) parameters in the kubelet ComponentConfig at this moment, which makes it easier for us right now to have them co-located (or the kubelet CC embedded if you like)

Why wouldn't we just write the config prior to the first attempt to start the Kubelet?

The user might/should start it before running kubeadm init. If it's not running, kubeadm will try to restart it using systemctl. But at that point neither the config nor the kubeconfig for the API server is there, so it's crashlooping for a short time while the certificates are generated.

I assume kubeadm has its own identity and is already authorized to create/read this ConfigMap?

At node join time, the Bootstrap Token is the identity that kubeadm created for the user to use, which means we'll use this BT to download the config for the kubelet that is about to be joined to the cluster.

Why isn't the name of this ConfigMap versioned too?

It's a singleton, which describes the cluster's desired state (eventually, our config isn't that good yet 😄). Compared to that you have kubelets of all flavors (versions, configs, roles, etc)

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue May 23, 2018
Automatic merge from submit-queue (batch tested with PRs 63914, 63887, 64116, 64026, 62933). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

kubeadm: Write kubelet config file to disk and persist in-cluster

**What this PR does / why we need it**:
In order to make configuration flow from the cluster level to node level, we need a way for kubeadm to tell the kubelet what config to use. As of v1.10 (I think) the kubelet can read `--config` using the kubelet Beta ComponentConfiguration API, so now we have an interface to talk to the kubelet properly.

This PR:
 - Writes the kubelet ComponentConfig to `/var/lib/kubelet/config.yaml` on init and join
 - Writes an environment file, `/var/lib/kubelet/kubeadm-flags.env`, to source in the kubelet systemd dropin. This file contains runtime flags that should be passed to the kubelet.
 - Uploads a ConfigMap with the name `kubelet-config-1.X`
 - Patches the node object so that it starts using the ConfigMap with updates using Dynamic Kubelet Configuration, **only if the feature gate is set** (currently alpha and off by default, not intended to be switched on in v1.11)
 - Updates the phase commands to reflect this new flow

The kubelet dropin file I used now looks like this:
```
# v1.11.x dropin as-is at HEAD
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
---
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
EnvironmentFile-=/var/lib/kubelet/kubeadm-flags.env
# Should default to 0 in v1.11: #63881, and hence not be here in the real v1.11 manifest
Environment="KUBELET_CADVISOR_ARGS=--cadvisor-port=0"
# Should be configurable via the config file: #63878, and hence be configured using the file in v1.11
Environment="KUBELET_CERTIFICATE_ARGS=--rotate-certificates=true"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS
---
# v1.11.x dropin end goal
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
---
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
EnvironmentFile-=/var/lib/kubelet/kubeadm-flags.env
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
---
# Environment file dynamically created at runtime by "kubeadm init"
# /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS=--cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-plugin=cni
```
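
As a rough illustration of the dynamically created environment file shown above, the following sketch renders a `KUBELET_KUBEADM_ARGS` line from a flag mapping. The function name is hypothetical, and sorting the flags by name is an assumption made to reproduce the example output; it is not taken from the kubeadm source.

```python
from typing import Dict


def render_kubeadm_flags_env(flags: Dict[str, str]) -> str:
    """Render flags as a single KUBELET_KUBEADM_ARGS line for an env file.

    Flags are emitted in sorted order so the output is deterministic.
    """
    args = " ".join(f"--{name}={value}" for name, value in sorted(flags.items()))
    return f"KUBELET_KUBEADM_ARGS={args}\n"


# The flags from the example environment file above.
example = render_kubeadm_flags_env({
    "network-plugin": "cni",
    "cni-conf-dir": "/etc/cni/net.d",
    "cni-bin-dir": "/opt/cni/bin",
})
```

The rendered string could then be written to `/var/lib/kubelet/kubeadm-flags.env`, which the dropin reads via `EnvironmentFile-=` (the `-` prefix tells systemd to tolerate a missing file).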

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes/kubeadm#822
Fixes kubernetes/kubeadm#571

**Special notes for your reviewer**:

**Release note**:

```release-note
"kubeadm init" now writes a structured and versioned kubelet ComponentConfiguration file to `/var/lib/kubelet/config.yaml` and an environment file with runtime flags (you can source this file in the systemd kubelet dropin) to `/var/lib/kubelet/kubeadm-flags.env`.
```
@kubernetes/sig-cluster-lifecycle-pr-reviews @mtaufen
@mtaufen

mtaufen commented May 23, 2018

The user might/should start it before running kubeadm init. If it's not running, kubeadm will try to restart it using systemctl. But at that point neither the config nor the kubeconfig for the API server is there, so it's crashlooping for a short time while the certificates are generated.

Yeah, true there's always a few crashes while we wait for certs. I guess it depends on how long kubeadm takes and how much extra noise will be in the Kubelet logs as a result.

@luxas
Member Author

luxas commented May 23, 2018

@mtaufen for the moment I think that's fine. However, we started thinking about whether to actually embed the kubelet CC in our API types, so that's been brought into the discussion again. Opened a follow-up for this in: #851

krzyzacy pushed a commit to krzyzacy/kubernetes that referenced this issue May 30, 2018
Automatic merge from submit-queue (batch tested with PRs 64322, 64210, 64458, 64232, 64370). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

kubeadm: Move .NodeName and .CRISocket to a common sub-struct

**What this PR does / why we need it**:
Regroups some common fields for `kubeadm init` and `kubeadm join` only used for the initial node registration.
Lets the user specify ExtraArgs to the kubelet.
Now also runs the dynamic env file creation for `kubeadm join`.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes/kubeadm#847
Follows-up kubernetes#63887
Related to kubernetes/kubeadm#822

**Special notes for your reviewer**: WIP, but please review so we can finalize the direction of the PR

**Release note**:

```release-note
[action required] `.NodeName` and `.CRISocket` in the `MasterConfiguration` and `NodeConfiguration` v1alpha1 API objects are now `.NodeRegistration.Name` and `.NodeRegistration.CRISocket` respectively in the v1alpha2 API. The `.NoTaintMaster` field has been removed in the v1alpha2 API.
```
@kubernetes/sig-cluster-lifecycle-pr-reviews @liztio
@luxas
Member Author

luxas commented Jun 5, 2018

Reopening this momentarily to track the merge of the final PRs related to this

@luxas luxas reopened this Jun 5, 2018
k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Jun 6, 2018
Automatic merge from submit-queue (batch tested with PRs 64009, 64780, 64354, 64727, 63650). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

kubeadm: Update the dropin for the kubelet in v1.11

**What this PR does / why we need it**:
One of the final pieces of kubernetes/kubeadm#851, kubernetes/kubeadm#847 and kubernetes/kubeadm#822

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
(partially)
Fixes kubernetes/kubeadm#822

**Special notes for your reviewer**: Please check whether this release note makes sense to you.

**Release note**:

```release-note
[action required] The structure of the kubelet dropin in the kubeadm deb package has changed significantly.
Instead of hard-coding the parameters for the kubelet in the dropin, a structured configuration file
for the kubelet is used, and is expected to be present in `/var/lib/kubelet/config.yaml`.
For runtime-detected, instance-specific configuration values, an environment file with
dynamically-generated flags at `kubeadm init` or `kubeadm join` run time is used.
Finally, if the user wants to override something specific for the kubelet that can't be done via
the kubeadm Configuration file (which is preferred), they might add flags to the 
`KUBELET_EXTRA_ARGS` environment variable in either `/etc/default/kubelet`
or `/etc/sysconfig/kubelet`, depending on the system you're running on.
```
@kubernetes/sig-cluster-lifecycle-pr-reviews