
static pods #27

Open
miekg opened this issue Dec 22, 2020 · 10 comments
Labels
enhancement New feature or request

Comments


miekg commented Dec 22, 2020

Should we do something like static pods? We can probably use the k3s server for this, but something needs to mirror them in the API server, and systemk must do that.

It would simplify a bunch of things, as we could just throw some YAML in a directory and stuff would happen.
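
As a thought experiment, here's a minimal sketch of the "YAML in a directory" idea in Go, roughly mirroring how the kubelet watches its static pod path. The manifest directory, the fsnotify dependency and the createStaticPod helper are assumptions for illustration, not existing systemk code.

package main

import (
	"log"
	"path/filepath"

	"github.com/fsnotify/fsnotify"
)

// watchManifests watches a directory for new or changed YAML manifests and
// would hand them to a (hypothetical) createStaticPod step.
func watchManifests(dir string) error {
	w, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	defer w.Close()

	if err := w.Add(dir); err != nil {
		return err
	}
	for ev := range w.Events {
		// Only react to new or changed YAML files.
		if ev.Op&(fsnotify.Create|fsnotify.Write) == 0 {
			continue
		}
		if ext := filepath.Ext(ev.Name); ext != ".yaml" && ext != ".yml" {
			continue
		}
		log.Printf("manifest changed: %s", ev.Name)
		// createStaticPod(ev.Name) // parse the manifest and generate the unit(s)
	}
	return nil
}

func main() {
	// /etc/systemk/manifests is a made-up path for this sketch.
	if err := watchManifests("/etc/systemk/manifests"); err != nil {
		log.Fatal(err)
	}
}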

miekg added the enhancement label on Dec 22, 2020

miekg commented Dec 24, 2020

If we name the unit right, systemk...., and we mirror the static pods into the API control plane, we can manage the k3s server with kubectl.
(Needs the appropriate RBAC, but totally doable.)


miekg commented Feb 9, 2021

OK, you need to be a MirrorClient (https://github.com/kubernetes/kubernetes/blob/cea1d4e20b4a7886d8ff65f34c6d4f95efcb4742/pkg/kubelet/pod/mirror_client.go) to be able to do this.

Uhuh, this most definitely seems to pull in kubernetes/kubernetes, but maybe we can copy it into our repo.
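
For reference, a minimal sketch of what copying that interface into our repo might look like; the method set is paraphrased from kubernetes/kubernetes, and the exact signatures at that commit may differ.

package pod

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
)

// MirrorClient knows how to create and delete mirror pods in the API server,
// so that static pods (here: unit files on the node) show up as pod objects.
// Sketch only, paraphrased from pkg/kubelet/pod/mirror_client.go.
type MirrorClient interface {
	// CreateMirrorPod creates a mirror pod in the API server for the static pod.
	CreateMirrorPod(pod *v1.Pod) error
	// DeleteMirrorPod deletes the mirror pod; uid, if non-nil, must match.
	DeleteMirrorPod(podFullName string, uid *types.UID) (bool, error)
}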


miekg commented Feb 9, 2021

A small problem I foresee is how to distinguish a static pod from a normal pod. I think it needs to be named differently, otherwise we will try to mirror actual pods (after a restart).

Yes, we probably want the word static somewhere in the unit file. Where normal pods are named systemk.<namespace>.<podname>.<container>, we could use systemk-static as the prefix.

So on startup, if we see unit names with this prefix, they are considered static pods and we create the pod in the API server (if allowed).
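
A rough sketch of what recognizing such units on startup could look like, assuming the systemk-static prefix and the naming scheme above; the constant and helpers are made up for illustration.

package main

import (
	"fmt"
	"strings"
)

// staticPrefix is the proposed prefix for static-pod units.
const staticPrefix = "systemk-static."

// isStaticPodUnit reports whether a unit name follows the proposed
// systemk-static.<namespace>.<podname>.<container> naming scheme.
func isStaticPodUnit(unit string) bool {
	return strings.HasPrefix(unit, staticPrefix)
}

// parseStaticPodUnit splits a static-pod unit name into namespace, pod and container.
func parseStaticPodUnit(unit string) (namespace, pod, container string, err error) {
	name := strings.TrimSuffix(strings.TrimPrefix(unit, staticPrefix), ".service")
	parts := strings.SplitN(name, ".", 3)
	if len(parts) != 3 {
		return "", "", "", fmt.Errorf("not a static pod unit: %s", unit)
	}
	return parts[0], parts[1], parts[2], nil
}

func main() {
	ns, pod, ctr, _ := parseStaticPodUnit("systemk-static.kube-system.coredns.coredns.service")
	fmt.Println(ns, pod, ctr)
}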

miekg changed the title from "static pods?" to "static pods" on Feb 9, 2021

miekg commented Feb 9, 2021

Another thing to note is that we don't care how these units come into existence, just that they are there and match the naming convention for static pods.


miekg commented Feb 10, 2021

https://twitter.com/miekg/status/1359436562476449795
Looks like I'm not alone in wishing to pivot from static pods into a deployment. Kubeadm has some early support for this.


miekg commented Feb 11, 2021

So doing this from the viewpoint of systemk is hard and fragile. What can be done is the following:
You have a systemd unit running (normally named, nothing fancy here). It has an [X-Kubernetes] section just like the other units, but in this one we have:

StaticPod=true
Annotation=blah

So for these pods systemk will try to mirror them in the API (if allowed), and furthermore watch (the namespace? pods?) for the annotation blah to show up; if it does, it will kill the static pod.
(It's assumed that whatever is now deployed with that annotation will take over the functionality of this pod.)
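
Roughly, in client-go terms, the watch-and-kill part could look like the sketch below; the annotation key and the stopUnit helper are placeholders, not systemk's actual API.

package staticpod

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// takeoverAnnotation is a made-up key standing in for the "blah" annotation above.
const takeoverAnnotation = "systemk.io/takes-over"

// watchForTakeover watches pods in the namespace and stops the static pod's
// unit once a pod carrying the takeover annotation appears.
func watchForTakeover(ctx context.Context, cs kubernetes.Interface, namespace, unit string) error {
	w, err := cs.CoreV1().Pods(namespace).Watch(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	defer w.Stop()

	for ev := range w.ResultChan() {
		pod, ok := ev.Object.(*corev1.Pod)
		if !ok {
			continue
		}
		if _, found := pod.Annotations[takeoverAnnotation]; !found {
			continue
		}
		log.Printf("pod %s/%s takes over, stopping unit %s", pod.Namespace, pod.Name, unit)
		return stopUnit(ctx, unit)
	}
	return nil
}

// stopUnit is a stand-in for stopping the unit through systemd.
func stopUnit(ctx context.Context, unit string) error {
	log.Printf("would stop unit %s", unit)
	return nil
}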

So for bootstrapping with e.g. step-ca: provide a unit to start via cloud-init or whatever, have everything bootstrapped out of this, and then fold step-ca into k8s by deploying its in-cluster successor.

Note this sidesteps a lot of thorny problems; it basically boils down to starting/tearing down a static pod when instructed to do so. How state is transferred from the static pod to the deployment is left out, but a hostPath and running on the same machine is something that could be documented.


miekg commented Jun 21, 2021

So, I'm looking into this again, and I'm taking a normal systemd-managed unit as an example. We basically want to turn this:

[Unit]
Description=CoreDNS DNS server
Documentation=https://coredns.io
After=network.target

[Service]
PermissionsStartOnly=true
LimitNOFILE=1048576
LimitNPROC=512
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
NoNewPrivileges=true
User=coredns
WorkingDirectory=~
ExecStart=/usr/bin/coredns -conf=/etc/coredns/Corefile
ExecReload=/bin/kill -SIGUSR1 $MAINPID
Restart=on-failure

[Install]
WantedBy=multi-user.target

into this (never mind that it's a different binary; that's a detail we don't care about here):

[Unit]
Description=systemk
Documentation=man:systemk(8)

[Install]
WantedBy=multi-user.target

[Service]
ProtectSystem=true
ProtectHome=tmpfs
PrivateMounts=true
ReadOnlyPaths=/
StandardOutput=journal
StandardError=journal
RemainAfterExit=true
ExecStart=/usr/bin/bash -c "while true; do date;  sleep 1; done"
TemporaryFileSystem=/var /run
BindReadOnlyPaths=/var/run/da3ae833-8642-4102-b025-aaacb06db804/secrets/#0:/var/run/secrets/kubernetes.io/serviceaccount
Environment=HOSTNAME=draak
Environment=KUBERNETES_SERVICE_PORT=6444
Environment=KUBERNETES_SERVICE_HOST=127.0.0.1
Environment=SYSTEMK_NODE_INTERNAL_IP=192.168.86.22
Environment=SYSTEMK_NODE_EXTERNAL_IP=192.168.86.22
Environment=KUBERNETES_PORT_443_TCP_PROTO="tcp"
Environment=KUBERNETES_PORT_443_TCP_PORT="443"
Environment=KUBERNETES_PORT_443_TCP_ADDR="10.43.0.1"
Environment=KUBERNETES_SERVICE_HOST="10.43.0.1"
Environment=KUBERNETES_SERVICE_PORT="443"
Environment=KUBERNETES_SERVICE_PORT_HTTPS="443"
Environment=KUBERNETES_PORT="tcp://10.43.0.1:443"
Environment=KUBERNETES_PORT_443_TCP="tcp://10.43.0.1:443"

[X-Kubernetes]
Namespace=default
ClusterName=
Id=da3ae833-8642-4102-b025-aaacb06db804
Image=bash

Needless to say, that's impossible.

Another idea is to be able to create a pod the normal way and then have it replace a systemd-managed unit (maybe signalled with an annotation?); then we don't have to do anything fancy to convert unit files. The steps would then become:

  1. start pod w/ annotation (on specific machine?)
  2. check annotation, stop and disable the unit
  3. start the pod
  4. done

Error handling might be trickier, but on pod creation failure we may restart the original unit? Or maybe we should not care and let the cluster deal with this outage.

This does imply systemk has access to the systemd system unit files on the host.
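
Sketching step 2 with go-systemd's D-Bus API, which is one way systemk could talk to the system systemd; the flow and error handling here are assumptions, not a worked-out design.

package staticpod

import (
	"context"
	"fmt"

	systemd "github.com/coreos/go-systemd/v22/dbus"
)

// takeOverUnit stops and disables the named system unit so the pod that
// carries the annotation can take over its role.
func takeOverUnit(ctx context.Context, unit string) error {
	conn, err := systemd.NewSystemConnectionContext(ctx)
	if err != nil {
		return fmt.Errorf("connecting to systemd: %w", err)
	}
	defer conn.Close()

	done := make(chan string, 1)
	if _, err := conn.StopUnitContext(ctx, unit, "replace", done); err != nil {
		return fmt.Errorf("stopping %s: %w", unit, err)
	}
	if result := <-done; result != "done" {
		return fmt.Errorf("stopping %s finished with %q", unit, result)
	}

	// Disabling is the permanent part: the unit will not come back on reboot.
	if _, err := conn.DisableUnitFilesContext(ctx, []string{unit}, false); err != nil {
		return fmt.Errorf("disabling %s: %w", unit, err)
	}
	return nil
}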

Keep in mind the end goal here is to have everything systemk managed, including the bootstrap bits that are needed to get the cluster up and running.

Stopping a system service and taking it over with one started from the k8s control plane opens a security hole: you can now stop any system service, which is probably not what you want. We could add an X-Kubernetes-like section to the original unit, saying it's allowed to be interrupted... Or have some back reference in there to an RBAC rule or some such.

Note: systemk already accesses the system systemd, so it can pretty much do what it wants.


miekg commented Jun 21, 2021

We can just add a

[X-Kubernetes]
ClusterRole=<xxx>

where the current user wielding kubectl would need to be allowed to delete the resource units in the API group systemk.io (or whatever).
I only need to know the current user, which should be in the current context?


miekg commented Jun 21, 2021

This won't work; the kubelet doesn't have any concept of who called "into" it. Any validation on objects has already happened. And a unit on a disk somewhere isn't a k8s object. We can make it into one, but then we have two things: createPod and some k8s object with RBAC protection. Such a CRD would then have to be almost like a pod object so we know what to do??

That looks like it will get messy quickly...


miekg commented Jul 22, 2021

OK, this can work, but the unit needs to tell systemk what kind of objects need to be created in the API. The easiest would be to include the YAML in the unit file: specify a new section and list the YAML as base64-encoded lines in there (sketched below).

Upon starting up and seeing a specially named unit with this section, systemk will set up these objects in the API. Possible things that can go wrong:

  • the original unit will be stopped/disabled; this is a permanent change to the system it's running on
  • the k8s API might reject the new pod(s), or they fail to start. This means restarting the original unit?
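
A sketch of the decoding side, assuming a new section/key (names made up here) that carries base64-encoded YAML documents, and using unstructured objects so we don't hard-code the kinds.

package staticpod

import (
	"encoding/base64"
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/yaml"
)

// decodeObjects turns the base64-encoded YAML documents listed in the unit's
// hypothetical [X-Kubernetes] Object= lines into unstructured objects we can
// create in the API.
func decodeObjects(encoded []string) ([]*unstructured.Unstructured, error) {
	var objs []*unstructured.Unstructured
	for _, line := range encoded {
		raw, err := base64.StdEncoding.DecodeString(line)
		if err != nil {
			return nil, fmt.Errorf("decoding object: %w", err)
		}
		obj := &unstructured.Unstructured{}
		if err := yaml.Unmarshal(raw, &obj.Object); err != nil {
			return nil, fmt.Errorf("unmarshalling object: %w", err)
		}
		objs = append(objs, obj)
	}
	return objs, nil
}

Creating the decoded objects would then go through a dynamic client; on rejection or start failure we're back at the "restart the original unit?" question above.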
