Knative on CoCo (sc2-sys#12)

konsougiou · Sep 28, 2023 · commit 81614ab · 1 parent 40cca98

Showing 17 changed files with 433 additions and 79 deletions.
diff --git a/README.md b/README.md
@@ -50,6 +50,15 @@ inv operator.install
 inv operator.install-cc-runtime
 ```
 
+Third, update the `initrd` file to include our patched `kata-agent`:
+
+```bash
+inv kata.replace-agent
+```
+
+If this is the first time, you will have to manually build the agent by following
+[these instructions](./docs/kata.md#replacing-the-kata-agent).
+
 Then, you are ready to run one of the supported apps:
 * [Hello World! (Py)](./docs/helloworld_py.md) - simple HTTP server running in Python to test CoCo and Kata.
 * [Hello World! (Knative)](./docs/helloworld_knative.md) - same app as before, but invoked over Knative.
@@ -79,5 +88,7 @@ inv kubeadm.destroy
 
 For further documentation, you may want to check these other documents:
 * [K8s](./docs/k8s.md) - documentation about configuring a single-node Kubernetes cluster.
+* [Kata](./docs/kata.md) - instructions to build our custom Kata fork and `initrd` images.
 * [Knative](./docs/knative.md) - documentation about Knative, our serverless runtime of choice.
 * [SEV](./docs/sev.md) - specific documentation to get the project working with AMD SEV machines.
+* [Troubleshooting](./docs/troubleshooting.md) - tips to debug when things go sideways.
diff --git a/apps/helloworld-knative/service.yaml b/apps/helloworld-knative/service.yaml
@@ -2,25 +2,21 @@ apiVersion: serving.knative.dev/v1
 kind: Service
 metadata:
   name: helloworld-knative
-    #   annotations:
-    #     "features.knative.dev/podspec-volumes-emptydir": "enabled"
-    #     "features.knative.dev/podspec-persistent-volume-claim": "enabled"
-    #     "features.knative.dev/podspec-persistent-volume-claim-write": "enabled"
-    #     "features.knative.dev/podspec-runtimeclassname": "enabled"
+  annotations:
+    "features.knative.dev/podspec-runtimeclassname": "enabled"
 spec:
-  # ConfigurationSpec (or RevisionTemplateSpec?)
   template:
     metadata:
       labels:
         apps.coco-serverless/name: helloworld-py
-          # io.katacontainers.config.pre_attestation.enabled: "false"
+        io.katacontainers.config.pre_attestation.enabled: "false"
     spec:
-      # runtimeClassName: kata-qemu
+      runtimeClassName: kata-qemu-sev
+      # coco-knative: need to run user container as root
+      securityContext:
+        runAsUser: 1000
       containers:
-        # - image: ghcr.io/knative/helloworld-go:latest
-        # - image: csegarragonz/coco-helloworld-py:latest
-        # - image: csegarragonz/coco-helloworld-py:latest
-        - image: csegarragonz/coco-helloworld-py@sha256:af0fec55e9aed9a259e8da9dcaa28ab3fc1277dc8db4b8883265f98272cef11d
+        - image: csegarragonz/coco-helloworld-py:latest
           ports:
             - containerPort: 8080
           env:

diff --git a/apps/helloworld-py/deployment.yaml b/apps/helloworld-py/deployment.yaml
@@ -19,5 +19,6 @@ spec:
       containers:
       - name: helloworld-py
         image: csegarragonz/coco-helloworld-py:latest
+        imagePullPolicy: Always
         ports:
         - containerPort: 8080
diff --git a/conf-files/knative_config.yaml b/conf-files/knative_config.yaml
@@ -4,9 +4,5 @@ metadata:
   name:  config-features
   namespace:  knative-serving
 data:
-  kubernetes.podspec-volumes-emptydir: "enabled"
-  kubernetes.podspec-persistent-volume-claim: "enabled"
-  kubernetes.podspec-persistent-volume-claim-write: "enabled"
   kubernetes.podspec-runtimeclassname: "enabled"
-  kubernetes.containerspec-addcapabilities: "enabled"
-  registries-skipping-tag-resolving: docker.io
+  kubernetes.podspec-securitycontext: "enabled"
diff --git a/docs/helloworld_knative.md b/docs/helloworld_knative.md
@@ -3,6 +3,13 @@
 This application runs the same `Hello World!` sample as [`helloworld-py`](
 ./helloworld_py.md), but through Knative Serving.
 
+This sample application does not use any attestation or image encryption, so
+you should disable it by running:
+
+```bash
+inv coco.disable-attestation
+```
+
 To deploy it, you may run:
 
 ```bash
@@ -25,3 +32,19 @@ To remove the application, you can run:
 ```bash
 kubectl delete -f ./apps/helloworld-knative
 ```
+
+## Knative on CoCo
+
+For the time being, CoCo requires the image to _always_ be pulled on the guest.
+If the image is present on the host, Knative will try to use the cached copy
+(as it is not possible to specify `imagePullPolicy: Always`), and the pod will
+fail to start, complaining about problems mounting the root file-system.
+
+To remove the image from the host's cache, you can use `crictl`:
+
+```bash
+sudo crictl rmi <image_id>
+```
+
+Note that, if _only_ using CoCo, the images are _never_ on the host, so they
+should never be cached.
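
The `crictl rmi` step above needs an image ID, which has to be looked up first. A minimal sketch of how both steps could be scripted, assuming `crictl` is installed and `sudo` is available on the host. The helper name `remove_cached_image` and its `runner` parameter are hypothetical and not part of this repository:

```python
import subprocess


def remove_cached_image(image_name, runner=subprocess.run):
    """Remove every host-cached image matching `image_name` (hypothetical helper)."""
    # `crictl images -q <name>` prints only the IDs of the matching images
    listing = runner(
        ["sudo", "crictl", "images", "-q", image_name],
        capture_output=True,
        text=True,
        check=True,
    )
    image_ids = listing.stdout.split()

    # Remove each matching image from the host's cache
    for image_id in image_ids:
        runner(["sudo", "crictl", "rmi", image_id], check=True)

    return image_ids
```

Calling `remove_cached_image("csegarragonz/coco-helloworld-py")` before deploying the Knative service would clear any copy left over from a non-CoCo run. The `runner` argument only exists so the function can be exercised without touching a real host.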
diff --git a/docs/kata.md b/docs/kata.md
@@ -0,0 +1,46 @@
+# Kata Containers
+
+Most of the Kata development happens in our [Kata fork](
+https://github.com/csegarragonz/kata-containers). The reason why we use a fork
+is to pin to an older, but stable, CC release, and add patches on top when
+necessary. Down the road (and particularly when CoCo uses Kata's main), we'd
+get rid of the fork.
+
+## Tweaking Kata
+
+To get a working environment to modify Kata, clone our fork and build/exec into
+the workon container. For convenience, we recommend cloning the fork at the
+same directory level as this repo (i.e. `../kata-containers`).
+
+```bash
+git clone https://github.com/csegarragonz/kata-containers
+cd kata-containers
+./csg-bin/build_docker.sh
+./csg-bin/cli.sh
+```
+
+## Replacing the Kata Agent
+
+Replacing the Kata Agent is something we may do regularly, and is a fairly
+automated process.
+
+First, from our Kata fork, rebuild the `kata-agent` binary:
+
+```bash
+cd ../kata-containers
+./csg-bin/cli.sh
+cd src/agent
+make
+exit
+cd -
+```
+
+Second, from this repository, bake the new agent into the `initrd` image used
+by `qemu-sev` and update the config path:
+
+```bash
+inv kata.replace-agent
+```
+
+The new VMs you start should use the new `initrd` (and thus the updated
+`kata-agent`).
diff --git a/docs/knative.md b/docs/knative.md
@@ -28,3 +28,9 @@ inv kubeadm.destroy
 inv kubeadm.create
 inv knative.install
 ```
+
+## Knative on CoCo
+
+To run Knative on CoCo, we need to enable two feature flags when configuring
+Knative. Check out the [`ConfigMap`](../conf-files/knative_config.yaml) for
+more details.
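
According to the `conf-files/knative_config.yaml` change in this commit, the two flags are `kubernetes.podspec-runtimeclassname` and `kubernetes.podspec-securitycontext`. As a rough sketch, the check below reports which of them are not enabled in the `data` section of the `config-features` ConfigMap (for example as returned by `kubectl get configmap config-features -n knative-serving -o json`). The function name `missing_feature_flags` is hypothetical:

```python
# Feature flags enabled by conf-files/knative_config.yaml in this commit
REQUIRED_FLAGS = (
    "kubernetes.podspec-runtimeclassname",
    "kubernetes.podspec-securitycontext",
)


def missing_feature_flags(data):
    """Return the required flags that are not set to "enabled" in `data`."""
    return [flag for flag in REQUIRED_FLAGS if data.get(flag) != "enabled"]


# Example: a ConfigMap `data` section with only one of the flags enabled
example_data = {"kubernetes.podspec-runtimeclassname": "enabled"}
missing = missing_feature_flags(example_data)
```

An empty list means both flags are in place. Anything else names the flags that still need enabling before the CoCo service can be deployed.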
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
@@ -0,0 +1,55 @@
+# Troubleshooting
+
+In this document we include a collection of tips to help you debug the system
+in case something is not working as expected.
+
+## K8s Monitoring with K9s
+
+Gaining visibility into the state of a Kubernetes cluster is hard, so we cannot
+stress enough how useful `k9s` is for debugging what is going on.
+
+We strongly recommend using it. You can install it with:
+
+```bash
+inv k9s.install
+export KUBECONFIG=$(pwd)/.config/kubeadm_kubeconfig
+k9s
+```
+
+## Enabling debug logging in the system journal
+
+Another good observability tool is the systemd journal. Both `containerd` and
+`kata-agent` send their logs to `containerd`'s journal unit. You may inspect
+the logs using:
+
+```bash
+sudo journalctl -xeu containerd
+```
+
+To enable debug logging you may run:
+
+```bash
+inv containerd.set-log-level [debug,info]
+inv kata.set-log-level [debug,info]
+```
+
+Naturally, run the commands again with `info` to restore the original log level.
+
+## Nuking the whole cluster
+
+When things really go wrong, resetting the whole cluster is usually a good way
+to get a clean start:
+
+```bash
+inv kubeadm.destroy kubeadm.create
+```
+
+If you want a really clean start, you can re-install `containerd` and all the
+`k8s` tooling:
+
+```bash
+inv kubeadm.destroy
+inv containerd.build containerd.install
+inv k8s.install --clean
+inv kubeadm.create
+```
diff --git a/docs/uk8s.md b/docs/uk8s.md
diff --git a/tasks/__init__.py b/tasks/__init__.py
@@ -1,21 +1,25 @@
 from invoke import Collection
 
 from . import apps
+from . import coco
 from . import containerd
 from . import format_code
 from . import k8s
 from . import k9s
+from . import kata
 from . import kbs
 from . import knative
 from . import kubeadm
 from . import operator
 
 ns = Collection(
     apps,
+    coco,
     containerd,
     format_code,
     k8s,
     k9s,
+    kata,
     kbs,
     knative,
     kubeadm,

diff --git a/tasks/coco.py b/tasks/coco.py
@@ -0,0 +1,17 @@
+from invoke import task
+from os.path import join
+from tasks.util.env import KATA_CONFIG_DIR
+from tasks.util.toml import update_toml
+
+
+@task
+def disable_attestation(ctx):
+    """
+    Disable attestation for CoCo
+    """
+    conf_file_path = join(KATA_CONFIG_DIR, "configuration-qemu-sev.toml")
+    updated_toml_str = """
+    [hypervisor.qemu]
+    guest_pre_attestation = false
+    """
+    update_toml(conf_file_path, updated_toml_str)
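
The `update_toml` helper imported from `tasks.util.toml` is not shown in this part of the diff. Its call sites pass a small TOML fragment and expect the rest of the target file to survive, which suggests a recursive merge. The sketch below illustrates that idea on already-parsed dictionaries. It is an assumption about the behaviour, not the repository's implementation, and the name `merge_dicts` and the sample values are made up:

```python
def merge_dicts(base, update):
    """Return a new dict with `update` recursively merged into `base`."""
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are tables: merge them key by key
            merged[key] = merge_dicts(merged[key], value)
        else:
            # Scalars and new tables overwrite or get added
            merged[key] = value
    return merged


# Made-up stand-in for a parsed configuration-qemu-sev.toml
config = {
    "hypervisor": {
        "qemu": {"guest_pre_attestation": True, "path": "/usr/bin/qemu"}
    }
}
# Parsed form of the fragment used by `disable_attestation`
patch = {"hypervisor": {"qemu": {"guest_pre_attestation": False}}}

result = merge_dicts(config, patch)
```

With this merge, only `guest_pre_attestation` changes and sibling keys such as `path` are preserved. A plain `dict.update` at the top level would instead replace the whole `hypervisor` table.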
diff --git a/tasks/containerd.py b/tasks/containerd.py
@@ -3,13 +3,20 @@
 from os.path import join
 from subprocess import CalledProcessError, run
 from tasks.util.env import CONF_FILES_DIR, PROJ_ROOT
-from toml import load as toml_load, dump as toml_dump
+from tasks.util.toml import update_toml
 
 CONTAINERD_IMAGE_TAG = "containerd-build"
 CONTAINERD_SOURCE_CHECKOUT = join(PROJ_ROOT, "..", "containerd")
 CONTAINERD_CONFIG_FILE = "/etc/containerd/config.toml"
 
 
+def restart_containerd():
+    """
+    Utility function to gracefully restart the containerd service
+    """
+    run("sudo service containerd restart", shell=True, check=True)
+
+
 @task
 def build(ctx):
     """
@@ -47,6 +54,10 @@ def configure_devmapper_snapshotter():
     data_dir = "/var/lib/containerd/devmapper"
     pool_name = "containerd-pool"
 
+    # --------------------------
+    # Thin Pool device configuration
+    # --------------------------
+
     # First, remove the device if it already exists
     try:
         run("sudo dmsetup remove --force {}".format(pool_name), shell=True, check=True)
@@ -113,24 +124,47 @@ def configure_devmapper_snapshotter():
     dmsetup_cmd = " ".join(dmsetup_cmd)
     run(dmsetup_cmd, shell=True, check=True)
 
-    devmapper_conf = {
-        "root_path": data_dir,
-        "pool_name": pool_name,
-        "base_image_size": "8192MB",
-        "discard_blocks": True,
-    }
+    # --------------------------
+    # Update containerd's config file to use the devmapper snapshotter
+    # --------------------------
+
+    # Note: we currently don't use the devmapper snapshot, so this just
+    # _configures_ it (but doesn't select it as snapshotter)
+    updated_toml_str = """
+    [plugins."io.containerd.snapshotter.v1.devmapper"]
+    root_path = "{root_path}"
+    pool_name = "{pool_name}"
+    base_image_size = "8192MB"
+    discard_blocks = true
+    """.format(
+        root_path=data_dir, pool_name=pool_name
+    )
+    update_toml(CONTAINERD_CONFIG_FILE, updated_toml_str)
 
-    conf_file = toml_load(CONTAINERD_CONFIG_FILE)
-    conf_file["plugins"]["io.containerd.snapshotter.v1.devmapper"] = devmapper_conf
 
-    tmp_conf = "/tmp/containerd_config.toml"
-    with open(tmp_conf, "w") as fh:
-        toml_dump(conf_file, fh)
+@task
+def set_log_level(ctx, log_level):
+    """
+    Set containerd's log level, must be one in: info, debug
+    """
+    allowed_log_levels = ["info", "debug"]
+    if log_level not in allowed_log_levels:
+        print(
+            "Unsupported log level '{}'. Must be one in: {}".format(
+                log_level, allowed_log_levels
+            )
+        )
+        return
 
-    # Finally, copy in place
-    run(
-        "sudo cp {} {}".format(tmp_conf, CONTAINERD_CONFIG_FILE), shell=True, check=True
+    updated_toml_str = """
+    [debug]
+    level = "{log_level}"
+    """.format(
+        log_level=log_level
     )
+    update_toml(CONTAINERD_CONFIG_FILE, updated_toml_str)
+
+    restart_containerd()
 
 
 @task
@@ -187,4 +221,4 @@ def cleanup():
     configure_devmapper_snapshotter()
 
     # Restart containerd service
-    run("sudo service containerd restart", shell=True, check=True)
+    restart_containerd()
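
One detail worth keeping in mind for `set_log_level`: TOML string values must be quoted, so the `[debug]` fragment needs `level = "debug"` and not a bare `debug`. A minimal sketch of building and validating such a fragment follows. The helper name `build_debug_fragment` is hypothetical, and raising an exception in place of the `print` and `return` used in the task is a design choice of this sketch:

```python
ALLOWED_LOG_LEVELS = ("info", "debug")


def build_debug_fragment(log_level):
    """Return a TOML fragment for the `[debug]` table (hypothetical helper)."""
    if log_level not in ALLOWED_LOG_LEVELS:
        raise ValueError(
            "Unsupported log level '{}'. Must be one of: {}".format(
                log_level, ", ".join(ALLOWED_LOG_LEVELS)
            )
        )
    # TOML strings must be quoted: `level = debug` would not parse
    return '[debug]\nlevel = "{}"\n'.format(log_level)


fragment = build_debug_fragment("debug")
```

Raising on an unsupported level makes the failure visible to the caller, so a task built on top of this would stop with an error and not silently do nothing.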