
No runtime for contrast-cc-k3s-qemu-tdx is configured #1264

Open
SelvamArul opened this issue Feb 28, 2025 · 11 comments · May be fixed by #1276

@SelvamArul

I am installing Contrast on a bare-metal machine with Intel TDX. After deploying the Contrast runtime using kubectl apply -f https://github.com/edgelesssys/contrast/releases/download/v1.5.1/runtime-k3s-qemu-tdx.yml and the Contrast coordinator https://github.com/edgelesssys/contrast/releases/download/v1.5.1/coordinator-k3s-qemu-tdx.yml, the coordinator pod gets stuck in the ContainerCreating status:

NAMESPACE         NAME                                                    READY   STATUS              RESTARTS        AGE
default           coordinator-0                                           0/1     ContainerCreating   0               5m15s 

The following lines in kubectl describe pods coordinator-0 seem suspicious:
Warning FailedCreatePodSandBox 8s kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = unable to get OCI runtime for sandbox "0c843a14d1958808e834c664fa4bc5e050570a618bc2760f83360c06d0ee602e": no runtime for "contrast-cc-k3s-qemu-tdx-69c6b92c" is configured

Here is the full output:

contrast-coordinator-describe.log

TDX seems to be installed correctly:

$ dmesg | grep -i  TDX
[    0.519540] virt/tdx: BIOS enabled: private KeyID range [32, 64)
[    0.519544] virt/tdx: Disable ACPI S3. Turn off TDX in the BIOS to use ACPI S3.
[   10.018354] virt/tdx: TDX module: attributes 0x0, vendor_id 0x8086, major_version 1, minor_version 5, build_date 20240129, build_num 698
[   10.018358] virt/tdx: CMR: [0x100000, 0x77800000)
[   10.018361] virt/tdx: CMR: [0x100000000, 0x1ffe000000)
[   10.018362] virt/tdx: CMR: [0x4000000000, 0x4080000000)
[   10.342428] virt/tdx: 525324 KB allocated for PAMT
[   10.342431] virt/tdx: module initialized

How can I debug this issue?

@burgerdev
Contributor

Hi @SelvamArul,

Thanks for writing this bug report. What you're describing sounds like a problem in the nodeinstaller - if it encounters an issue, it can't update the containerd config and containerd won't see the new runtimeclass.

Can you check whether the nodeinstaller is healthy?

kubectl get pods -n kube-system -l app.kubernetes.io/name=contrast-cc-k3s-qemu-tdx-69c6b92c-nodeinstaller

Then see if the containerd config contains the runtimeclass:

kubectl debug node/cocoubuntu -it --image busybox -- cat /host/var/lib/rancher/k3s/agent/etc/containerd/config.toml |
   grep -C5 contrast-cc-k3s-qemu-tdx-69c6b92c

@SelvamArul
Author

@burgerdev Thanks a lot for the response.
Yes, the nodeinstaller is healthy:

$ kubectl get pods -n kube-system -l app.kubernetes.io/name=contrast-cc-k3s-qemu-tdx-69c6b92c-nodeinstaller
NAME                                                    READY   STATUS    RESTARTS   AGE
contrast-cc-k3s-qemu-tdx-69c6b92c-nodeinstaller-q9dpd   2/2     Running   0          20m

Also, containerd config contains the runtime class:

$ kubectl debug node/cocoubuntu -it --image busybox -- cat /host/var/lib/rancher/k3s/agent/etc/containerd/config.toml |
   grep -C5 contrast-cc-k3s-qemu-tdx-69c6b92c
disable_snapshot_annotations = false
discard_unpacked_layers = false

[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes]

[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes.contrast-cc-k3s-qemu-tdx-69c6b92c]
runtime_type = 'io.containerd.contrast-cc.v2'
runtime_path = '/opt/edgeless/contrast-cc-k3s-qemu-tdx-69c6b92c/bin/containerd-shim-contrast-cc-v2'
pod_annotations = ['io.katacontainers.*']
privileged_without_host_devices = true
snapshotter = 'nydus-contrast-cc-k3s-qemu-tdx-69c6b92c'

[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes.contrast-cc-k3s-qemu-tdx-69c6b92c.options]
ConfigPath = '/opt/edgeless/contrast-cc-k3s-qemu-tdx-69c6b92c/etc/configuration-qemu-tdx.toml'

[plugins.'io.containerd.internal.v1.opt']
path = '/var/lib/rancher/k3s/agent/containerd'

[proxy_plugins]

[proxy_plugins.nydus-contrast-cc-k3s-qemu-tdx-69c6b92c]
type = 'snapshot'
address = '/run/containerd/containerd-nydus-grpc-contrast-cc-k3s-qemu-tdx-69c6b92c.sock'

@burgerdev
Contributor

Ok, that's odd. You could check whether that config is loaded correctly with

k3s crictl info | jq '.config.containerd.runtimes["contrast-cc-k3s-qemu-tdx-69c6b92c"]'

For completeness, which version of k3s are you using? Is k3s a systemd unit on your machine?

@SelvamArul
Author

There is something strange with my k3s setup. I installed k3s using curl -sfL https://get.k3s.io | sh -.
k3s version v1.31.6+k3s1 (6ab750f9). Yes, k3s is a systemd unit on my machine:

$ systemctl status k3s.service
● k3s.service - Lightweight Kubernetes
     Loaded: loaded (/etc/systemd/system/k3s.service; enabled; preset: enabled)
     Active: active (running) since Thu 2025-03-06 11:03:24 UTC; 2min 31s ago
       Docs: https://k3s.io
    Process: 80343 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null (code=exited, status=0/SUCCESS)
    Process: 80345 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 80347 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
   Main PID: 80350 (k3s-server)
      Tasks: 518
     Memory: 5.7G (peak: 5.8G)
        CPU: 1min 12.965s
     CGroup: /system.slice/k3s.service
             ├─ 3932 /var/lib/rancher/k3s/data/4532effb54c1f987f51a6b860588c2ae555bf73c3d00e4e28952188cd293484f/bin/containerd-shim-runc-v2 -namespace k8s.io -id 493ffdc0b62af5a8362ff75034351cfa14d1ac5a31c3ea5047d>
        .....
        .....
        .....
"CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = unable to get OCI runtime for sandbox \"72a>
Mar 06 11:05:43 cocoubuntu k3s[80350]: E0306 11:05:43.812647   80350 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"coordinator-0_default(781203cd-47d9-4404-adad-1d2ad>

I am a non-root user (although part of sudoers).

$ k3s crictl info
WARN[0000] Failed to stat /var/lib/rancher/k3s/agent/etc/crictl.yaml: permission denied
FATA[0000] load config file: stat /var/lib/rancher/k3s/data/4532effb54c1f987f51a6b860588c2ae555bf73c3d00e4e28952188cd293484f/bin/crictl.yaml: no such file or directory

This lack of permission is confusing me.

@burgerdev
Contributor

I think that's expected - the directory containing crictl.yaml is not world-readable:

$ ls -ld /var/lib/rancher/k3s/agent
drwx------ 5 root root 4096 Feb  6 14:11 /var/lib/rancher/k3s/agent

If the k3s CLI flavour does not work, you can also use the normal crictl and point it to the k3s-provided containerd socket:

crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock info
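
The info output is fairly large, so it can help to narrow it down to just the runtime class names. A minimal sketch of that (the JSON below is an inlined sample so the snippet is self-contained; on the node, you would pipe the real `sudo crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock info` output through the same grep instead):

```shell
# Check crictl's JSON output for the Contrast runtime class name.
# SAMPLE is illustrative stand-in data, not real crictl output.
SAMPLE='{"config":{"containerd":{"runtimes":{"runc":{},"contrast-cc-k3s-qemu-tdx-69c6b92c":{}}}}}'
printf '%s' "$SAMPLE" | grep -o 'contrast-cc-k3s-qemu-tdx-69c6b92c'
```

If the grep prints nothing, containerd's loaded configuration does not contain the runtime class, even if the config file on disk does.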

@SelvamArul
Author

Apparently, my user does not have permission to access /run/k3s/containerd/containerd.sock

$ crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock info
FATA[0000] validate service connection: validate CRI v1 runtime API for endpoint "unix:///run/k3s/containerd/containerd.sock": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /run/k3s/containerd/containerd.sock: connect: permission denied"
$ sudo ls -l /run/k3s/containerd/
total 0
srw-rw---- 1 root root  0 Mar  6 11:03 containerd.sock
srw-rw---- 1 root root  0 Mar  6 11:03 containerd.sock.ttrpc
drwxr-xr-x 4 root root 80 Mar  6 10:07 io.containerd.grpc.v1.cri
drwx--x--x 3 root root 60 Mar  6 10:06 io.containerd.runtime.v2.task
drwx--x--x 2 root root 40 Mar  6 10:06 io.containerd.sandbox.controller.v1.shim

@burgerdev
Contributor

So, since you have sudo powers, could you just sudo crictl?

@SelvamArul
Author

Sure, here is the output of crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock info:

crictl-output.txt

@burgerdev
Contributor

The runtime class is not configured for this containerd, it seems. I don't quite understand how this is possible, given that it's clearly in the containerd config.toml. Any hint in the k3s logs? There should be log lines like

Using containerd template at /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml

Now that I wrote that, could you please verify that the runtimeclass is also present in /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl?
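
A minimal sketch of that check (the sample config below is inlined so the snippet runs anywhere; on the node, you would set CONFIG to /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl instead):

```shell
# Check whether a containerd config contains the Contrast runtime table.
# The heredoc is illustrative sample data, not the real node config.
CONFIG="$(mktemp)"
cat > "$CONFIG" <<'EOF'
[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes.contrast-cc-k3s-qemu-tdx-69c6b92c]
runtime_type = 'io.containerd.contrast-cc.v2'
EOF
if grep -q 'runtimes.contrast-cc-k3s-qemu-tdx-69c6b92c' "$CONFIG"; then
  echo "runtime class present in template"
else
  echo "runtime class missing from template"
fi
rm -f "$CONFIG"
```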

@SelvamArul
Author

In the k3s logs generated after the last reboot (journalctl -b -u k3s > k3s-journalctl.log), Using containerd template at /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl occurs at line 1514. After that, until the next relevant error message "Unknown runtime handler" runtimeHandlerName="contrast-cc-k3s-qemu-tdx-69c6b92c" at line 2161, I couldn't find anything suspicious.
Here is the log file:

k3s-journalctl.log

And contrast-cc-k3s-qemu-tdx is present in /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl:
cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl | grep -C5 contrast-cc-k3s-qemu-tdx-69c6b92c:

[plugins.'io.containerd.grpc.v1.cri'.containerd]
disable_snapshot_annotations = false
discard_unpacked_layers = false

[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes]
[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes.contrast-cc-k3s-qemu-tdx-69c6b92c]
runtime_type = 'io.containerd.contrast-cc.v2'
runtime_path = '/opt/edgeless/contrast-cc-k3s-qemu-tdx-69c6b92c/bin/containerd-shim-contrast-cc-v2'
pod_annotations = ['io.katacontainers.*']
privileged_without_host_devices = true
snapshotter = 'nydus-contrast-cc-k3s-qemu-tdx-69c6b92c'

[plugins.'io.containerd.grpc.v1.cri'.containerd.runtimes.contrast-cc-k3s-qemu-tdx-69c6b92c.options]
ConfigPath = '/opt/edgeless/contrast-cc-k3s-qemu-tdx-69c6b92c/etc/configuration-qemu-tdx.toml'

[plugins.'io.containerd.internal.v1.opt']
path = '/var/lib/rancher/k3s/agent/containerd'

[proxy_plugins]
[proxy_plugins.nydus-contrast-cc-k3s-qemu-tdx-69c6b92c]
type = 'snapshot'
address = '/run/containerd/containerd-nydus-grpc-contrast-cc-k3s-qemu-tdx-69c6b92c.sock'

@burgerdev
Contributor

Ok, I may have found the issue. From last week's k3s release notes:

Containerd 2.0 uses a new config file schema. If you are using a custom containerd config template, you should migrate your template to config-v3.toml.tmpl to switch to the new version.

Let me try to verify this and fix our nodeinstaller. In the meantime, you could install an older k3s version as a workaround:

export INSTALL_K3S_VERSION=v1.30.5+k3s1
# export INSTALL_K3S_VERSION=v1.31.5+k3s1 should also work
curl -sfL https://get.k3s.io |  sh -
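
A small helper along these lines can tell whether an installed k3s version predates the containerd 2.0 config schema switch. The v1.31.6 cutoff below is an assumption inferred from this thread (v1.31.6+k3s1 is broken, v1.31.5+k3s1 is expected to work); check the k3s release notes for the authoritative boundary:

```shell
# Hypothetical helper: does a k3s version string predate the containerd 2.0
# config schema change? Cutoff v1.31.6 is an assumption inferred from this
# issue, not an official statement.
predates_containerd2() {
  ver="${1#v}"          # strip leading "v":   v1.31.6+k3s1 -> 1.31.6+k3s1
  ver="${ver%%+*}"      # strip k3s suffix:    1.31.6+k3s1  -> 1.31.6
  cutoff="1.31.6"
  [ "$(printf '%s\n%s\n' "$ver" "$cutoff" | sort -V | head -n1)" = "$ver" ] \
    && [ "$ver" != "$cutoff" ]
}
predates_containerd2 "v1.30.5+k3s1" && echo "older schema (v2 template ok)"
predates_containerd2 "v1.31.6+k3s1" || echo "containerd 2.0 (needs config-v3 template)"
```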

@burgerdev burgerdev added this to the v1.6.0 milestone Mar 7, 2025
@burgerdev burgerdev linked a pull request Mar 7, 2025 that will close this issue