
pods running at different nodes should have different product_uuid #2318

Closed
qinqon opened this issue Jun 21, 2021 · 12 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

@qinqon
Contributor

qinqon commented Jun 21, 2021

What happened:
At KubeVirt we use kind to test SRIOV migrations, which depend on pods having different product_uuid values. I saw that this was fixed for nodes, so they already have different product_uuids, but that is not the case for pods.

What you expected to happen:
Pods running on different nodes should have different product_uuid values.

How to reproduce it (as minimally and precisely as possible):

1 - Create a cluster with the following config:

# three node (two workers) cluster config
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker

2 - Create a pair of pods, one on each worker:

---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-kind-worker
  labels:
    env: test
spec:
  nodeSelector:
    kubernetes.io/hostname: kind-worker
  containers:
  - name: nginx
    image: nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-kind-worker2
  labels:
    env: test
spec:
  nodeSelector:
    kubernetes.io/hostname: kind-worker2
  containers:
  - name: nginx
    image: nginx

3 - Check the product_uuid in each pod:

$ kubectl exec nginx-kind-worker -- cat /sys/devices/virtual/dmi/id/product_uuid
ffd897cc-3219-11b2-a85c-ca68c7b7906b
$ kubectl exec nginx-kind-worker2 -- cat /sys/devices/virtual/dmi/id/product_uuid
ffd897cc-3219-11b2-a85c-ca68c7b7906b

Anything else we need to know?:
The nodes' issue was fixed with 78252a6; I suppose we cannot do the same for the pods, since those containers are out of kind's scope.

Environment:

  • kind version: (use kind version): kind v0.11.1 go1.16.3 linux/amd64
  • Kubernetes version: (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:20:10Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-21T23:01:33Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version: (use docker info):
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.4.2-docker)

Server:
 Containers: 292
  Running: 4
  Paused: 0
  Stopped: 288
 Images: 3144
 Server Version: 20.10.0-rc1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: c623d1b36f09f8ef6536a057bd658b3aa8632828
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.8.18-100.fc31.x86_64
 Operating System: Fedora 31 (Workstation Edition)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.51GiB
 Name: localhost.localdomain
 ID: 43XE:HTHU:IZS5:3MDS:UVYX:5CBC:3T4T:RQXC:HCTT:KSKC:PISM:YMWS
 Docker Root Dir: /run/media/ellorent/docker
 Debug Mode: false
 Username: quiquell
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888
  registry:5000
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio weight support
WARNING: No blkio weight_device support
  • OS (e.g. from /etc/os-release):
NAME=Fedora
VERSION="31 (Workstation Edition)"
ID=fedora
VERSION_ID=31
VERSION_CODENAME=""
PLATFORM_ID="platform:f31"
PRETTY_NAME="Fedora 31 (Workstation Edition)"
ANSI_COLOR="0;34"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:31"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f31/system-administrators-guide/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=31
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=31
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Workstation Edition"
VARIANT_ID=workstation
@qinqon qinqon added the kind/bug Categorizes issue or PR as related to a bug. label Jun 21, 2021
@BenTheElder
Member

The nodes have different UUIDs, but stuff like this is going to look the same from within the pods without some fun hacks ... because the VFS is in the kernel, which they all share.

@BenTheElder
Member

I think you might be able to do this with a modified runc, injecting an extra ro bind mount here. We have to do some semi-related stuff for rootless.

@qinqon
Contributor Author

qinqon commented Jun 22, 2021

> I think you might be able to do this with a modified runc, injecting an extra ro bind mount here. We have to do some semi-related stuff for rootless.

Do you have a pointer to the runc integration? I will take a look.

@BenTheElder BenTheElder added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/bug Categorizes issue or PR as related to a bug. labels Jun 22, 2021
@BenTheElder BenTheElder changed the title pods runnig at different nodes have the same product_uuid pods running at different nodes should have different product_uuid Jun 22, 2021
@BenTheElder
Member

IMHO it would probably be better to make SRIOV able to take its input from some other location (e.g. the kubernetes node name).

I'm not sure we'd want to ship a hacked-up container runtime just to manipulate this vfs; it is probably going to be fragile to maintain, and kind is not really suitable for replacing virtualization / running workloads that deeply integrate with the kernel. It's mostly there so you can test against the kubernetes API ...

Container leakiness is to be expected; IMO, deepening the papering-over of that fact is a feature.

That said, the pointers above should help you experiment with this. It should be possible to implement without even patching kind: just mount your custom runc with an extraMount on the node and configure containerd to use it with a containerdConfigPatch in the kind config at runtime.
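
The suggested approach can be sketched as a kind config. The extraMounts and containerdConfigPatches fields exist in kind's v1alpha4 API, but the runc path and the patch body below are untested assumptions, not a verified recipe:

```yaml
# Sketch only: mount a custom runc binary into each node and point
# containerd's runc runtime at it. Paths are hypothetical.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  extraMounts:
  - hostPath: ./runc-patched                    # hypothetical patched runc build
    containerPath: /usr/local/bin/runc-patched
containerdConfigPatches:
- |
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    BinaryName = "/usr/local/bin/runc-patched"
```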

@BenTheElder
Member

btw kubevirt/kubevirtci#570 ?

@qinqon
Contributor Author

qinqon commented Jun 22, 2021

> btw kubevirt/kubevirtci#570 ?

The problem is that PodPreset is deprecated after 1.19. We can always use an admission webhook to do the same, but before doing that I want to check if there is something we can do in kind.

@qinqon
Contributor Author

qinqon commented Jun 22, 2021

> IMHO it would probably be better to make SRIOV able to take its input from some other location (e.g. the kubernetes node name).
>
> I'm not sure we'd want to ship a hacked-up container runtime just to manipulate this vfs; it is probably going to be fragile to maintain, and kind is not really suitable for replacing virtualization / running workloads that deeply integrate with the kernel. It's mostly there so you can test against the kubernetes API ...
>
> Container leakiness is to be expected; IMO, deepening the papering-over of that fact is a feature.
>
> That said, the pointers above should help you experiment with this. It should be possible to implement without even patching kind: just mount your custom runc with an extraMount on the node and configure containerd to use it with a containerdConfigPatch in the kind config at runtime.

I have seen that there is a base_runtime_spec option; it contains the base OCI spec that containers are created from:

https://github.com/containerd/containerd/blob/6883c845959ba36f72a3ca36a510d209b2248386/docs/cri/config.md

# base_runtime_spec is a file path to a JSON file with the OCI spec that will be used as the base spec that all
# container's are created from.
# Use containerd's `ctr oci spec > /etc/containerd/cri-base.json` to output initial spec file.
# Spec files are loaded at launch, so containerd daemon must be restarted on any changes to refresh default specs.
# Still running containers and restarted containers will still be using the original spec from which that container was created.
base_runtime_spec = ""

Mount reference:
https://github.com/opencontainers/runtime-spec/blob/master/config.md#mounts

Maybe kind can modify it so it includes a mount from product_uuid to a random uuid?

@qinqon
Contributor Author

qinqon commented Jun 22, 2021

Going with the OCI spec hack, I have tried the following

#2321

{"destination": "/sys/class/dmi/id/product_uuid", "source": "/proc/sys/kernel/random/uuid"}

with the following result

Jun 22 09:32:18 kind-control-plane containerd[215]: time="2021-06-22T09:32:18.590666196Z" level=error msg="StartContainer for \"523a9f0949d7639a9c520bba12b417f2e81abf190ea1e22ebd3c03dae67ff8fd\" failed" error="failed to create containerd task: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting \"/proc/sys/kernel/random/uuid\" to rootfs at \"/sys/class/dmi/id/product_uuid\" caused: mkdir /run/containerd/io.containerd.runtime.v2.task/k8s.io/523a9f0949d7639a9c520bba12b417f2e81abf190ea1e22ebd3c03dae67ff8fd/rootfs/sys/devices/virtual/dmi/id/product_uuid: not a directory: unknown"

@qinqon
Contributor Author

qinqon commented Jun 22, 2021

After adding the "bind" option it works as expected:

{"destination": "/sys/class/dmi/id/product_uuid", "source": "/proc/sys/kernel/random/uuid", "options": ["bind"]}
[ellorent@localhost Downloads]$ kubectl exec nginx-kind-worker cat /sys/class/dmi/id/product_uuid
053ba73c-3a24-4cfe-b7ca-5a938a4600d7
[ellorent@localhost Downloads]$ kubectl exec nginx-kind-worker2 cat /sys/class/dmi/id/product_uuid
db9f435b-0316-4f66-92a0-8d3632d6f69c
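
For reference, the working spec edit above can be scripted as a small patch to the base runtime spec JSON. This is a minimal sketch; the helper name and the cri-base.json path are illustrative assumptions, not part of kind:

```python
import json

def add_product_uuid_mount(spec_path):
    """Append the bind mount from the working hack above: shadow the
    node-shared product_uuid with a per-container random uuid."""
    with open(spec_path) as f:
        spec = json.load(f)
    spec.setdefault("mounts", []).append({
        "destination": "/sys/class/dmi/id/product_uuid",
        "source": "/proc/sys/kernel/random/uuid",
        "options": ["bind"],  # required for a single-file bind mount
    })
    with open(spec_path, "w") as f:
        json.dump(spec, f, indent=2)

# Hypothetical usage, then restart containerd so the spec is reloaded:
# add_product_uuid_mount("/etc/containerd/cri-base.json")
```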

@BenTheElder
Member

This is really neat and seems very reasonable to maintain, sorry I've not been able to wrap this up yet.

ormergi added a commit to ormergi/kubevirtci that referenced this issue Aug 31, 2021
Unfortunately there is no API to enable and verify that the PodPreset
feature-gate is enabled.
As of today we check the kube-apiserver process command on the control-plane node
to validate that the PodPreset feature is enabled.

Since we upgraded kind node image to k8s-1.19, it seems that it takes more time
for changes on kube-apiserver.yaml to propagate (e.g: enabling PodPreset).

This workaround is temporary; once [1] lands in kubevirtci we can stop
using PodPreset.

[1] kubernetes-sigs/kind#2318

Signed-off-by: Or Mergi <ormergi@redhat.com>
ormergi added a commit to ormergi/kubevirtci that referenced this issue Aug 31, 2021
Since we upgraded kind node image to k8s-1.19, it seems that it takes more time
for changes on kube-apiserver.yaml to propagate (e.g: enabling PodPreset).

Unfortunately there is no API to enable and verify that the PodPreset
feature-gate is enabled.
As of today we check the kube-apiserver process command on the control-plane node
to validate that the PodPreset feature is enabled.

This workaround is temporary; once [1] lands in kubevirtci we can stop
using PodPreset.

[1] kubernetes-sigs/kind#2318

Signed-off-by: Or Mergi <ormergi@redhat.com>
kubevirt-bot pushed a commit to kubevirt/kubevirtci that referenced this issue Aug 31, 2021
…669)

Since we upgraded kind node image to k8s-1.19, it seems that it takes more time
for changes on kube-apiserver.yaml to propagate (e.g: enabling PodPreset).

Unfortunately there is no API to enable and verify that the PodPreset
feature-gate is enabled.
As of today we check the kube-apiserver process command on the control-plane node
to validate that the PodPreset feature is enabled.

This workaround is temporary; once [1] lands in kubevirtci we can stop
using PodPreset.

[1] kubernetes-sigs/kind#2318

Signed-off-by: Or Mergi <ormergi@redhat.com>
@aojea aojea mentioned this issue Oct 12, 2021
@BenTheElder
Member

Closed by #2465, which includes #2321.
