Kata Container support #500

pohly · 2019-12-18T14:42:10Z

Contains support for creating image files, installing Kata Containers and running E2E tests with that. However, those tests need to be run manually on bare-metal hosts because triple-nested virtualization (like we would have to do in our Azure CI) is too slow.

not continuous segment is different extent

fix bug of FibmapExtents function

We cannot always vendor some upstream component. If we can't, then we can fork it by copying the files into "third-party", either with plain "cp" or "git subtree", and then add our own changes on top of it.

…d92e31ef976

Forked because upstream author seems inactive (last message is an apology for being inactive: frostschutz/go-fibmap#1 (comment)), so we need to maintain this ourselves.

git-subtree-dir: third-party/go-fibmap
git-subtree-mainline: c451fa6
git-subtree-split: b32c231

The resulting file can be used as backing store for a QEMU nvdimm device. This is based on the approach that is used for the Kata Container rootfs (https://github.com/kata-containers/osbuilder/blob/dbbf16082da3de37d89af0783e023269210b2c91/image-builder/image_builder.sh) and reuses some of the same code, but also differs from that in some regards:
- The start of the partition is aligned to a multiple of the 2MiB huge page size (kata-containers/runtime#2262 (comment)).
- The size of the QEMU object is the same as the nominal size of the file. In Kata Containers the size is a fixed 128MiB (kata-containers/osbuilder#391 (comment)).
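The alignment rule above can be illustrated with a minimal Python sketch. It is not the code from this PR: the function names and the 512-byte `header_size` default are hypothetical, and the only facts taken from the description are the 2MiB alignment and that the QEMU object size equals the nominal file size.

```python
MIB = 1024 * 1024
HUGE_PAGE_SIZE = 2 * MIB  # 2MiB huge page size named in the description


def align_up(offset: int, alignment: int = HUGE_PAGE_SIZE) -> int:
    """Round offset up to the next multiple of alignment."""
    if offset < 0 or alignment <= 0:
        raise ValueError("offset must be >= 0 and alignment > 0")
    return (offset + alignment - 1) // alignment * alignment


def plan_image(fs_size: int, header_size: int = 512) -> dict:
    """Sketch of an image layout (header_size is an assumption).

    The partition starts at the first 2MiB boundary after the header,
    and the QEMU object gets the same size as the file itself.
    """
    partition_start = align_up(header_size)
    file_size = partition_start + fs_size
    return {
        "partition_start": partition_start,
        "file_size": file_size,
        "qemu_object_size": file_size,
    }
```

For a 32MiB filesystem this sketch places the partition at 2MiB and yields a 34MiB file, with the QEMU object sized identically.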

The test needs files prepared as part of a cluster creation, without running in that cluster itself.

Same change as inside the PMEM-CSI driver itself: we have to ensure that reflink is off because it is incompatible with "-o dax".
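As a rough illustration of the reflink change, the following Python sketch builds a mkfs command line. The helper name and the device path are hypothetical, and whether PMEM-CSI passes exactly these arguments is an assumption; `-m reflink=0` itself is a standard `mkfs.xfs` option.

```python
from typing import List


def mkfs_command(fs_type: str, device: str) -> List[str]:
    """Return an argv list for formatting device (illustrative only)."""
    if fs_type == "xfs":
        # reflink must be off because it is incompatible with "-o dax".
        return ["mkfs.xfs", "-m", "reflink=0", device]
    if fs_type == "ext4":
        return ["mkfs.ext4", device]
    raise ValueError(f"unsupported filesystem type: {fs_type}")
```

Returning a list instead of a shell string keeps the sketch safe to pass to `subprocess.run` without quoting issues.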

The implementation already worked like that, it just wasn't documented and thus it was unknown whether reusing the directory also for other local state (like the upcoming extra volume mounts) is okay.

pohly · 2020-05-10T16:58:21Z

@devimc: this PR now has testing against and instructions for Kata Containers 1.11.0-rc0. Can you perhaps check that the changes in docs look reasonable?

More feedback of course also welcome 😀

I'm pushing this while local tests are still running, but hopefully the new tests work now.

pohly · 2020-05-10T20:53:27Z

kata-deploy fails in our CI (https://cloudnative-k8sci.southcentralus.cloudapp.azure.com/view/pmem-csi/job/pmem-csi/view/change-requests/job/PR-500/29/console):

kube-system pod/kata-deploy-prvzf 0/1 CrashLoopBackOff 4 4m31s 10.244.1.2

I'll check tomorrow why it fails there. It worked for me locally.

pohly · 2020-05-11T10:27:02Z

https://cloudnative-k8sci.southcentralus.cloudapp.azure.com/view/pmem-csi/job/pmem-csi/view/change-requests/job/PR-500/lastCompletedBuild/testReport/clear-32690-.lvm-production/E2E/Kata_Containers__Testpattern__Dynamic_PV__ext4___dax_should_support_MAP_SYNC/

May 11 09:07:37.136: INFO: At 2020-05-11 09:02:31 +0000 UTC - event for dax-volume-test-kata: {kubelet pmem-csi-govm-worker1} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: failed to launch qemu: exit status 1, error messages from qemu log: Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory

Same error also for other nodes. It looks like we don't have nested virtualization enabled in the Azure VM. Let me see whether I can change that...

devimc

thanks @pohly - lgtm

devimc · 2020-05-11T12:29:45Z

docs/design.md

+
+This gets solved as follows:
+- PMEM-CSI creates a volume as usual, either in direct mode or LVM mode.
+- Inside that volume it sets up an ext4 filesystem.


jfyi - xfs is also supported

pohly · 2020-05-11T17:09:50Z

Progress (?): without the -vmx workaround in Jenkinsfile (i.e. plain -cpu host), /dev/kvm appears in the nodes, but it then fails with "Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing".

pohly · 2020-05-11T18:19:15Z

"Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing".

@devimc explained on IRC that triple nesting of VMs makes the inner QEMU so slow that the kubelet -> CRI communication times out. This means we cannot test with Kata Containers in the current Azure CI.

I'll make the tests optional. Until we have automatic testing on real hardware (BMaaS!), we'll simply have to test them manually from time to time on real hardware to detect regressions.

pohly · 2020-05-12T15:11:04Z

Tests are clean now after disabling the Kata Containers tests in our Azure CI.

@avalluri: okay to merge?

avalluri · 2020-05-13T12:55:30Z

deploy/common/pmem-storageclass-ext4-kata.yaml

+reclaimPolicy: Delete
+volumeBindingMode: Immediate


As per the [app yaml](https://github.com//pull/500/files#diff-17203e3a5882efb0c1944558564a9c52R10-R11), the application is expected to run only on nodes with katacontainers.io/kata-runtime: "true". So if we use immediate binding mode, we might end up creating a volume on the wrong node?

True. I had better switch this to "late-binding".

avalluri · 2020-05-13T13:12:53Z

docs/design.md

+  space available in the volume.
+- That partition is bound to a `/dev/loop` device and the formatted
+  with the requested filesystem type for the volume.
+- When an applications needs access to the volume, PMEM-CSI mounts


typo: an applications -> an application

avalluri · 2020-05-13T14:42:47Z

@pohly looks good to me. Just go ahead and merge after fixing the storage classes and the documentation nit.

A persistent or ephemeral volume can be prepared either for DAX-enabled applications that don't run under Kata Containers (the default) or for DAX-enabled applications that run under Kata Containers. In both cases the volume can be used with and without Kata Containers; it's just that DAX only works either inside or outside of Kata Containers.

The Kata Containers runtime must be able to access the image file while it is still mounted. Therefore we cannot use something inside the target dir as mount point, because then the image file is shadowed by the mounted filesystem.

We already have a local state dir for .json files. Putting something else inside it might confuse the state code, so instead we create a second directory with ".mount" appended to the directory name and use that for mount points. We also have to enable bi-directional mount propagation for it, because otherwise the mounted filesystem with the image file is still only visible inside the container.
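The directory naming described above can be sketched in a few lines of Python. Only the ".mount" suffix comes from the commit message; the function names, the example path, and the idea of one subdirectory per volume are assumptions made for illustration.

```python
import posixpath


def mount_dir_for(state_dir: str) -> str:
    """Second directory next to the state dir, with ".mount" appended."""
    return state_dir.rstrip("/") + ".mount"


def mount_point_for(state_dir: str, volume_id: str) -> str:
    """Hypothetical per-volume mount point inside the ".mount" directory."""
    return posixpath.join(mount_dir_for(state_dir), volume_id)
```

Keeping mount points out of the state directory means code that scans it for .json files never encounters them.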

It can happen that Kubernetes comes up, but something else (like Kata Containers) doesn't. In that case "kubectl get all" may provide some hint.

Two minutes was enough locally, but not for the CI.

"make start" in an empty _work failed with:

tar zxf _work/govm_0.9-alpha_Linux_amd64.tar.gz -C _work/bin/
tar: _work/bin: Cannot open: No such file or directory

Due to a race condition (?), kata-deploy fails in the CI because /etc/crio/crio.conf didn't exist at the time that it ran:

$ kubectl logs -n kube-system kata-deploy-2dh2f
copying kata artifacts onto host
Add Kata Containers as a supported runtime for CRIO:
cp: cannot stat '/etc/crio/crio.conf': No such file or directory

Somehow it worked locally.

Even with the "-vmx" override in the Jenkinsfile removed, nested virtualization with three levels (Azure HyperV -> QEMU (govm) -> QEMU (Kata Containers)) was not working well enough for Kata Containers: because the inner VM runs very slowly, there are timeouts in the communication between kubelet and Kata Containers ("container create failed: Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing"). That means that testing with Kata Containers has to be limited to bare metal. To achieve that, it's turned off by default (and thus in the CI, which only runs on Azure) and has to be enabled with TEST_KATA_CONTAINERS_VERSION=1.11.0-rc0 or by invoking test/setup-kata-containers.sh manually.
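The opt-in behavior can be pictured with a small Python sketch. The variable name TEST_KATA_CONTAINERS_VERSION is taken from the commit message; the helper name and the choice to treat an empty value as "disabled" are assumptions, since the real test scripts are not shown here.

```python
from typing import Mapping, Optional


def kata_version_to_test(env: Mapping[str, str]) -> Optional[str]:
    """Return the Kata Containers version to test, or None if disabled.

    Assumption: an unset or empty variable means the tests stay off,
    which matches "turned off by default" in the commit message.
    """
    version = env.get("TEST_KATA_CONTAINERS_VERSION", "").strip()
    return version or None
```

In a real run the mapping would be `os.environ`; passing it explicitly keeps the sketch easy to exercise without touching the process environment.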

The Kata Containers PR (intel#500) and tightening docsite validation (intel#640) were merged independently without retesting, which broke "devel" because some of the changes for Kata Containers caused warnings which are now errors.

frostschutz and others added 15 commits February 12, 2014 13:57

fibmap package

b48fa8d

fibmap as dedicated C file

a187192

figetbsz()

f41b08f

fibmap()

229abe3

fiemap()

480b3ee

Datahole()

26f71ce

delete C file, stick with native (unsafe) Go implementation for now

db66097

FibmapFile type, to keep track of extents and offsets in the future

d324314

FibmapExtents() emulate FIEMAP with FIBMAP

316ee4d

Fallocate()/PunchHole()

27496d1

MixedCaps for local constants

7b72b12

initialize repository

28cd1bf

MIT license

77fdb38

Update fibmap.go

dccaece

not continuous segment is different extent

Merge pull request intel#2 from chenzhongtao/master

b32c231

fix bug of FibmapExtents function
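The idea behind "FibmapExtents() emulate FIEMAP with FIBMAP" and "not continuous segment is different extent" can be sketched as follows. This is not the forked Go code: the Python names are hypothetical, and the input is a plain list of physical block numbers (one per logical block) standing in for the results of individual FIBMAP calls, where 0 is assumed to mark an unmapped block.

```python
from typing import List, NamedTuple


class Extent(NamedTuple):
    logical: int   # index of the first file block in the extent
    physical: int  # physical block number of that first block
    length: int    # number of blocks in the extent


def blocks_to_extents(physical_blocks: List[int]) -> List[Extent]:
    """Group per-block mappings into extents of contiguous blocks."""
    extents: List[Extent] = []
    for logical, physical in enumerate(physical_blocks):
        if physical == 0:
            continue  # hole: no physical block behind this logical block
        last = extents[-1] if extents else None
        if (
            last is not None
            and last.logical + last.length == logical
            and last.physical + last.length == physical
        ):
            # Contiguous both logically and physically: grow the extent.
            extents[-1] = last._replace(length=last.length + 1)
        else:
            # Any discontinuity starts a new extent.
            extents.append(Extent(logical, physical, 1))
    return extents
```

A mapping such as `[100, 101, 102, 200, 201, 0, 202]` therefore yields three extents: a jump in the physical block number ends the first one, and a hole ends the second.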

pohly force-pushed the kata-containers branch 5 times, most recently from cd1ff14 to 03d1d79 Compare December 19, 2019 14:10

pohly force-pushed the kata-containers branch 2 times, most recently from 3f20187 to 3b83db4 Compare January 14, 2020 10:51

This was referenced Jan 14, 2020

support PMEM inside Kata Containers #303

Closed

support PMEM inside Kata Containers when running under Kubernetes kata-containers/runtime#2262

Closed

pohly force-pushed the kata-containers branch from ba3a5f4 to d828424 Compare March 31, 2020 16:38

pohly added 5 commits May 7, 2020 18:33

third-party: support forking of upstream components

be6fe65

We cannot always vendor some upstream component. If we can't, then we can fork it by copying the files into "third-party", either with plain "cp" or "git subtree", and then add our own changes on top of it.

CI: integrate imagefile testing

063bb1e

The test needs files prepared as part of a cluster creation, without running in that cluster itself.

imagefile: disable reflink for XFS

f26ee12

Same change as inside the PMEM-CSI driver itself: we have to ensure that reflink is off because it is incompatible with "-o dax".

pmem state: document that only .json files matter

5fe60ce

The implementation already worked like that, it just wasn't documented and thus it was unknown whether reusing the directory also for other local state (like the upcoming extra volume mounts) is okay.

pohly force-pushed the kata-containers branch from d6c3b17 to 825d4aa Compare May 10, 2020 16:55

pohly force-pushed the kata-containers branch from 0b703d7 to 14b3e3f Compare May 11, 2020 07:12

devimc approved these changes May 11, 2020


pohly force-pushed the kata-containers branch 2 times, most recently from 2060858 to a6e0943 Compare May 12, 2020 07:49

pohly changed the title ~~WIP: Kata Container support~~ Kata Container support May 12, 2020

pohly assigned avalluri May 12, 2020

avalluri reviewed May 13, 2020


pohly added 6 commits May 13, 2020 18:01

test: dump Kubernetes objects after setup failure

d9f6adc

It can happen that Kubernetes comes up, but something else (like Kata Containers) doesn't. In that case "kubectl get all" may provide some hint.

test: increase timeout for Kata Containers

9e7ff93

Two minutes was enough locally, but not for the CI.

test: create target directories

2001229

"make start" in an empty _work failed with:

tar zxf _work/govm_0.9-alpha_Linux_amd64.tar.gz -C _work/bin/
tar: _work/bin: Cannot open: No such file or directory

pohly force-pushed the kata-containers branch from a6e0943 to 4dcfcf5 Compare May 13, 2020 16:01

pohly merged commit 60e44b1 into intel:devel May 14, 2020

pohly mentioned this pull request May 15, 2020

doc: resolve docsite warnings for Kata Containers PR #641

Merged
