This repository has been archived by the owner on Oct 22, 2024. It is now read-only.

support Kata Containers
A persistent or ephemeral volume can either be prepared for usage by
DAX-enabled applications that don't run under Kata Containers (the
default) or for DAX-enabled applications that run under Kata
Containers.

In both cases the volume can be used with and without Kata Containers;
it's just that DAX only works either inside or outside of Kata
Containers.

The Kata Container runtime must be able to access the image file while
it is still mounted, therefore we cannot use something inside the
target dir as mount point, because then the image file is shadowed by
the mounted filesystem.

We already have a local state dir for .json files. Putting something
else inside it might confuse the state code, so instead we create a
second directory with ".mount" appended to the directory name and use
that for mount points.

We also have to enable bi-directional mount propagation for it because
otherwise the mounted fs with the image file is only visible
inside the container.
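
The naming scheme for the mount directory can be illustrated with a minimal sketch. It only shows the idea of a sibling directory with ".mount" appended; the function names, the example path and the volume ID are assumptions made for illustration and are not taken from the driver code.

```python
from pathlib import PurePosixPath


def mount_dir_for(state_dir: str) -> str:
    """Return the sibling directory used for mount points.

    The name is the state directory's name with ".mount" appended, so the
    mount points live next to the state directory rather than inside it.
    """
    path = PurePosixPath(state_dir)
    return str(path.with_name(path.name + ".mount"))


def volume_mount_point(state_dir: str, volume_id: str) -> str:
    """Hypothetical per-volume mount point below the ".mount" directory."""
    return str(PurePosixPath(mount_dir_for(state_dir)) / volume_id)


if __name__ == "__main__":
    # Both the path and the volume ID are made-up example values.
    print(volume_mount_point("/var/lib/example-driver/state", "vol-1"))
```

Keeping the mount points outside the state dir means the state code never sees entries other than its .json files.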
pohly committed May 10, 2020
1 parent 5fe60ce commit 825d4aa
Showing 40 changed files with 1,152 additions and 67 deletions.
4 changes: 3 additions & 1 deletion Makefile
@@ -176,7 +176,7 @@ KUSTOMIZE_OUTPUT += deploy/common/pmem-storageclass-cache.yaml
KUSTOMIZATION_deploy/common/pmem-storageclass-cache.yaml = deploy/kustomize/storageclass-cache
KUSTOMIZE_OUTPUT += deploy/common/pmem-storageclass-late-binding.yaml
KUSTOMIZATION_deploy/common/pmem-storageclass-late-binding.yaml = deploy/kustomize/storageclass-late-binding
kustomize: $(KUSTOMIZE_OUTPUT)
kustomize: clean_kustomize_output $(KUSTOMIZE_OUTPUT)
$(KUSTOMIZE_OUTPUT): _work/kustomize $(KUSTOMIZE_INPUT)
$< build --load_restrictor none $(KUSTOMIZATION_$@) >$@
if echo "$@" | grep -q '/pmem-csi-'; then \
@@ -185,6 +185,8 @@ $(KUSTOMIZE_OUTPUT): _work/kustomize $(KUSTOMIZE_INPUT)
cp $@ $$dir/pmem-csi.yaml && \
echo 'resources: [ pmem-csi.yaml ]' > $$dir/kustomization.yaml; \
fi
clean_kustomize_output:
rm -f $(KUSTOMIZE_OUTPUT)

# Always re-generate the output files because "git rebase" might have
# left us with an inconsistent state.
26 changes: 26 additions & 0 deletions deploy/common/pmem-kata-app-ephemeral.yaml
@@ -0,0 +1,26 @@
kind: Pod
apiVersion: v1
metadata:
  name: my-csi-app-inline-volume
  labels:
    io.katacontainers.config.hypervisor.memory_offset: "2147483648" # 2Gi, must be at least as large as the PMEM volume
spec:
  # see https://github.com/kata-containers/packaging/tree/1.11.0-rc0/kata-deploy#run-a-sample-workload
  runtimeClassName: kata-qemu
  nodeSelector:
    katacontainers.io/kata-runtime: "true"
  containers:
  - name: my-frontend
    image: intel/pmem-csi-driver-test:canary
    command: [ "sleep", "100000" ]
    volumeMounts:
    - mountPath: "/data"
      name: my-csi-volume
  volumes:
  - name: my-csi-volume
    csi:
      driver: pmem-csi.intel.com
      fsType: "xfs"
      volumeAttributes:
        size: "2Gi"
        kataContainers: "true"
22 changes: 22 additions & 0 deletions deploy/common/pmem-kata-app.yaml
@@ -0,0 +1,22 @@
kind: Pod
apiVersion: v1
metadata:
  name: my-csi-kata-app
  labels:
    io.katacontainers.config.hypervisor.memory_offset: "2147483648" # 2Gi, must be at least as large as the PMEM volume
spec:
  # see https://github.com/kata-containers/packaging/tree/1.11.0-rc0/kata-deploy#run-a-sample-workload
  runtimeClassName: kata-qemu
  nodeSelector:
    katacontainers.io/kata-runtime: "true"
  containers:
  - name: my-frontend
    image: intel/pmem-csi-driver-test:canary
    command: [ "sleep", "100000" ]
    volumeMounts:
    - mountPath: "/data"
      name: my-csi-volume
  volumes:
  - name: my-csi-volume
    persistentVolumeClaim:
      claimName: pmem-csi-pvc-kata # see pmem-kata-pvc.yaml
11 changes: 11 additions & 0 deletions deploy/common/pmem-kata-pvc.yaml
@@ -0,0 +1,11 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pmem-csi-pvc-kata
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  storageClassName: pmem-csi-sc-ext4-kata # defined in pmem-storageclass-ext4-kata.yaml
11 changes: 11 additions & 0 deletions deploy/common/pmem-storageclass-ext4-kata.yaml
@@ -0,0 +1,11 @@
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pmem-csi-sc-ext4-kata
parameters:
  csi.storage.k8s.io/fstype: ext4
  eraseafter: "true"
  kataContainers: "true"
provisioner: pmem-csi.intel.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
11 changes: 11 additions & 0 deletions deploy/common/pmem-storageclass-xfs-kata.yaml
@@ -0,0 +1,11 @@
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pmem-csi-sc-xfs-kata
parameters:
  csi.storage.k8s.io/fstype: xfs
  eraseafter: "true"
  kataContainers: "true"
provisioner: pmem-csi.intel.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
1 change: 1 addition & 0 deletions deploy/kubernetes-1.15/direct/pmem-csi.yaml
@@ -314,6 +314,7 @@ spec:
- mountPath: /dev
name: dev-dir
- mountPath: /var/lib/pmem-csi.intel.com
mountPropagation: Bidirectional
name: pmem-state-dir
- mountPath: /sys
name: sys-dir
1 change: 1 addition & 0 deletions deploy/kubernetes-1.15/direct/testing/pmem-csi.yaml
@@ -351,6 +351,7 @@ spec:
- mountPath: /dev
name: dev-dir
- mountPath: /var/lib/pmem-csi.intel.com
mountPropagation: Bidirectional
name: pmem-state-dir
- mountPath: /sys
name: sys-dir
1 change: 1 addition & 0 deletions deploy/kubernetes-1.15/lvm/pmem-csi.yaml
@@ -314,6 +314,7 @@ spec:
- mountPath: /dev
name: dev-dir
- mountPath: /var/lib/pmem-csi.intel.com
mountPropagation: Bidirectional
name: pmem-state-dir
- args:
- -v=3
1 change: 1 addition & 0 deletions deploy/kubernetes-1.15/lvm/testing/pmem-csi.yaml
@@ -351,6 +351,7 @@ spec:
- mountPath: /dev
name: dev-dir
- mountPath: /var/lib/pmem-csi.intel.com
mountPropagation: Bidirectional
name: pmem-state-dir
- mountPath: /var/lib/pmem-csi-coverage
name: coverage-dir
1 change: 1 addition & 0 deletions deploy/kubernetes-1.15/pmem-csi-direct-testing.yaml
@@ -351,6 +351,7 @@ spec:
- mountPath: /dev
name: dev-dir
- mountPath: /var/lib/pmem-csi.intel.com
mountPropagation: Bidirectional
name: pmem-state-dir
- mountPath: /sys
name: sys-dir
1 change: 1 addition & 0 deletions deploy/kubernetes-1.15/pmem-csi-direct.yaml
@@ -314,6 +314,7 @@ spec:
- mountPath: /dev
name: dev-dir
- mountPath: /var/lib/pmem-csi.intel.com
mountPropagation: Bidirectional
name: pmem-state-dir
- mountPath: /sys
name: sys-dir
1 change: 1 addition & 0 deletions deploy/kubernetes-1.15/pmem-csi-lvm-testing.yaml
@@ -351,6 +351,7 @@ spec:
- mountPath: /dev
name: dev-dir
- mountPath: /var/lib/pmem-csi.intel.com
mountPropagation: Bidirectional
name: pmem-state-dir
- mountPath: /var/lib/pmem-csi-coverage
name: coverage-dir
1 change: 1 addition & 0 deletions deploy/kubernetes-1.15/pmem-csi-lvm.yaml
@@ -314,6 +314,7 @@ spec:
- mountPath: /dev
name: dev-dir
- mountPath: /var/lib/pmem-csi.intel.com
mountPropagation: Bidirectional
name: pmem-state-dir
- args:
- -v=3
1 change: 1 addition & 0 deletions deploy/kubernetes-1.15/pmem-storageclass-ext4-kata.yaml
1 change: 1 addition & 0 deletions deploy/kubernetes-1.15/pmem-storageclass-xfs-kata.yaml
1 change: 1 addition & 0 deletions deploy/kubernetes-1.16/direct/pmem-csi.yaml
@@ -314,6 +314,7 @@ spec:
- mountPath: /dev
name: dev-dir
- mountPath: /var/lib/pmem-csi.intel.com
mountPropagation: Bidirectional
name: pmem-state-dir
- mountPath: /sys
name: sys-dir
1 change: 1 addition & 0 deletions deploy/kubernetes-1.16/direct/testing/pmem-csi.yaml
@@ -351,6 +351,7 @@ spec:
- mountPath: /dev
name: dev-dir
- mountPath: /var/lib/pmem-csi.intel.com
mountPropagation: Bidirectional
name: pmem-state-dir
- mountPath: /sys
name: sys-dir
1 change: 1 addition & 0 deletions deploy/kubernetes-1.16/lvm/pmem-csi.yaml
@@ -314,6 +314,7 @@ spec:
- mountPath: /dev
name: dev-dir
- mountPath: /var/lib/pmem-csi.intel.com
mountPropagation: Bidirectional
name: pmem-state-dir
- args:
- -v=3
1 change: 1 addition & 0 deletions deploy/kubernetes-1.16/lvm/testing/pmem-csi.yaml
@@ -351,6 +351,7 @@ spec:
- mountPath: /dev
name: dev-dir
- mountPath: /var/lib/pmem-csi.intel.com
mountPropagation: Bidirectional
name: pmem-state-dir
- mountPath: /var/lib/pmem-csi-coverage
name: coverage-dir
1 change: 1 addition & 0 deletions deploy/kubernetes-1.16/pmem-csi-direct-testing.yaml
@@ -351,6 +351,7 @@ spec:
- mountPath: /dev
name: dev-dir
- mountPath: /var/lib/pmem-csi.intel.com
mountPropagation: Bidirectional
name: pmem-state-dir
- mountPath: /sys
name: sys-dir
1 change: 1 addition & 0 deletions deploy/kubernetes-1.16/pmem-csi-direct.yaml
@@ -314,6 +314,7 @@ spec:
- mountPath: /dev
name: dev-dir
- mountPath: /var/lib/pmem-csi.intel.com
mountPropagation: Bidirectional
name: pmem-state-dir
- mountPath: /sys
name: sys-dir
1 change: 1 addition & 0 deletions deploy/kubernetes-1.16/pmem-csi-lvm-testing.yaml
@@ -351,6 +351,7 @@ spec:
- mountPath: /dev
name: dev-dir
- mountPath: /var/lib/pmem-csi.intel.com
mountPropagation: Bidirectional
name: pmem-state-dir
- mountPath: /var/lib/pmem-csi-coverage
name: coverage-dir
1 change: 1 addition & 0 deletions deploy/kubernetes-1.16/pmem-csi-lvm.yaml
@@ -314,6 +314,7 @@ spec:
- mountPath: /dev
name: dev-dir
- mountPath: /var/lib/pmem-csi.intel.com
mountPropagation: Bidirectional
name: pmem-state-dir
- args:
- -v=3
1 change: 1 addition & 0 deletions deploy/kubernetes-1.16/pmem-storageclass-ext4-kata.yaml
1 change: 1 addition & 0 deletions deploy/kubernetes-1.16/pmem-storageclass-xfs-kata.yaml
5 changes: 5 additions & 0 deletions deploy/kustomize/driver/pmem-csi.yaml
@@ -162,6 +162,11 @@ spec:
mountPath: /dev
- name: pmem-state-dir
mountPath: /var/lib/pmem-csi.intel.com
# Needed for Kata Containers: we mount the PMEM volume inside our
# state dir and want that to be visible also on the host, because
# the host will need access to the image file that we create inside
# that mounted fs.
mountPropagation: Bidirectional
- name: driver-registrar
imagePullPolicy: Always
image: quay.io/k8scsi/csi-node-driver-registrar:v1.X.Y
44 changes: 43 additions & 1 deletion docs/design.md
@@ -4,6 +4,7 @@
- [Architecture and Operation](#architecture-and-operation)
- [LVM device mode](#lvm-device-mode)
- [Direct device mode](#direct-device-mode)
- [Kata Containers support](#kata-containers-support)
- [Driver modes](#driver-modes)
- [Driver Components](#driver-components)
- [Communication between components](#communication-between-components)
@@ -126,6 +127,47 @@ In direct device mode, the driver does not attempt to limit space
use. It also does not mark "own" namespaces. The _Name_ field of a
namespace gets value of the VolumeID.

## Kata Containers support

[Kata Containers](https://katacontainers.io) runs applications inside a
virtual machine. This poses a problem for App Direct mode, because
access to the filesystem prepared by PMEM-CSI is provided inside the
virtual machine by the 9p or virtio-fs filesystems. Neither supports
App Direct mode:
- 9p does not support `mmap` at all.
- virtio-fs only supports it when not using `MAP_SYNC`, i.e. without dax
semantics.

This gets solved as follows:
- PMEM-CSI creates a volume as usual, either in direct mode or LVM mode.
- Inside that volume it sets up an ext4 filesystem.
- Inside that filesystem it creates a `pmem-csi-vm.img` file that contains
partition tables, dax metadata and a partition that takes up most of the
space available in the volume.
- That partition is bound to a `/dev/loop` device and then formatted
with the requested filesystem type for the volume.
- When an application needs access to the volume, PMEM-CSI mounts
that `/dev/loop` device.
- An application not running under Kata Containers then uses
that filesystem normally *but* due to limitations in the Linux
kernel, mounting might have to be done without `-odax` and thus
App Direct access does not work.
- When the Kata Containers runtime is asked to provide access to that
filesystem, it will instead pass the underlying `pmem-csi-vm.img`
file into QEMU as an [nvdimm
device](https://github.com/qemu/qemu/blob/master/docs/nvdimm.txt)
and inside the VM mount the `/dev/pmem0p1` partition that the
Linux kernel sets up based on the dax metadata that was placed in the
file by PMEM-CSI. Inside the VM, the App Direct semantics are fully
supported.

Such volumes can be used with full dax semantics *only* inside Kata
Containers. They are still usable with other runtimes, just not
with dax semantics. Because of that and the additional space overhead,
Kata Containers support has to be enabled explicitly via a [storage
class parameter and Kata Containers must be set up
appropriately](install.md#kata-containers-support).
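
The volume preparation steps listed in this section can be sketched as a dry-run plan of standard Linux commands. This is only an illustration under stated assumptions: the function name, the device names, the mount paths and the partition offset are hypothetical, and creating `pmem-csi-vm.img` with its partition table and dax metadata is left out because its layout is not specified here. PMEM-CSI itself may implement these steps differently.

```python
from typing import List


def kata_volume_plan(volume_dev: str, outer_mount: str, loop_dev: str,
                     partition_offset: int, fstype: str,
                     target: str) -> List[List[str]]:
    """Return the commands, in order, without executing any of them."""
    image = f"{outer_mount}/pmem-csi-vm.img"
    return [
        # Outer ext4 filesystem directly on the PMEM volume.
        ["mkfs.ext4", volume_dev],
        ["mount", volume_dev, outer_mount],
        # (Creating the image file with partition table and dax
        # metadata would happen here; elided in this sketch.)
        # Bind the partition inside the image file to a loop device.
        ["losetup", "--offset", str(partition_offset), loop_dev, image],
        # Inner filesystem of the type requested for the volume.
        [f"mkfs.{fstype}", loop_dev],
        # Mount for applications; outside Kata Containers this might
        # have to be done without -odax.
        ["mount", loop_dev, target],
    ]


if __name__ == "__main__":
    # All arguments are made-up example values.
    for cmd in kata_volume_plan("/dev/pmem0.1", "/mnt/outer", "/dev/loop0",
                                2 * 1024 * 1024, "xfs", "/mnt/target"):
        print(" ".join(cmd))
```

Returning the commands instead of running them keeps the sketch safe to try on a machine without PMEM or root access.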

## Driver modes

The PMEM-CSI driver supports running in different modes, which can be
@@ -388,4 +430,4 @@ that don't use PMEM-CSI at all.

Users must take care to create PVCs first, then the pods if they want
to use the webhook. In practice, that is often already done because it
is more natural, so it is not a big limitation.
is more natural, so it is not a big limitation.