
Running multiple kata-container instances on one host fails on Arm64 #843

Closed
Weichen81 opened this issue Oct 19, 2018 · 14 comments
Labels
bug Incorrect behaviour

Comments

@Weichen81
Contributor

Description of problem

Arm64 uses a block device as the rootfs in the guest. If we run two or more kata-container instances
on one host, we get the following error:

root@entos-thunderx2-desktop:~# docker run -dt --runtime kata-runtime ubuntu
7baf2c0f26100b0e642ad4122ce9ea1cb556fb3ed6cfcb454cb0586cbbe6194d
root@entos-thunderx2-desktop:~# docker run -dt --runtime kata-runtime ubuntu
2065fcf560c136091b5e69b6a12c95de94308309bb7fd5309df852503752203a
docker: Error response from daemon: OCI runtime create failed: qemu-system-aarch64: -device virtio-blk,drive=image-9f100592ac95eec6,scsi=off,config-wce=off: Failed to get "write" lock
Is another process using the image?: unknown.

This is because all kata-container instances share the same rootfs image, and every instance
wants to open the file with RW permission. Since we use the RAW format for the rootfs image, the
file is locked by the first instance, and every later instance fails to acquire the write lock.
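
For illustration, the QEMU image lock can be reproduced without Kata at all; a minimal sketch, assuming a local raw image (the machine flags and the rootfs.img path are illustrative):

# Since QEMU 2.10, QEMU takes a write lock on an image file it opens
# read-write, so a second instance opening the same raw image fails:
qemu-system-aarch64 -machine virt -cpu host -enable-kvm -nographic \
    -drive file=rootfs.img,format=raw,if=virtio &
qemu-system-aarch64 -machine virt -cpu host -enable-kvm -nographic \
    -drive file=rootfs.img,format=raw,if=virtio
# => Failed to get "write" lock
#    Is another process using the image?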

We tried changing the RAW format to QCOW or QCOW2. With the copy-on-write
feature, we can run two or three instances at the same time. But as the number of instances increases,
instance creation becomes slower and slower. I think this is caused by QCOW/QCOW2 itself; QCOW/QCOW2 has not been used massively in the cloud, where most platforms use network block devices for virtual machines.
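
For reference, the copy-on-write setup we tried looks roughly like this; a sketch, with illustrative file names:

# One shared, read-only backing file plus a small per-instance overlay;
# writes go to the overlay, so no instance needs the write lock on rootfs.img.
# (-F names the backing format; older qemu-img takes -o backing_fmt=raw instead.)
qemu-img create -f qcow2 -b rootfs.img -F raw instance1.qcow2
qemu-img create -f qcow2 -b rootfs.img -F raw instance2.qcow2
# Each guest then opens only its own overlay:
#   -drive file=instance1.qcow2,format=qcow2,if=virtio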

My question is:
can we use NBD for kata-container instances to bypass this issue?
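
For concreteness, an NBD-based setup could look like the sketch below (hypothetical; Kata does not implement this today, and the socket path is illustrative). Note that a shared export would still need a per-instance writable layer somewhere:

# Export the rootfs image over a unix socket with qemu-nbd:
qemu-nbd --read-only --persistent --socket=/run/kata-rootfs.sock rootfs.img &
# Each guest attaches to the export instead of opening the file directly:
#   -drive file=nbd:unix:/run/kata-rootfs.sock,format=raw,readonly=on,if=virtio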

I know x86_64 uses persistent memory (nvdimm) for the rootfs, so it doesn't have a similar issue. @jodh-intel @gnawux @Pennyzct



@egernst added the bug Incorrect behaviour label Nov 1, 2018
@Weichen81
Contributor Author

@Pennyzct and I have done some investigation of this issue. We have considered three methods:

  1. Do some work to improve QCOW/QCOW2. --> This needs a lot of time to fix and verify, and we have some legal issues doing it.
  2. Use NBD for kata-containers. --> But it seems we would have to modify the OCI specification.
  3. Use the same method as x86_64 --> enable nvdimm support for Arm64.

Both of us thought that if method #3 works, it would be the best choice, so we tried it first.
Luckily, after some effort, we got nvdimm working on Arm64: we can run multiple kata-containers
instances on one host without hitting the QCOW/QCOW2 issue.

Here are what we have done:

  1. Upgrade the host and guest kernels to 4.20-rc3 and apply Suzuki's patch series:
    https://patchwork.kernel.org/patch/10531723/
  2. Apply Eric Auger's NVDIMM patches for QEMU:
    https://patchwork.kernel.org/cover/10647305/
  3. Enable CONFIG_ACPI_NFIT in the guest kernel.
  4. Use the same QEMU NVDIMM parameters as x86_64 (see the sketch after this list).
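
For reference, the setup we mean looks roughly like this; a sketch under the patched QEMU, with illustrative sizes and paths (the real command line is built by the runtime):

# Guest kernel config for the ACPI NFIT / nvdimm path:
#   CONFIG_ACPI_NFIT=y
#   CONFIG_LIBNVDIMM=y
# QEMU side: back the rootfs image with a file-backed nvdimm device, as on x86_64:
qemu-system-aarch64 \
    -machine virt,nvdimm=on \
    -m 2048M,slots=2,maxmem=4G \
    -object memory-backend-file,id=mem0,share=on,mem-path=kata-containers.img,size=128M \
    -device nvdimm,id=nv0,memdev=mem0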

Yes, after doing the above work, we can run Kata Containers with NVDIMM on Arm64. But our concerns
are:

  1. We have to upgrade the host and guest kernels to 4.20-rc3. Currently we have pinned the host kernel
    versions, and the guest kernel is 4.14.x.
  2. We have to maintain these patches ourselves until they are merged upstream.
  3. Arm64 and x86_64 may use different guest kernel versions.

WDYT @jodh-intel @gnawux @grahamwhaley @egernst @bergwolf

@gnawux
Member

gnawux commented Nov 23, 2018

Firstly, glad to hear the nvdimm method works, and looking forward to it.

Secondly, @egernst talked with me about the kernel version things, and we both want to upgrade the guest kernel to 4.19 at least.

Personally, I think different versions for different architectures should not be a blocker.

What do others think?

@grahamwhaley
Contributor

Nice work!
Agreed, different versions on different arches would not be unexpected, I think. Generally we try to keep up with the latest 'longterm' kernel, so we get fixes and backports but don't churn too much on 'stable' or 'head' kernels. But if an arch needs a feature that means it has to live on 'stable' for now, then so be it.

@Weichen81
Contributor Author

@gnawux @grahamwhaley @egernst @jodh-intel
I have updated the Linux kernel of the Arm CI server to v4.20-rc4. @Pennyzct will test the related patches (kata, QEMU and guest kernel) on the CI server. If the tests pass, we will start sending PRs. So could you please stop running PR CI on the Arm server until @Pennyzct finishes her tests?

@jodh-intel
Contributor

/cc @chavafg.

@grahamwhaley
Contributor

@Weichen81 @chavafg @Pennyzct - OK, what I've done is change the label on the arm slave node from arm_node to arm_node_XXX. That should stop any of the jobs matching that as a build node, so should not schedule any builds on the slave.
Once you are done with your updates and have verified them, we (probably myself or @chavafg) will remove the _XXX from the label, and the backlog of jobs should start flowing again.
(note to @chavafg - I've not marked the node as 'offline' - just changing the labels has worked well for me in the past when working on the metrics nodes - I guess if there were more folks tinkering then updating the offline status with a full comment of how and why would be more appropriate ;-) )

@chavafg
Contributor

chavafg commented Nov 28, 2018

thanks @grahamwhaley

@Pennyzct
Contributor

Pennyzct commented Dec 3, 2018

Hi all @gnawux @grahamwhaley @jodh-intel @chavafg, I have done all the nvdimm-related tests on the ARM CI, and the host kernel of the ARM CI has now been updated to 4.20-rc4 as requested.

root@testing-1:~# uname -a
Linux testing-1 4.20.0-rc4 #1 SMP Wed Nov 28 14:44:27 CST 2018 aarch64 aarch64 aarch64 GNU/Linux

So could anyone help me bring the ARM CI back online? After that, I can submit a bundle of pull requests to make aarch64 nvdimm-supported in kata-runtime. 😊

@grahamwhaley
Contributor

Sure @Pennyzct - I'll bring the ARM CI slave back online, and we'll see how it goes :-)
You should be able to monitor how the builds are going at http://jenkins.katacontainers.io/computer/arm01_slave/builds
There look to be 5 pending jobs or so, which should start being processed.

@Pennyzct
Contributor

Pennyzct commented Dec 3, 2018

@grahamwhaley thanks.😝

@amshinde
Member

amshinde commented Dec 7, 2018

@Weichen81 QEMU has a share-rw flag that you can pass to virtio-blk to allow an image to be shared among several VMs; did you take a look at that?
I added a similar flag for passing block devices on x86_64:
70edc56
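
For context, on the QEMU command line that flag looks roughly like this (drive id illustrative):

# share-rw=on tells QEMU not to demand exclusive write access for this
# device, so several guests can open the same image read-write:
-device virtio-blk,drive=image0,share-rw=on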

@Weichen81
Contributor Author

@amshinde We tried the share-rw flag before, but when we ran more than three kata-containers, startup became slower and slower. That is why we wanted to try nvdimm for Arm64.

@Pennyzct
Contributor

Hi~ @amshinde thanks for the proposal~
I was reading the related share-rw docs in QEMU. They say:
If the guest can safely share the disk image with other writers the @code{-device ...,share-rw=on} parameter can be used. This is only safe if the guest is running software, such as a cluster file system, that coordinates disk accesses to avoid corruption.
FWIW, this option doesn't provide extra write protection between multiple VMs, and Kata doesn't provide disk access coordination for the rootfs, so I think it is risky to use this solution. ;)

@Weichen81
Contributor Author

@Pennyzct Yes, I had forgotten that: in my tests, if I did writes in one instance, the other instances would get ext4-fs errors, because there is no software to tell the other instances to refresh their file system caches from disk.

Pennyzct added a commit to Pennyzct/runtime that referenced this issue Mar 5, 2019
Original guest image was represented as a block device in qemu-aarch64,
and it brings up a write lock error when running multiple containers.
Thanks to the new expanded IPA_SIZE feature in kernel 4.20 and
Eric Auger's related patch set in qemu (which are still under upstream
review), we can fully support nvdimm on arm64.

Fixes: kata-containers#843

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Pennyzct added a commit to Pennyzct/runtime that referenced this issue Mar 5, 2019
Since we overrode the func appendImage for aarch64, we should also
provide a related unit test.

Fixes: kata-containers#843

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Pennyzct added a commit to Pennyzct/runtime that referenced this issue Mar 5, 2019
dax is not fully supported on arm64, so we disable dax for now.

Fixes: kata-containers#843

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Pennyzct added a commit to Pennyzct/runtime that referenced this issue Mar 6, 2019
Original guest image was represented as a block device in qemu-aarch64,
and it brings up a write lock error when running multiple containers.
Thanks to the new expanded IPA_SIZE feature in kernel 4.20 and
Eric Auger's related patch set in qemu (which are still under upstream
review), we can fully support nvdimm on arm64.

Fixes: kata-containers#843

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Pennyzct added a commit to Pennyzct/runtime that referenced this issue Mar 6, 2019
Since we overrode the func appendImage for aarch64, we should also
provide a related unit test.

Fixes: kata-containers#843

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Pennyzct added a commit to Pennyzct/runtime that referenced this issue Mar 7, 2019
Since we overrode the func appendImage for aarch64, we should also
provide a related unit test.

Depends-on: github.com/kata-containers/packaging#377

Fixes: kata-containers#843

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Pennyzct added a commit to Pennyzct/runtime that referenced this issue Mar 7, 2019
Since we overrode the func appendImage for aarch64, we should also
provide a related unit test.

Fixes: kata-containers#843

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
egernst pushed a commit to egernst/runtime that referenced this issue Feb 9, 2021
Improve the PR porting GitHub action by referencing a central script to
handle the checks rather than hard-coding them in the workflow YAML.

This ensures all PRs use the latest porting policy encoded in the
script and makes maintenance easier.

Related: kata-containers/kata-containers#634

Fixes: kata-containers#843.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>