Running multiple kata-container instances on one host fails on Arm64 #843
Comments
@Pennyzct and I have investigated this issue. We considered three methods:
Both of us thought that if method #3 works, it would be the best choice, so we started with it. Here is what we have done:
Yes, after doing the above work, we can run Kata-containers with NVDIMM on Arm64. But our concern
Firstly, glad to hear the nvdimm method works, and I am looking forward to it. Secondly, @egernst talked with me about the kernel version issue, and we both want to upgrade the guest kernel to at least 4.19. Personally, I think having different versions for different architectures should not be a blocker. What do others think?
nice work!
@gnawux @grahamwhaley @egernst @jodh-intel
/cc @chavafg.
@Weichen81 @chavafg @Pennyzct - OK, what I've done is change the label on the arm slave node from
thanks @grahamwhaley
Hi~all @gnawux @grahamwhaley @jodh-intel @chavafg I have finished all nvdimm-related tests on ARM CI, and the host kernel of ARM CI has now been updated to 4.20-rc4 as requested.
So could anyone help me bring ARM CI online? After that, I can submit a set of pull requests to add nvdimm support on aarch64 to kata-runtime. 😊
Sure @Pennyzct - I'll bring the ARM CI slave back online, and we'll see how it goes :-)
@grahamwhaley thanks.😝
@Weichen81 Qemu has a shared flag that you can pass to virtio-block to allow an image to be shared among several VMs. Did you take a look at that?
@amshinde We tried the shared flag before, but once we ran more than 3 kata-containers, startup became slower and slower. This is why we wanted to try nvdimm on Arm64.
Hi~ @amshinde thanks for the proposal~
@Pennyzct Yes, I had forgotten that. In my tests, if I wrote to the image in one instance, the other instances got ext4-fs errors, because nothing notifies the other instances to refresh their file system cache from disk.
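To make the two options in this exchange concrete, here is a minimal Go sketch of the QEMU arguments each one would roughly involve. It is not the kata-runtime appendImage code. The helper names, the IDs image0, mem0 and nv0, and the choice of virtio-blk-pci are assumptions for illustration, and it assumes the "shared flag" means QEMU's share-rw device property. Image paths containing commas would need escaping, which the sketch ignores.

```go
package main

import "fmt"

// sharedVirtioBlkArgs is a hypothetical helper. It returns QEMU arguments
// that attach a raw image as a virtio block device with share-rw=on, so that
// several VMs are allowed to open the same image for writing.
func sharedVirtioBlkArgs(image string) []string {
	return []string{
		"-drive", fmt.Sprintf("file=%s,if=none,id=image0,format=raw", image),
		"-device", "virtio-blk-pci,drive=image0,share-rw=on",
	}
}

// nvdimmArgs is a hypothetical helper. It returns QEMU arguments that map the
// image file into the guest as an NVDIMM backed by a memory-backend-file
// object, which bypasses the block layer.
func nvdimmArgs(image string, sizeBytes uint64) []string {
	return []string{
		"-object", fmt.Sprintf("memory-backend-file,id=mem0,mem-path=%s,size=%d", image, sizeBytes),
		"-device", "nvdimm,id=nv0,memdev=mem0",
	}
}
```

An NVDIMM device additionally needs nvdimm=on in the -machine option and slots/maxmem set in -m; those are left out here. Because memory-backend-file uses a private mapping unless share=on is given, writes made by one guest are not propagated to the image file or to other guests.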
The guest image was originally represented as a block device in qemu-aarch64, which triggers a write lock error when running multiple containers. Thanks to the new expanded IPA_SIZE feature in kernel 4.20 and Eric Auger's related patch set in qemu (which is still under upstream review), we can fully support nvdimm on arm64. Fixes: kata-containers#843 Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Since we overrode the func appendImage for aarch64, we should also provide a related unit test. Fixes: kata-containers#843 Signed-off-by: Penny Zheng <penny.zheng@arm.com>
DAX is not fully supported on arm64, so we disable DAX for now. Fixes: kata-containers#843 Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Since we overrode the func appendImage for aarch64, we should also provide a related unit test. Depends-on: github.com/kata-containers/packaging#377 Fixes: kata-containers#843 Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Improve the PR porting GitHub action by referencing a central script to handle the checks rather than hard-coding them in the workflow YAML. This ensures all PRs use the latest porting policy encoded in the script and makes maintenance easier. Related: kata-containers/kata-containers#634 Fixes: kata-containers#843. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Description of problem
Arm64 uses a block device as the rootfs in the guest. If we run two or more kata-container instances on one host, we get the following error:
This is because all kata-container instances share the same rootfs image, and every instance wants to open this file with RW permission. But the file has already been locked by the first instance, because we are using the RAW format for the rootfs image.
We have tried changing the RAW format to QCOW or QCOW2. Yes, with the copy-on-write feature, we can run two or three instances at the same time. But as the number of instances increases, creating an instance becomes slower and slower. I think this may be caused by QCOW/QCOW2 itself, because QCOW/QCOW2 has not been used at scale in the cloud. Most cloud platforms use network block devices for virtual machines.
My question is: can we use NBD for kata-container instances to bypass this issue?
I know x86_64 uses persistent memory, so it doesn't have a similar issue. @jodh-intel @gnawux @Pennyzct
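The locking conflict described above can be reproduced in miniature. The following Go sketch (Linux only) opens one temporary file twice, as two instances would each open the same rootfs image, and tries to take an exclusive lock through each handle. The helper name and the temporary file are hypothetical, and flock(2) is used here only as a simple stand-in; it is not necessarily the locking mechanism QEMU itself uses for images.

```go
package main

import (
	"errors"
	"os"
	"syscall"
)

// demoLockConflict is a hypothetical helper. It reports whether the first and
// the second open handle on the same file could each take an exclusive,
// non-blocking advisory lock.
func demoLockConflict() (firstLocked, secondLocked bool, err error) {
	// Stand-in for the shared rootfs image.
	tmp, err := os.CreateTemp("", "rootfs-*.img")
	if err != nil {
		return false, false, err
	}
	defer os.Remove(tmp.Name())
	defer tmp.Close()

	// A second, independent open of the same file ("second instance").
	other, err := os.OpenFile(tmp.Name(), os.O_RDWR, 0)
	if err != nil {
		return false, false, err
	}
	defer other.Close()

	// The first instance takes the write lock.
	if err := syscall.Flock(int(tmp.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		return false, false, err
	}
	firstLocked = true

	// The second instance asks for the same lock without blocking.
	err = syscall.Flock(int(other.Fd()), syscall.LOCK_EX|syscall.LOCK_NB)
	switch {
	case err == nil:
		secondLocked = true
	case errors.Is(err, syscall.EWOULDBLOCK):
		err = nil // expected: the lock is already held through the first handle
	}
	return firstLocked, secondLocked, err
}
```

Calling demoLockConflict() from a main function should show the first lock succeeding and the second being refused, which mirrors why only the first instance can open the image with RW permission.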