This repository has been archived by the owner on May 12, 2021. It is now read-only.

take longer time got io response when exec a process in container on clh with dax enabled #2331

Closed
lifupan opened this issue Dec 7, 2019 · 6 comments
Labels
bug Incorrect behaviour needs-review Needs to be assessed by the team.

Comments

@lifupan
Member

lifupan commented Dec 7, 2019

Description of problem

It takes more than 10 seconds to get an I/O response on cloud-hypervisor with dax enabled.

Steps to reproduce:

1. Run a pod with crictl runp.
2. Create a container in this pod.
3. Start the container.
4. Exec a process in the container with a tty enabled.

The I/O response then takes more than 10 seconds.

After disabling the virtiofs dax cache mode, it worked well.
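
To put a number on the delay with dax on and off, a small timing helper can be wrapped around whatever command is slow. The following is only a minimal sketch: `time_command` and its `deadline_s` parameter are hypothetical names introduced here, the stand-in command is a placeholder for the real exec command, and the helper measures a non-interactive command returning, which is an assumption that may not match an interactive tty session exactly.

```python
import subprocess
import sys
import time


def time_command(argv, deadline_s=10.0):
    """Run argv and return (elapsed_seconds, finished_before_deadline).

    Hypothetical helper: deadline_s mirrors the roughly 10 second delay
    reported above and is not taken from any runtime configuration.
    """
    start = time.monotonic()
    try:
        # Output is discarded; only the time until the command returns matters.
        subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=deadline_s,
            check=False,
        )
        finished = True
    except subprocess.TimeoutExpired:
        # The child is killed once the deadline passes.
        finished = False
    return time.monotonic() - start, finished


if __name__ == "__main__":
    # Stand-in command; replace it with the exec command being investigated.
    elapsed, finished = time_command([sys.executable, "-c", "pass"])
    status = "finished" if finished else "hit the deadline"
    print(f"{status} after {elapsed:.2f}s")
```

Running it once with dax enabled and once with dax disabled gives two figures that can be compared directly.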

Expected result

(replace this text with an explanation of what you thought would happen)

Actual result

(replace this text with details of what actually happened)

@lifupan lifupan added bug Incorrect behaviour needs-review Needs to be assessed by the team. labels Dec 7, 2019
@lifupan
Member Author

lifupan commented Dec 10, 2019

Hi @sboeuf, have you ever run into this issue?

@sboeuf

sboeuf commented Dec 10, 2019

@lifupan no, I haven't run into this issue. Did you compare the same use case with QEMU/virtio-fs?
Also, make sure the cache_size is large enough if you don't want to get poor performance out of virtio-fs. Usually 8G should be fine.

@lifupan
Member Author

lifupan commented Dec 11, 2019

@sboeuf I tried qemu, and it worked well with a cache size of 1024M.
You can try with the latest kata static release https://github.com/kata-containers/runtime/releases/tag/1.10.0-rc0 with docker.

I tried it but got an error:

$ sudo docker exec -ti d4076cc5eb04 sh
OCI runtime exec failed: context deadline exceeded: unknown

@jcvenegas
Member

I have seen similar behavior: from time to time it takes a long time to respond, and I was able to reproduce it. I also sometimes see a container take a long time to boot. I have ignored this for now, as we are looking into another race condition that seems to happen at stop (what I found there is that the agent stops responding). I am not sure whether that is related to this. Do you have a consistent way to reproduce it and get more debug information, @lifupan?

@lifupan
Member Author

lifupan commented Jan 7, 2020

@jcvenegas
I also reproduced this poor performance on a bare cloud-hypervisor environment, without kata involved, as follows:

1. Follow the steps in https://github.com/cloud-hypervisor/cloud-hypervisor/blob/master/docs/fs.md to boot Clear Linux in cloud-hypervisor, sharing a container image rootfs directory from the host to the cloud-hypervisor guest OS using virtiofs.
2. Log in to the guest OS and mount the shared container rootfs onto a directory with dax enabled.
3. chroot into the mounted directory, running a bash command.
4. You will find that it takes a long time to get bash's response.
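
To narrow down whether file access under the shared mount is the slow part, the first mapped read of each file can be timed inside the guest. This is a rough sketch only: `time_mapped_reads` is a hypothetical helper, the directory passed to it is a placeholder for the real mount point, and whether a memory-mapped read exercises the same path as the slow bash start-up is an assumption.

```python
import mmap
import os
import time


def time_mapped_reads(root, max_files=50):
    """Map up to max_files regular files under root and touch their first byte.

    Returns a list of (path, seconds) tuples, slowest first.
    """
    timings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if len(timings) >= max_files:
                break
            path = os.path.join(dirpath, name)
            # Skip symlinks, special files, and empty files (which cannot be mapped).
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            start = time.monotonic()
            try:
                if os.path.getsize(path) == 0:
                    continue
                with open(path, "rb") as fh:
                    with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
                        mapped[0]  # Touch the first byte to force the page in.
            except (OSError, ValueError):
                continue
            timings.append((path, time.monotonic() - start))
        if len(timings) >= max_files:
            break
    return sorted(timings, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder directory; point this at the mounted rootfs in the guest.
    for path, seconds in time_mapped_reads(os.getcwd(), max_files=10):
        print(f"{seconds:.6f}s  {path}")
```

Comparing the slowest entries for a mount with dax enabled against one without it should show whether the delay is spread across all files or concentrated in a few.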

@sboeuf

sboeuf commented Jan 8, 2020

@lifupan could you give this another try based on the latest cloud-hypervisor commit on master? I updated the virtio-fs support through cloud-hypervisor/cloud-hypervisor#557 and ran some fio tests showing that it performs very well.
