Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

vsock: Fix race condition happening in the virtio-vsock driver #933

Merged
merged 1 commit into from
Feb 20, 2020

Conversation

sboeuf
Copy link

@sboeuf sboeuf commented Feb 13, 2020

There was a race condition between bind() and listen() that was hit very
rarely when using Kata Containers and Cloud-Hypervisor. It's been
identified the problem is really coming from the virtio-vsock driver,
which is fixed by those new kernel patches uploaded for each version of
the kernels used by Kata Containers.

Fixes #932

Signed-off-by: Sebastien Boeuf sebastien.boeuf@intel.com

@jcvenegas
Copy link
Member

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@likebreath
Copy link
Contributor

LGTM. Thanks for submitting the patches. @sboeuf

Copy link
Contributor

@grahamwhaley grahamwhaley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm
good catch. nice debugging.

@@ -0,0 +1,49 @@
From 6f486b9250067b788df6dad1e001a8301305a46c Mon Sep 17 00:00:00 2001
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is virtio-fs-v0.3.x a new directory? does build-kernel look for patches in this directory automatically?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what @jcvenegas told me. Apparently, when Kata builds experimental kernel, it would look for a directory with the same name as the version.
Should it be virtio-fs-v0.3.x or virtio-fs-v0.3?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will find for that dirctory based in the kernel version defined in versions.yaml
https://github.com/kata-containers/runtime/blob/master/versions.yaml#L163 seems that is not the more elegant tag.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oks, lgtm, thanks

@jcvenegas
Copy link
Member

Waiting for #935 to fix CI

@sboeuf
Copy link
Author

sboeuf commented Feb 14, 2020

@jcvenegas #935 has been merged!

@chavafg
Copy link
Contributor

chavafg commented Feb 14, 2020

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

index 2a8651aa90c8..ccc40dd1771b 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1051,6 +1051,7 @@ void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build for the kernel 4.19 is not working

+ make -s CONFIG_DEBUG_SECTION_MISMATCH=y -j4 ARCH=x86_64
[ 1416s] net/vmw_vsock/virtio_transport_common.c: In function 'virtio_transport_recv_pkt':
[ 1416s] net/vmw_vsock/virtio_transport_common.c:1054:40: error: 't' undeclared (first use in this function); did you mean 'tm'?
[ 1416s]  1054 |   (void)virtio_transport_reset_no_sock(t, pkt);
[ 1416s]       |                                        ^
[ 1416s]       |                                        tm
[ 1416s] net/vmw_vsock/virtio_transport_common.c:1054:40: note: each undeclared identifier is reported only once for each function it appears in
[ 1416s] net/vmw_vsock/virtio_transport_common.c:1054:9: error: too many arguments to function 'virtio_transport_reset_no_sock'
[ 1416s]  1054 |   (void)virtio_transport_reset_no_sock(t, pkt);
[ 1416s]       |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[ 1416s] net/vmw_vsock/virtio_transport_common.c:663:12: note: declared here
[ 1416s]   663 | static int virtio_transport_reset_no_sock(struct virtio_vsock_pkt *pkt)
[ 1416s]       |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[ 1416s] make[2]: *** [scripts/Makefile.build:304: net/vmw_vsock/virtio_transport_common.o] Error 1
[ 1416s] make[1]: *** [scripts/Makefile.build:544: net/vmw_vsock] Error 2
[ 1416s] make: *** [Makefile:1052: net] Error 2
[ 1416s] make: *** Waiting for unfinished jobs....

https://build.opensuse.org/package/show/home:katacontainers:ci:x86_64:packaging-PR-933/linux-container

@sboeuf sboeuf force-pushed the fix_cloud_hypervisor branch from 17c5404 to 05014f1 Compare February 14, 2020 17:04
@jcvenegas
Copy link
Member

/test

@jcvenegas
Copy link
Member

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@jcvenegas
Copy link
Member

False positive in packaging-ci https://build.opensuse.org/project/show/home:katacontainers:ci:x86_64:packaging-PR-933 all working

@jcvenegas
Copy link
Member

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

There was a race condition between bind() and listen() that was hit very
rarely when using Kata Containers and Cloud-Hypervisor. It's been
identified the problem is really coming from the virtio-vsock driver,
which is fixed by those new kernel patches uploaded for each version of
the kernels used by Kata Containers.

Fixes kata-containers#932

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
@jcvenegas jcvenegas force-pushed the fix_cloud_hypervisor branch from 05014f1 to a8ba86c Compare February 14, 2020 23:01
@jcvenegas
Copy link
Member

/test

@jcvenegas
Copy link
Member

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@jcvenegas
Copy link
Member

/test

@jcvenegas jcvenegas merged commit 9116b56 into kata-containers:master Feb 20, 2020
jcvenegas added a commit to jcvenegas/kata-containers that referenced this pull request Feb 15, 2021
Backport from kata 2.x

kata-containers/packaging#933

Fixes: kata-containers#1415

Signed-off-by: Carlos Venegas <jos.c.venegas.munoz@intel.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

vsock communication sporadically fails
6 participants