This repository has been archived by the owner on May 12, 2021. It is now read-only.

vsock: initiate a quick handshake to avoid VSOCK_DEFAULT_CONNECT_TIMEOUT #2656

Closed
wants to merge 1 commit into from


Pennyzct

When kata-runtime initiates a connection to the agent server before the vsock transport in the guest is ready, we suffer the default two-second connect timeout.
Following advice from Stefan Hajnoczi, we do a quick handshake so that the host does not initiate its connection until the vsock device is completely ready.
The quick handshake follows the steps below:

  1. kata-runtime listens on a vsock port
  2. agent.vsock_host_port=PORT is added to the kernel command-line options
  3. kata-agent parses the port number and connects to the host
This commit covers steps 1-2, which live in kata-runtime.

Fixes: #1917

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
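
Steps 2 and 3 of the handshake described above amount to writing and reading one kernel command-line option. The following is a minimal sketch of that exchange, not the code from this PR: only the key `agent.vsock_host_port` comes from the description, while `hostPortKernelParam` and `parseHostPort` are hypothetical helper names chosen for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// vsockHostPortKey is the kernel command-line key named in the description.
const vsockHostPortKey = "agent.vsock_host_port"

// hostPortKernelParam (hypothetical helper) renders the option the runtime
// appends to the guest kernel command line in step 2.
func hostPortKernelParam(port uint32) string {
	return fmt.Sprintf("%s=%d", vsockHostPortKey, port)
}

// parseHostPort (hypothetical helper) mirrors step 3 on the agent side: it
// scans a kernel command line for the option and returns the host port.
func parseHostPort(cmdline string) (uint32, error) {
	prefix := vsockHostPortKey + "="
	for _, field := range strings.Fields(cmdline) {
		if !strings.HasPrefix(field, prefix) {
			continue
		}
		port, err := strconv.ParseUint(strings.TrimPrefix(field, prefix), 10, 32)
		if err != nil {
			return 0, fmt.Errorf("invalid %s value: %w", vsockHostPortKey, err)
		}
		return uint32(port), nil
	}
	return 0, fmt.Errorf("%s not found on kernel command line", vsockHostPortKey)
}
```

In this sketch the agent would read the command line (for example from /proc/cmdline), call `parseHostPort`, and connect to the host on the returned port. A missing option returns an error, so an agent could fall back to its existing behaviour.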

@Pennyzct Pennyzct added the wip Work in Progress (PR incomplete - needs more work or rework) label Apr 30, 2020
@Pennyzct

Hi all,
I haven't added unit tests yet, so I've added the wip label.
I'm not sure whether a 2s timeout is correct here. 😰

@Pennyzct

Pennyzct commented Apr 30, 2020

Hi @stefanha
Please feel free to review; I hope I'm following your thinking here. ;)
The other part resides in kata-agent; please see kata-containers/agent#776

if err != nil {
	return err
}
defer syscall.Close(socket)

There is a race condition here: we fetch the port number and then close the file descriptor. Now another host process can reuse the same port number before we get a chance to call vsock.Listen(). When this happens our vsock.Listen() call will fail.

Please keep the file descriptor open and use it instead of calling vsock.Listen() later on.
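
The suggestion above (bind once and keep the socket) can be sketched as follows. This is an illustration under stated assumptions rather than the PR's code: TCP loopback stands in for AF_VSOCK so the example runs with the standard library alone, and `reservePort` and `canBind` are hypothetical helper names.

```go
package main

import (
	"net"
	"strconv"
)

// reservePort (hypothetical helper) binds to an OS-assigned port and returns
// the listener still open, so the port stays owned by this process.
// Assumption: TCP loopback stands in for AF_VSOCK here.
func reservePort() (net.Listener, int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	return l, l.Addr().(*net.TCPAddr).Port, nil
}

// canBind (hypothetical helper) reports whether another listener could
// currently claim the port, which is what a competing process would attempt.
func canBind(port int) bool {
	l, err := net.Listen("tcp", net.JoinHostPort("127.0.0.1", strconv.Itoa(port)))
	if err != nil {
		return false
	}
	l.Close()
	return true
}
```

While the listener returned by `reservePort` stays open, `canBind` on the same port fails, so there is no window in which another process can take the port. The caller would later call `Accept` on that same listener instead of closing it and listening again.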

okay, thanks. I'll save the server socket somewhere for later use.

// receive a connection from agent, that is,
// vsock device in guest is ready.
return nil
case <-time.After(vsockReadyTimeOut):

Is it necessary to introduce a new timeout? This timeout includes guest kernel boot and kata-agent startup, so 2 seconds might be a little short on a heavily loaded host.

I think a timeout isn't needed here since the container runtime fails container startup after some time anyway.
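
One way to drop the fixed timeout while still letting the caller give up is to tie the wait to a context that the caller already cancels when container startup fails. The sketch below assumes such a context exists. `waitForAgent` and `demo` are hypothetical names, and TCP loopback again stands in for vsock so the example is runnable.

```go
package main

import (
	"context"
	"net"
)

// waitForAgent (hypothetical helper) blocks until a peer connects to l or ctx
// is cancelled. The connection is treated purely as a readiness signal.
func waitForAgent(ctx context.Context, l net.Listener) error {
	type result struct {
		conn net.Conn
		err  error
	}
	done := make(chan result, 1)
	go func() {
		conn, err := l.Accept()
		done <- result{conn, err}
	}()

	select {
	case r := <-done:
		if r.err != nil {
			return r.err
		}
		return r.conn.Close()
	case <-ctx.Done():
		l.Close() // unblocks Accept so the goroutine can exit
		if r := <-done; r.conn != nil {
			r.conn.Close() // a peer connected just before cancellation
		}
		return ctx.Err()
	}
}

// demo (hypothetical) exercises waitForAgent over TCP loopback. When connect
// is false nobody dials, and the already-cancelled context ends the wait.
func demo(connect bool) error {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer l.Close()

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if connect {
		go func() {
			if c, err := net.Dial("tcp", l.Addr().String()); err == nil {
				c.Close()
			}
		}()
	} else {
		cancel()
	}
	return waitForAgent(ctx, l)
}
```

With this shape there is no separate constant to tune for slow guest boots. The wait lasts exactly as long as the caller is still willing to start the container.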

@Pennyzct Pennyzct closed this May 6, 2020
Successfully merging this pull request may close these issues.

Sandbox creating is slower when use_vsock enabled