vsocks: firecracker terminates the host connection before starting the init process #1253

Closed
devimc opened this issue Sep 11, 2019 · 9 comments

@devimc

devimc commented Sep 11, 2019

Description of problem

Firecracker terminates the host connection before the init process (systemd or kata-agent) starts in the guest OS. This happens when the CONNECT <port> command is sent immediately after InstanceStart. To mitigate this in Kata Containers, the runtime (host) retries the connection to the agent (guest) until it succeeds. We see this approach as a workaround rather than a real solution, since it is prone to failure on slow systems, where more retries may be needed.
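For illustration, the retry workaround described above could look roughly like the sketch below. It is written in Python rather than the Go used by the Kata runtime, and it is not the runtime's actual code. `connect_with_retry`, `attempt`, and all delay values are hypothetical names and numbers chosen for this example. It bounds the retries by a time budget instead of a fixed count, which is one way to soften the slow-system concern.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    attempt: Callable[[], T],
    deadline_s: float = 10.0,
    initial_delay_s: float = 0.05,
    max_delay_s: float = 1.0,
) -> T:
    """Call attempt() until it succeeds or the time budget is used up.

    attempt is a hypothetical callable that performs one complete
    connection attempt and raises OSError (for example
    ConnectionResetError) when the guest agent is not reachable yet.
    """
    start = time.monotonic()
    delay = initial_delay_s
    while True:
        try:
            return attempt()
        except OSError as err:
            elapsed = time.monotonic() - start
            # Give up if the next sleep would run past the budget.
            if elapsed + delay > deadline_s:
                raise TimeoutError(
                    f"agent not reachable within {deadline_s}s"
                ) from err
            time.sleep(delay)
            # Exponential backoff, capped so a ready agent is noticed quickly.
            delay = min(delay * 2, max_delay_s)
```

A caller would pass a function that dials the socket and completes whatever handshake proves the agent is alive. A time budget still has to be sized for the slowest expected boot, so this stays a workaround rather than a fix.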

Proposal 1

Firecracker should never terminate the host connection (like the virtio-vsock implementation?)

Proposal 2

Add a timeout parameter to the JSON request that creates the vsock:

{"vsock_id": "root","guest_cid": 3,"uds_path": "/vsock","timeout": 120}

where timeout is in seconds


cc @sboeuf @sandreim @dhrgit @stefanha

@dhrgit
Contributor

dhrgit commented Sep 12, 2019

If I understand this correctly, you are trying to connect to the guest before it has actually had a chance to boot up and start an AF_VSOCK listener. If that's so, then IMO, terminating the host connection is the correct behavior, since the host is trying to connect to something that isn't there.

The VMM is not privy to user space actions, such as starting socket listeners, so IMO, it shouldn't have the responsibility of ensuring proper timing concerning said actions.

@dhrgit dhrgit self-assigned this Sep 12, 2019
@devimc
Author

devimc commented Sep 12, 2019

@dhrgit thanks for answering.

> terminating the host connection is the correct behavior, since the host is trying to connect to something that isn't there.

OK, that makes sense. Can the timeout be configurable?

@dhrgit
Contributor

dhrgit commented Sep 12, 2019

I'm not sure I understand what timeout that would be. In the use-case I mentioned, there is no timeout involved - the host connection is immediately refused by the guest (since there are no listeners present).

Also, I don't really understand how this would work with vhost either. When using vhost, were you issuing an AF_VSOCK connect call right after InstanceStart? I don't see how that could've worked reliably. Unless some AF_VSOCK socket is bound and listened to on the guest, any incoming connection would get immediately refused. Is there something I'm missing?

@stefanha

stefanha commented Sep 12, 2019 via email

@sboeuf
Contributor

sboeuf commented Sep 12, 2019

@dhrgit @stefanha
Your explanation seems logical, but I am still not clear on why this fix was not needed for Kata until now.
From what I understand, when the vhost-vsock backend is used, the host kernel receives the connection from the application. And I think in this case the connection succeeds whether or not someone is listening inside the guest. My guess is that it succeeds because the VMM initialized the vhost-vsock backend with a specific CID:PORT. When the kernel receives the connection, it simply accepts it since it knows the incoming CID:PORT. Later, when the guest application starts listening on this AF_VSOCK socket, the communication can really happen.

If, and only if, my theory is correct, this means the hybrid version from Firecracker behaves differently. And if that's the case, we need to discuss whether we want to handle buffering in the VMM until the guest application starts listening.

@devimc
Author

devimc commented Sep 12, 2019

I think I found the main issue. I ran the following experiment in both QEMU and Firecracker:

  1. Add boot_delay=6000 in the guest kernel command line
  2. Connect to the unix/vsock address from the host
    • Use net.Dial to connect to the unix address (Firecracker)
    • Use vsock.Dial to connect to the vsock address (QEMU)

In the QEMU case, vsock.Dial keeps returning the same error (dial vsock vm(3094744204):1024: connection reset by peer) until the kernel resumes execution and creates the vsock connection in the guest.
But in the Firecracker case, net.Dial returns no error even if the kernel has not started or has failed. When the runtime then tries to send a request, it fails, because the runtime expects the connection to be ready once net.Dial returns (as in the QEMU/vsock case). So I guess the runtime has to sleep and retry until the agent reads the message. 😟

@devimc
Author

devimc commented Sep 12, 2019

@dhrgit after doing some testing, I found something interesting: if the unix socket is read after writing CONNECT <port>, an EOF error is returned if the vsock is not ready. But when the vsock is ready in the guest and the socket is read again, no error is returned and the read buffer contains 0x04 (EOT, end of transmission). Is this expected behaviour? If so, I guess we can use it to know when the agent is ready.

What does Firecracker reply when the CONNECT <port> command is sent?

@dhrgit
Contributor

dhrgit commented Sep 13, 2019

@devimc When you connect to the Firecracker unix socket and write a CONNECT <port> command, a vsock connection request packet is queued by Firecracker. That request packet will reach the guest only after the guest vsock driver has had a chance to initialize and make some RX buffers available (the connection request packet is sent to the guest via an RX buffer).

So, if you send the CONNECT <port> command before the guest is fully booted up, chances are that the connection request packet sent by Firecracker will reach the guest socket layer sometime between vsock driver initialization and the guest userspace app binding/listening to the AF_VSOCK <port>. This means your connection will be refused, causing Firecracker to sever the host unix connection, leading to an EOF (0-length) read on your connected unix socket.
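A minimal host-side sketch of that flow, in Python for brevity (the Kata runtime itself is written in Go). The helper names `connect_guest_port` and `first_read` are hypothetical, and the newline terminator on the CONNECT command is an assumption not stated in this thread. The point it tries to show is that a successful connect and write only mean the VMM queued the request; the refusal only becomes visible on the first read.

```python
import socket


def connect_guest_port(uds_path: str, port: int,
                       timeout_s: float = 5.0) -> socket.socket:
    """Open the host-side Unix socket and request a guest vsock port.

    Returning without error only means the request was written; it says
    nothing about whether a guest listener exists yet.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout_s)
    try:
        sock.connect(uds_path)
        # Assumption: the command is one newline-terminated ASCII line.
        sock.sendall(f"CONNECT {port}\n".encode("ascii"))
    except OSError:
        sock.close()
        raise
    return sock


def first_read(sock: socket.socket, nbytes: int = 4096) -> bytes:
    """Return the first bytes received, or raise if the peer closed.

    A 0-length read is the EOF described above: the guest refused the
    connection and the host Unix connection was severed.
    """
    data = sock.recv(nbytes)
    if data == b"":
        raise ConnectionRefusedError("EOF after CONNECT: connection refused")
    return data
```

If nothing arrives before `timeout_s`, `recv` raises a timeout error instead, so a caller can tell "refused" apart from "no data yet".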

I don't fully understand the flow you are describing that leads to EOT. I'm guessing this behavior is implemented by some higher level software (above the socket layer).

If you want to connect to an uninitialized guest (though I would advise against it), my suggestion is to use a nonblocking socket on the host side and have your guest (listening) agent ack client connections via some kind of message. Then, at the host end, immediately after CONNECT <port>, you could poll your socket for read events (EPOLLIN) to check on the success status of your connection (i.e. a 0-length read means your connection was refused, while an ack message from your agent indicates a successful connection).
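The polling approach suggested above might be sketched like this, again in Python. The ack payload `ACK` and the function name `await_agent_ack` are hypothetical: the thread does not define an ack format, so the guest agent would have to be changed to send one. On Linux, `selectors.DefaultSelector` is backed by epoll, so waiting for `EVENT_READ` corresponds to polling for EPOLLIN.

```python
import selectors
import socket
import time

ACK = b"READY\n"  # hypothetical ack message sent by the guest agent


def await_agent_ack(sock: socket.socket, timeout_s: float) -> str:
    """Classify a connection on which CONNECT <port> was already written.

    Returns "ready" if the agent's ack arrived, "refused" on a 0-length
    read, "pending" if nothing conclusive arrived within timeout_s, and
    "unexpected" if other bytes arrived instead of the ack.
    """
    sock.setblocking(False)
    deadline = time.monotonic() + timeout_s
    received = b""
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        while len(received) < len(ACK):
            remaining = deadline - time.monotonic()
            if remaining <= 0 or not sel.select(remaining):
                return "pending"
            try:
                chunk = sock.recv(len(ACK) - len(received))
            except BlockingIOError:
                continue  # spurious wakeup, wait again
            except ConnectionResetError:
                return "refused"
            if chunk == b"":
                return "refused"  # EOF: the connection was severed
            received += chunk
    return "ready" if received == ACK else "unexpected"
```

On "refused" the host would close the socket, reconnect, and send CONNECT <port> again; on "pending" it can simply keep waiting on the same connection.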

@devimc
Author

devimc commented Sep 13, 2019

@dhrgit thanks for explaining.
Closing issue since we found a solution that may work for kata, thanks again. 👍

@devimc devimc closed this as completed Sep 13, 2019
@dhrgit dhrgit mentioned this issue Dec 17, 2019
8 tasks
4 participants