FC: Add new vsock connection handshake #706
Codecov Report
@@ Coverage Diff @@
## master #706 +/- ##
==========================================
- Coverage 60.82% 60.71% -0.12%
==========================================
Files 17 17
Lines 2550 2558 +8
==========================================
+ Hits 1551 1553 +2
- Misses 850 857 +7
+ Partials 149 148 -1
/test
Thanks @Pennyzct.
lgtm
protocols/client/client.go
Outdated
	return nil, err
} else if !strings.Contains(response, "OK") {
	conn.Close()
	return nil, errors.New("malformed response code")
I wonder if it would make sense to log `response` here if debug is enabled?
Hi~ @jodh-intel yep. And since this file has no logging at all, I've added another commit to add it. ;)
806dc6f to b7d6ae5
Hi guys
@Pennyzct thanks, but please add
b7d6ae5 to 29d5a2b
Hi~ @devimc
/test
@Pennyzct thanks, now I can see that FC 0.20 is installed. Unfortunately, FC CI is failing.
Add changes from kata-containers/agent#706. THIS IS ONLY FOR DEBUGGING! Fixes: kata-containers#2378 Signed-off-by: Penny Zheng <penny.zheng@arm.com>
The new release for Firecracker is `v0.20.0`. Fixes: kata-containers#2378 Depends-on: github.com/kata-containers/agent#706 Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Add changes from kata-containers/agent#706. THIS IS ONLY FOR DEBUGGING! Fixes: kata-containers#2378 Depends-on: github.com/kata-containers/agent#706 Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Add changes from kata-containers/agent#706. THIS IS ONLY FOR DEBUGGING! Signed-off-by: Penny Zheng <penny.zheng@arm.com>
adad4ca to 9b10b2e
Updates:
@Pennyzct good catch!
/test
LGTM

@jcvenegas @likebreath I think the right move here is to make sure you update Kata to support the latest CH from master. That way, I think the random failures will go away, as the PR effectively reverts the 3 patches we've been reverting so far (through our testing).

@devimc I think you need to bump FC to 0.20.0 so that it does not break when this PR is merged.
@sboeuf what do you mean? CI installs Firecracker 0.20 and is failing.
@Pennyzct the PR is failing with latest FC 0.20. Are you still trying to make this pass?
protocols/client/client.go
Outdated
response, err := reader.ReadString('\n')
if err != nil {
	conn.Close()
	return nil, err
Try removing this line; that way the function will return `conn, nil` and the gRPC code will handle dialing again.
I'm confused: if the read fails, we should return the error immediately. Then, here in `commonDialer`,

	go func() {
		for {
			select {
			case <-cancel:
				// canceled or channel closed
				return
			default:
			}
			conn, err := dialFunc()
			if err == nil {
				// Send conn back iff timer is not fired
				// Otherwise there might be no one left reading it
				if t.Stop() {
					ch <- conn
				} else {
					conn.Close()
				}
				return
			}
		}
	}()

we can reconnect (`conn, err := dialFunc()`) again.
Yes, I agree, but the quick reconnection performed by `commonDialer()` tends to make things fail. Instead, when relying on the backoff strategy from gRPC (which happens when we return `conn, nil`), things are more stable.
Just to clarify, I'm not saying this is a long-term solution, but while we actively look for the root cause, relying on the backoff strategy from gRPC gives us a more stable CI for both Firecracker and Cloud-Hypervisor.
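To illustrate the difference being discussed, here is a minimal sketch of a retry loop that waits progressively longer between dial attempts instead of redialing immediately. It is not gRPC's backoff implementation and not the code in this PR; `dialWithBackoff`, `demoBackoff`, and all the durations are hypothetical names and values chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// dialWithBackoff calls dial until it succeeds or attempts are exhausted,
// doubling the wait between tries up to max. It returns the number of
// attempts made. Hypothetical helper, for illustration only.
func dialWithBackoff(dial func() error, base, max time.Duration, attempts int) (int, error) {
	delay := base
	var err error
	for i := 1; i <= attempts; i++ {
		if err = dial(); err == nil {
			return i, nil
		}
		if i == attempts {
			break
		}
		// Wait before retrying, unlike a tight loop that redials at once.
		time.Sleep(delay)
		delay *= 2
		if delay > max {
			delay = max
		}
	}
	return attempts, fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// demoBackoff simulates a peer whose handshake fails `failures` times
// before it starts succeeding.
func demoBackoff(failures int) (int, error) {
	calls := 0
	dial := func() error {
		calls++
		if calls <= failures {
			return errors.New("handshake not ready")
		}
		return nil
	}
	return dialWithBackoff(dial, time.Millisecond, 8*time.Millisecond, 5)
}

func main() {
	n, err := demoBackoff(2)
	fmt.Println(n, err)
}
```

The tight loop in `commonDialer` corresponds to the same loop with no sleep between attempts, which is the behavior the comment above reports as less stable in CI.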
protocols/client/client.go
Outdated
	return nil, err
} else if !strings.Contains(response, "OK") {
	conn.Close()
	return nil, errors.New("malformed response code")
Try removing this line; that way the function will return `conn, nil` and the gRPC code will handle dialing again.
@Pennyzct ping, could you rebase this PR?
New Firecracker v0.20.0 has changed the host-initiated vsock connection protocol to include a trivial handshake. The new protocol looks like this:
- [host] CONNECT <port><LF>
- [guest/success] OK <assigned_host_port><LF>
Fixes: kata-containers#705
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
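The two-line exchange described in this commit message can be sketched as follows. This is a minimal illustration, not the code merged in `protocols/client/client.go`: `handshake` and `demoHandshake` are hypothetical names, the port numbers are made up, and an in-memory `net.Pipe` stands in for the real Firecracker socket.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"net"
	"strings"
)

// handshake performs the host side of the exchange on an already-open
// connection: it sends "CONNECT <port>\n" and expects "OK <port>\n" back.
// It returns the trimmed response line. Illustrative helper only.
func handshake(conn net.Conn, port uint32) (string, error) {
	// [host] CONNECT <port><LF>
	if _, err := fmt.Fprintf(conn, "CONNECT %d\n", port); err != nil {
		return "", err
	}
	// [guest/success] OK <assigned_host_port><LF>
	// Note: the bufio.Reader may buffer bytes past the newline; that is
	// acceptable here only because nothing else is sent before we write.
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return "", err
	}
	line = strings.TrimSpace(line)
	if !strings.HasPrefix(line, "OK ") {
		return "", errors.New("unexpected handshake response: " + line)
	}
	return line, nil
}

// demoHandshake runs handshake against a fake peer that answers with reply.
func demoHandshake(reply string) (string, error) {
	host, guest := net.Pipe()
	defer host.Close()
	go func() {
		defer guest.Close()
		// Fake peer: consume the CONNECT line, then send the canned reply.
		if _, err := bufio.NewReader(guest).ReadString('\n'); err != nil {
			return
		}
		fmt.Fprint(guest, reply)
	}()
	return handshake(host, 1024)
}

func main() {
	fmt.Println(demoHandshake("OK 1073741824\n"))
	fmt.Println(demoHandshake("ERR\n"))
}
```

Checking for the `OK ` prefix is slightly stricter than the `strings.Contains(response, "OK")` test quoted in the review above; either would accept the success line shown in the commit message.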
This file is mainly used by kata-runtime and has no logging. Add a new "agent-client" log field here. Note that the `kata-runtime` debug option must be turned on to see the output.
Fixes: kata-containers#705
Depends-on: github.com/kata-containers/runtime#2379
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
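As a rough sketch of the idea in this commit message, the snippet below emits a handshake response only when debugging is enabled and tags every line with an "agent-client" marker. It uses the standard library `log` package purely for illustration; the actual commit may use a different logging library, and `debugLogger`, `demoLog`, and the `source=` prefix format are assumptions made for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
)

// debugLogger writes messages only when enabled is true.
// Hypothetical type, standing in for the runtime's debug-gated logger.
type debugLogger struct {
	enabled bool
	out     *log.Logger
}

// Debugf formats and writes a message if debugging is turned on.
func (d debugLogger) Debugf(format string, args ...interface{}) {
	if !d.enabled {
		return
	}
	d.out.Printf(format, args...)
}

// demoLog logs a sample handshake response and returns what was written.
func demoLog(enabled bool) string {
	var buf bytes.Buffer
	l := debugLogger{
		enabled: enabled,
		// The prefix plays the role of the "agent-client" log field.
		out: log.New(&buf, "source=agent-client ", 0),
	}
	l.Debugf("vsock handshake response: %q", "OK 1073741824\n")
	return buf.String()
}

func main() {
	fmt.Print(demoLog(true))
	fmt.Print(demoLog(false))
}
```

With the flag off nothing is written, which matches the note that the debug option has to be enabled to see the output.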
9b10b2e to accab34
Hi~ guys
/test
Here, we import changes from PR kata-containers/agent#706 to see if it really works. Fixes: kata-containers#2378 Signed-off-by: Penny Zheng <penny.zheng@arm.com>
We need to include changes in PR github.com/kata-containers/agent#706 (kata-containers/agent#706) to use the new vsock-trivial-handshake scheme implemented in FC v0.20.0. Fixes: kata-containers#2378 Signed-off-by: Penny Zheng <penny.zheng@arm.com>
We need to include changes in PR github.com/kata-containers/agent#706 (kata-containers/agent#706) to use the new vsock-trivial-handshake scheme implemented in FC v0.20.0. Fixes: kata-containers#2378 Depends-on: kata-containers#2431 Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Which feature do you think can be improved?
New Firecracker `v0.20.0` has changed the host-initiated vsock connection protocol to include a trivial handshake. The new protocol looks like this:
- [host] CONNECT <port><LF>
- [guest/success] OK <assigned_host_port><LF>
See PR firecracker-microvm/firecracker#1472 for detailed info.