This repository has been archived by the owner on May 12, 2021. It is now read-only.

FC: Update Firecracker to v0.20.0 #2379

Merged: 2 commits merged into kata-containers:master on Feb 7, 2020


@Pennyzct (Contributor) commented Dec 27, 2019

The new release for Firecracker is v0.20.0.
There are a few related updates:

Fixes: #2378

Signed-off-by: Penny Zheng <penny.zheng@arm.com>

@Pennyzct added the do-not-merge label (PR has problems or depends on another) Dec 27, 2019
@Pennyzct (Contributor, Author) commented Dec 27, 2019

It depends on kata-containers/agent#706. Once that lands, we need to update vendor/ to officially include the changes.

@grahamwhaley (Contributor) left a comment

lgtm
So, how do we co-ordinate the update of this, the vendor, and the agent connection check? :-)

@Pennyzct (Contributor, Author) commented Jan 6, 2020

Hi @grahamwhaley,
FWIW, the correct order is: land the agent vsock handshake first, then do the vendor update (I'll add another commit here for it) and the FC version bump. ;)

@devimc commented Jan 6, 2020

@Pennyzct, add `Depends-on: github.com/kata-containers/agent#706` to your commit message if you want to test both changes together.

Pennyzct added a commit to Pennyzct/agent that referenced this pull request Jan 8, 2020
This file is mainly used by kata-runtime and lacks logging.
We add a new "agent-client" log field here. Note that
you need to turn on the `kata-runtime` debug option to see the output.

Fixes: kata-containers#705
Depends-on: github.com/kata-containers/runtime#2379

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
@Pennyzct (Contributor, Author) commented Jan 9, 2020

/test-fc

@Pennyzct (Contributor, Author) commented Jan 9, 2020

Hi @devimc,
only once I included the changes from github.com/kata-containers/agent#706 under vendor/ (for debugging only) could you finally see that PR working.

@Pennyzct (Contributor, Author) commented

/test-fc

Pennyzct added a commit to Pennyzct/agent that referenced this pull request Jan 10, 2020
@Pennyzct (Contributor, Author) commented

/test-fc

Pennyzct added a commit to Pennyzct/agent that referenced this pull request Jan 10, 2020
@Pennyzct (Contributor, Author) commented

/test-fc

@Pennyzct (Contributor, Author) commented Jan 10, 2020

Getting error:

not ok 15 ctr partial line logging
# (in test file ctr.bats, line 540)
#   `[ "$status" -eq 0 ]' failed
# time="2020-01-10T06:10:23Z" level=error msg="File descriptor 3 (pipe:[3154948]) leaked on pvdisplay invocation. Parent PID 4437: /tmp/jenkins/workspace/kata-con\nFile descriptor 6 (/dev/mapper/control) leaked on pvdisplay invocation. Parent PID 4437: /tmp/jenkins/workspace/kata-con\n  Failed to find physical volume \"/dev/sdb\".\n" error="exit status 5"

Restarting to see whether it's reproducible.
Hey guys,
@chavafg @grahamwhaley @jodh-intel @devimc @bergwolf @gnawux @lifupan
I've done some digging, and it looks like this error was reported before by @egernst.
I once hit a similar error on one of my local Arm machines: even though the /dev/sdb entry existed, the physical volume was never actually mounted.
Should we log into this FC CI machine and run lsblk to check whether it really exists? 🤯

@Pennyzct (Contributor, Author) commented

@teawater

@chavafg (Contributor) commented Jan 10, 2020

Hi @Pennyzct,
The machine has an sdb device (otherwise, all other cri-o tests would've failed), but that line always appears when there is an error running a cri-o test. The real issue with the test can be seen below:

# time="2020-01-10 06:11:14.831287468Z" level=debug msg="response error: container create failed: Failed to check if grpc server is working: context deadline exceeded\n" file="v1alpha2/api.pb.go:7618" id=e5dc1494-904c-409b-8b04-52529e76c7c9
# time="2020-01-10T06:11:14Z" level=fatal msg="run pod sandbox failed: rpc error: code = Unknown desc = container create failed: Failed to check if grpc server is working: context deadline exceeded\n"
# time="2020-01-10 06:11:14.844093026Z" level=debug msg="request: &ListContainersRequest{Filter:&ContainerFilter{Id:,State:&ContainerStateValue{State:CONTAINER_RUNNING,},PodSandboxId:,LabelSelector:map[string]string{},},}" file="v1alpha2/api.pb.go:7780" id=a4ebbf78-d5f9-46cb-ace5-c59c4d9dd4a2

taken from http://jenkins.katacontainers.io/job/kata-containers-runtime-ubuntu-1804-PR-firecracker/1434/console

I see that the newest CI execution shows that another test failed:

Checking 12 containers have all relevant components
Run kata-runtime: nginx:1.17.0-alpine: 
fa8a9d3a448d25fa1d54932d255368bb1e9ff60a140aa2aaf820fd1f2f961026
docker: Error response from daemon: OCI runtime create failed: Failed to check if grpc server is working: context deadline exceeded: unknown.
Checking 13 containers have all relevant components
Wrong number of shims running (13 != 12) - stopping
Wrong number of firecracker running (13 != 12) - stopping
Wrong number of 'runtime list' containers running (13 != 12) - stopping
Wrong number of pods in /run/vc/sbs (13 != 12) - stopping)

It looks like the same gRPC issue in both tests, but I am not sure whether it is related to your changes.

@Pennyzct (Contributor, Author) commented

Hi @chavafg,
Thanks for checking and explaining!
I've hit this gRPC connection error several times, mostly in this parallel-container test.
See one instance in another, FC-unrelated PR of mine, #2376:
http://jenkins.katacontainers.io/job/kata-containers-runtime-ubuntu-1804-PR-firecracker/1427/

docker: Error response from daemon: OCI runtime create failed: Failed to check if grpc server is working: context deadline exceeded: unknown.
Checking 4 containers have all relevant components
Wrong number of shims running (4 != 3) - stopping
Wrong number of firecracker running (4 != 3) - stopping
Wrong number of 'runtime list' containers running (4 != 3) - stopping
Wrong number of pods in /run/vc/sbs (4 != 3) - stopping)

@chavafg (Contributor) commented Jan 13, 2020

It seems we are hitting this failure randomly, and most Firecracker CI jobs keep failing. Any idea, @devimc?

@devimc commented Jan 13, 2020

@chavafg no idea; I think @teawater has been working with the vsock implementation recently.

@WeiZhang555 (Member) commented

@chavafg @devimc The FC job keeps failing intermittently; I've encountered the same problem: #2239

@jcvenegas (Member) commented

@Pennyzct, could you update your client code to use the latest changes? We have reverted some changes, and the CI now looks more stable.

@Pennyzct (Contributor, Author) commented

Hi @jcvenegas,
I've dropped the DEBUG commit and rebased onto the latest code. ;)

@Pennyzct (Contributor, Author) commented

/test

@codecov bot commented Jan 28, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@bd7d310).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master    #2379   +/-   ##
=========================================
  Coverage          ?   50.62%           
=========================================
  Files             ?      112           
  Lines             ?    16274           
  Branches          ?        0           
=========================================
  Hits              ?     8239           
  Misses            ?     7020           
  Partials          ?     1015

Pennyzct added a commit to Pennyzct/agent that referenced this pull request Jan 30, 2020
@Pennyzct (Contributor, Author) commented

/test-fc

@Pennyzct removed the do-not-merge label (PR has problems or depends on another) Jan 31, 2020
@Pennyzct (Contributor, Author) commented

/test

@Pennyzct (Contributor, Author) commented Feb 5, 2020

/test-fc

@Pennyzct (Contributor, Author) commented Feb 5, 2020

/test

@jcvenegas (Member) commented

@Pennyzct, I see FC passing now; is it ready to merge? The Cloud Hypervisor job will keep failing until we bump it as well, and that bump depends on this PR. We can bump it after merging this PR or include the change in this PR (c8741b0).

@jcvenegas (Member) commented Feb 6, 2020

#2440 includes the bump from here (plus the clh bump) and points to the latest agent client code. Both the ARM CI and the Cloud Hypervisor CI run the initrd job, which seems to be broken.

@Pennyzct (Contributor, Author) commented Feb 6, 2020

Hi @jcvenegas,
It's ready to merge ;). The ARM CI failure is unrelated; I've already raised an issue explaining it, see kata-containers/ci#245.

@jcvenegas (Member) commented

@Pennyzct whoops, we need a rebase

@jcvenegas (Member) commented

Or merge #2440

@Pennyzct (Contributor, Author) commented Feb 7, 2020

Hi @jcvenegas, no problem, I'll do the rebase ASAP ;)

The new release for Firecracker is `v0.20.0`.

Fixes: kata-containers#2378

Signed-off-by: Penny Zheng <penny.zheng@arm.com>

We need to include the changes from PR github.com/kata-containers/agent#706
to use the new vsock-trivial-handshake scheme implemented in FC v0.20.0.

Fixes: kata-containers#2378

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
@Pennyzct (Contributor, Author) commented Feb 7, 2020

/test-fc

@jcvenegas (Member) commented

/test-ubuntu

@jcvenegas jcvenegas merged commit b444393 into kata-containers:master Feb 7, 2020