Clean up hack for Crio-1.17.3+ bug #8861

Closed
medyagh opened this issue Jul 28, 2020 · 6 comments · Fixed by #9768
Labels: co/docker-driver, co/kic-base, co/runtime/crio, kind/bug, priority/important-soon

Comments

@medyagh
Member

medyagh commented Jul 28, 2020

While rebuilding our current kicbase image without any changes, I noticed that all the crio tests fail (#8854).

| Failed Test | Duration |
| --- | --- |
| TestStartStop/group/crio/serial/SecondStart | 762.94 |
| TestStartStop/group/crio/serial/UserAppExistsAfterStop | 541.09 |
| TestStartStop/group/crio/serial/AddonExistsAfterStop | 541.04 |
| TestStartStop/group/crio/serial/Pause | |

attached is the full test log in text
testout.txt

I created an issue in the cri-o repo (cri-o/cri-o#4027). Unfortunately, the 1.17.3 package has vanished from the repo without any notice!

CC: @afbjorklund

Example error from TestStartStop/group/crio/serial/SecondStart:

Unfortunately, an error has occurred:
		timed out waiting for the condition
	
	This error is likely caused by:
		- The kubelet is not running
		- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
	
	If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
		- 'systemctl status kubelet'
		- 'journalctl -xeu kubelet'
	
	Additionally, a control plane component may have crashed or exited when started by the container runtime.
	To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
	Here is one example how you may list all Kubernetes containers running in docker:
		- 'docker ps -a | grep kube | grep -v pause'
		Once you have found the failing container, you can inspect its logs with:
		- 'docker logs CONTAINERID'
	
	stderr:
		[WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
		[WARNING SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs", output: "modprobe: FATAL: Module configs not found in directory /lib/modules/4.9.0-13-amd64\n", err: exit status 1
		[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
	error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
	
	run
	k8s.io/minikube/pkg/minikube/bootstrapper/kubeadm.(*Bootstrapper).init
		/home/jenkins/actions-runner/_work/minikube/minikube/pkg/minikube/bootstrapper/kubeadm/kubeadm.go:240
@medyagh medyagh added co/kic-base co/docker-driver Issues related to kubernetes in container co/runtime/crio CRIO related issues kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Jul 28, 2020
@tstromberg
Contributor

Related: #8890

@sharifelgamal
Collaborator

Upstream 1.17.4 behavior issue: cri-o/cri-o#4035

@medyagh changed the title from "CRIO package 1.17.3 is not longer available and 1.17.4 breaks minikube" to "Clean up hack for Crio-1.17.3+ hack" on Jul 31, 2020
@medyagh changed the title from "Clean up hack for Crio-1.17.3+ hack" to "Clean up hack for Crio-1.17.3+ bug" on Jul 31, 2020
@medyagh
Member Author

medyagh commented Jul 31, 2020

We ended up adding a hack to restart CRI-O each time. This is a temporary hack until CRI-O upstream is fixed (cri-o/cri-o#4027).
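
For readers unfamiliar with this kind of workaround, here is a minimal sketch of what "restart CRI-O each time" could look like. It is not the code minikube uses: the function names, the injected `run` callable, and the default unit name `crio` are assumptions made for illustration.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes a command (list of arguments) and returns its exit code.
# Injecting it keeps the sketch testable without touching a real system.
Runner = Callable[[Sequence[str]], int]


def restart_crio_command(unit: str = "crio") -> List[str]:
    """Build the systemctl command that restarts the runtime.

    The unit name "crio" is an assumption; adjust it if the service
    is registered under a different name on your image.
    """
    return ["sudo", "systemctl", "restart", unit]


def run_local(cmd: Sequence[str]) -> int:
    """Run the command on the local machine and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def restart_crio(run: Runner = run_local, unit: str = "crio") -> bool:
    """Restart the runtime and report whether the command succeeded."""
    return run(restart_crio_command(unit)) == 0
```

Passing a fake runner (for example `restart_crio(lambda cmd: 0)`) exercises the logic without restarting anything, which is handy while the workaround is still in place and needs to be easy to remove later.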

@afbjorklund
Collaborator

> We ended up adding a hack to restart CRI-O each time. This is a temporary hack until CRI-O upstream is fixed (cri-o/cri-o#4027).

The actual cri-o bug is cri-o/cri-o#4035; the other issue is only about the deletion of the 1.17.3 package.

It can still be rebuilt from the OBS sources, but by the time it was made available you had already upgraded to 1.18.3 instead

https://build.opensuse.org/package/show/devel:kubic:libcontainers:stable:cri-o:1.17:1.17.3/cri-o

It seems the CRI-O project has changed from one repository to one subproject per release, so you can pin versions now.

https://build.opensuse.org/project/subprojects/devel:kubic:libcontainers:stable:cri-o
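
To illustrate the pinning idea, here is a small sketch that derives the OBS subproject name and package URL for a given CRI-O version. The naming pattern (`<prefix>:<major.minor>:<full version>`) is inferred only from the 1.17.3 link above, so treat it as an assumption for other releases; the function names are hypothetical.

```python
OBS_BASE = "https://build.opensuse.org/package/show"
OBS_PREFIX = "devel:kubic:libcontainers:stable:cri-o"


def obs_project(version: str) -> str:
    """Return the per-release OBS subproject for a version like "1.17.3".

    Assumes the layout seen for 1.17.3: prefix, then major.minor,
    then the full version, separated by colons.
    """
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a version like 1.17.3, got {version!r}")
    minor = ".".join(parts[:2])
    return f"{OBS_PREFIX}:{minor}:{version}"


def obs_package_url(version: str, package: str = "cri-o") -> str:
    """Return the package page URL inside the pinned subproject."""
    return f"{OBS_BASE}/{obs_project(version)}/{package}"
```

For example, `obs_package_url("1.17.3")` reproduces the 1.17.3 link quoted above; whether a given subproject actually exists still has to be checked on the OBS site.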

@sharifelgamal sharifelgamal added this to the v1.14.0 milestone Sep 16, 2020
@medyagh medyagh modified the milestones: v1.14.0, v1.15.0-candidate Oct 12, 2020
@sharifelgamal sharifelgamal self-assigned this Oct 27, 2020
@sharifelgamal
Collaborator

I have a PR open (#9501) that reverts this change, but the crio tests are still failing in a way that concerns me, so I need to do some more investigation. Do we need to update the version of crio we're using?

@afbjorklund
Collaborator

I think the bug is fixed by PR #9629 (which upgrades the runc version). If that is correct, we can do the suggested revert.
