Description
Which jobs are failing:
https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/builds/2593776759
https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/builds/2593387999
https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/builds/2596644984
Which test(s) are failing:
packet_debian10-calico-upgrade
Since when has it been failing:
Probably since #8935
Testgrid link:
Reason for failure:
The netchecker agent pods go into the Error state and the netchecker server pod goes into the Completed state.
cri-dockerd logs CNI errors until the pods are deleted. Once the errored pods are replaced, the replacement pods start correctly and cri-dockerd stops logging errors.
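To spot the affected pods quickly, the pod states described above can be checked from the JSON that `kubectl get pods -o json` prints. The following is only a minimal sketch: the function name, the sample pod names and the sample data are hypothetical, and it assumes the state shown by kubectl corresponds to the `reason` of a terminated container.

```python
def pods_by_terminated_reason(pod_list, reason):
    """Return names of pods with at least one container terminated for `reason`.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    """
    names = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for container_status in statuses:
            terminated = container_status.get("state", {}).get("terminated")
            if terminated and terminated.get("reason") == reason:
                names.append(pod["metadata"]["name"])
                break  # one matching container is enough for this pod
    return names


# Hypothetical sample data, shaped like a kubectl pod list (not taken from the failing jobs).
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "netchecker-agent-example"},
            "status": {"containerStatuses": [
                {"state": {"terminated": {"reason": "Error", "exitCode": 1}}}
            ]},
        },
        {
            "metadata": {"name": "netchecker-server-example"},
            "status": {"containerStatuses": [
                {"state": {"terminated": {"reason": "Completed", "exitCode": 0}}}
            ]},
        },
        {
            "metadata": {"name": "healthy-example"},
            "status": {"containerStatuses": [
                {"state": {"running": {"startedAt": "2022-06-15T00:00:00Z"}}}
            ]},
        },
    ]
}

if __name__ == "__main__":
    print("Error:", pods_by_terminated_reason(SAMPLE, "Error"))
    print("Completed:", pods_by_terminated_reason(SAMPLE, "Completed"))
```

In practice the input would be loaded with `json.load` from the saved kubectl output instead of `SAMPLE`.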
Anything else we need to know:
The intent of the upgrade job was to test upgrades with default settings from one major kubespray version to another. That is no longer the case, because in #8175 I hard-coded docker as the tested container engine. The transition from dockershim to cri-dockerd may have introduced an upgrade problem that leads to this test failure.
I see the following solutions:
- implement per-engine upgrade tests (containerd, docker, cri-o), move them to the nightly job, and allow them to fail while we work on proper solutions for all three
- move the PR job to use the defaults (remove the hard-coding and default to containerd, which I tested in my local lab) to unblock PRs
List of currently affected PRs (that I'm aware of): #8980 #8979 #8978