
Version upgrade test is failing (packet_debian10-calico-upgrade) #8984

@cristicalin

Description

Which jobs are failing:
https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/builds/2593776759
https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/builds/2593387999
https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/builds/2596644984

Which test(s) are failing:
packet_debian10-calico-upgrade

Since when has it been failing:
Probably since #8935

Testgrid link:

Reason for failure:
The netchecker agent pods go into the Error state and the netchecker server pod goes into the Completed state.
cri-dockerd logs CNI errors until the pods are deleted. Once the errored pods are replaced, the replacement pods start correctly and cri-dockerd stops logging errors.
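
For anyone reproducing this locally, the manual workaround described above (deleting the errored pods so that replacements are created) could be scripted roughly as follows. This is only a sketch: the helper names `errored_pods` and `delete_errored_pods` are made up for illustration, and the namespace and label selector are left as parameters because the ones used by the netchecker pods in the CI job are not stated in this issue.

```python
import json
import subprocess
from typing import List


def errored_pods(pod_list: dict) -> List[str]:
    """Return names of pods that have a container terminated with reason 'Error'.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    """
    names = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for container_status in statuses:
            terminated = container_status.get("state", {}).get("terminated")
            if terminated and terminated.get("reason") == "Error":
                names.append(pod["metadata"]["name"])
                break  # one errored container is enough to flag the pod
    return names


def delete_errored_pods(namespace: str, selector: str) -> List[str]:
    """Delete errored pods matching `selector` so their controller recreates them.

    Hypothetical helper: namespace and selector must be supplied by the caller.
    """
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-l", selector, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    names = errored_pods(json.loads(result.stdout))
    for name in names:
        subprocess.run(["kubectl", "delete", "pod", "-n", namespace, name], check=True)
    return names
```

This only mirrors the manual observation that replacement pods start cleanly; it does not address whatever leaves the original pods with broken CNI state after the upgrade.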

Anything else we need to know:
The upgrade job was intended to test upgrades with default settings from one major kubespray version to the next. This is no longer the case because in #8175 I hard-coded docker as the tested container engine. With the move from dockershim to cri-dockerd, it looks like the transition itself may have a problem, and that is what leads to this test failure.

I propose the following:

  • implement per-engine upgrade tests (containerd, docker, cri-o), move them to the nightly job, and allow them to fail while we work on proper solutions for all three
  • switch the PR job back to the defaults (remove the hard-coding so it defaults to containerd, which I have tested in my local lab) to unblock PRs
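
To make the proposed split concrete, here is a small sketch of the resulting job matrix. It is illustrative only: the per-engine job names, the `pipeline` and `allow_failure` keys, and the dictionary layout are assumptions rather than the actual CI configuration, and `crio` is assumed to be the `container_manager` value for cri-o.

```python
# Engines to cover in the nightly upgrade tests (assumed container_manager values).
ENGINES = ("containerd", "docker", "crio")


def upgrade_jobs(base: str = "packet_debian10-calico-upgrade") -> dict:
    """Build a hypothetical job matrix for the proposal above.

    The PR job runs with default settings (no engine override) and is blocking;
    each per-engine job runs nightly and is allowed to fail.
    """
    jobs = {base: {"pipeline": "pr", "allow_failure": False, "vars": {}}}
    for engine in ENGINES:
        jobs[f"{base}-{engine}"] = {
            "pipeline": "nightly",
            "allow_failure": True,
            "vars": {"container_manager": engine},
        }
    return jobs
```

With this layout, PRs are gated only on the default-settings upgrade, while the docker/cri-dockerd failure stays visible in the nightly run without blocking merges.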

List of currently affected PRs (that I'm aware of): #8980 #8979 #8978

    Labels

    kind/failing-test: Categorizes issue or PR as related to a consistently or frequently failing test.