ccruntime e2e test nightly - unstable #339

Closed
wainersm opened this issue Jan 24, 2024 · 1 comment
@wainersm (Member)

The ccruntime e2e test nightly jobs are pretty unstable: 5 of the latest 9 runs failed.

They aren't failing for the same reason. For example:

DEBUG: Pod: cc-operator-controller-manager-ccbbcfdf7-vtk82, Container: kube-rbac-proxy, Restart count: 0
DEBUG: Pod: manager, Container: 3, Restart count: 
DEBUG: Pod: cc-operator-daemon-install-2v5xd, Container: cc-runtime-install-pod, Restart count: 0
DEBUG: Pod: cc-operator-pre-install-daemon-hpgqq, Container: cc-runtime-pre-install-pod, Restart count: 0
INFO: No new restarts in 3x21s, proceeding...
INFO: Run tests
INFO: Running operator tests for kata-clh
1..2
Error: The action has timed out.

In another job:

~/actions-runner/_work/operator/operator/install/pre-install-payload
ccruntime.confidentialcontainers.org/ccruntime-sample created
No resources found
No resources found
No resources found
No resources found
No resources found
No resources found
No resources found
No resources found
No resources found
No resources found
ERROR: runtimeclass kata-qemu is not up
INFO: Uninstall the operator
ccruntime.confidentialcontainers.org "ccruntime-sample" deleted
ERROR: there are labels left behind
{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","cc-preinstall/done":"true","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"garm-rolg2ncd0m","kubernetes.io/os":"linux","node-role.kubernetes.io/control-plane":"","node.kubernetes.io/exclude-from-external-load-balancers":"","node.kubernetes.io/worker":""}INFO: Shutdown the cluster
@ldoktor (Contributor) commented Jan 25, 2024

I managed to reproduce the "ERROR: there are labels left behind" failure while running:

kcli create vm -i ubuntu2204 -P memory=8G -P numcpus=4 -P disks=[50] e2e
kcli ssh e2e
git clone --depth=1 https://github.com/confidential-containers/operator
cd operator/tests/e2e
export PATH="$PATH:/usr/local/bin"
ansible-playbook -i localhost, -c local --tags untagged ansible/main.yml
sudo -E PATH="$PATH" bash -c './cluster/up.sh'
export KUBECONFIG=/etc/kubernetes/admin.conf

followed by a loop:

export "PATH=$PATH:/usr/local/bin"
export KUBECONFIG=/etc/kubernetes/admin.conf

UP=0
TEST=0
DOWN=0

I=0
while :; do
    echo "---< START ITERATION $I: $(date) >--" | tee -a job.log; SECONDS=0
    sudo -E PATH="$PATH" timeout 25m bash -c './operator.sh' || { date; exit -1; }
    UP="$SECONDS"; SECONDS=0; echo "UP    $(date) ($UP)" | tee -a job.log
    sudo -E PATH="$PATH" timeout 25m bash -c ./tests_runner.sh -r kata-qemu || { date; exit -2; }
    TEST="$SECONDS"; SECONDS=0; echo "TESTS $(date) ($TEST)" | tee -a job.log
    sudo -E PATH="$PATH" timeout 25m bash -c './operator.sh uninstall' || { date; exit -3; }
    DOWN="$SECONDS"; SECONDS=0; echo "DOWN  $(date) ($TEST)" | tee -a job.log
    echo -e "---< END ITERATION $I: $(date) ($UP\t$TEST\t$DOWN)\t[$((UP+TEST+DOWN))] >---" | tee -a job.log
    ((I+=1))
done

This loop eventually reproduced the left-behind labels. Interestingly, the operator stayed installed along with the cc-operator-pre-install-daemon container, which might indicate that the first grep (around line 171) finished before the daemon started. That is just an assumption at this point; I'm adding some debug output and will re-try.
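
If that assumption holds, the failure mode would look roughly like the sketch below (a hypothetical helper, not the actual operator.sh code): a negative grep used as a "wait until the pod is gone" check passes trivially when it runs before the pod has even been scheduled.

# Hypothetical illustration of the suspected race, not the real operator.sh code:
# a "wait until gone" check based on grep finding nothing passes immediately
# if it runs before the pre-install pod has been created at all.
wait_for_pod_gone() {
    local pattern="$1" retries="${2:-20}" i
    for ((i = 0; i < retries; i++)); do
        # If the pod has not been scheduled yet, grep already finds nothing
        # and we report "gone" even though the pod is about to start.
        if ! kubectl get pods -n confidential-containers-system --no-headers 2>/dev/null \
                | grep -q "$pattern"; then
            return 0
        fi
        sleep 5
    done
    return 1
}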

ldoktor added a commit to ldoktor/coco-operator that referenced this issue Jan 30, 2024
Recent issues in CI indicate that kubectl might sometimes fail, which
results in wait_for_process interrupting the loop. Let's improve the
command to ensure the kubectl command passed and only then grep for the
(un)expected output.

Note that the positive checks do not need this treatment, as on failure
the output should not contain the pod names.

Fixes: confidential-containers#339

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
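
A minimal sketch of the pattern the commit describes, using hypothetical helper names rather than the actual code: run kubectl on its own, propagate its failure, and only then grep the captured output, so a transient kubectl error is not mistaken for "no matching pods".

# Sketch of the "ensure kubectl passed, then grep" pattern (hypothetical names):
pods_gone() {
    local pattern="$1" out
    # Capture the output first; if kubectl itself fails, return failure instead
    # of letting the empty output look like "the pods are gone".
    out=$(kubectl get pods -n confidential-containers-system --no-headers) || return 1
    ! grep -q "$pattern" <<< "$out"
}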
ldoktor added a commit to ldoktor/coco-operator that referenced this issue Jan 31, 2024
The network in the CI environment tends to break from time to time; let's
allow up to 3 retries for tasks that support it and that use external
sources.

Fixes: confidential-containers#339

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
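
The commit applies the retries at the Ansible task level (presumably via Ansible's retries/until on tasks that pull from external sources); purely as an illustration of the same idea, a shell-level retry wrapper would look like this:

# Illustrative retry wrapper for flaky, network-dependent commands; the actual
# change uses Ansible task retries, this is only a shell-level equivalent.
retry() {
    local attempts="$1"; shift
    local i
    for ((i = 1; i <= attempts; i++)); do
        "$@" && return 0
        echo "Attempt $i/$attempts failed: $*" >&2
        sleep 5
    done
    return 1
}

# e.g.: retry 3 git clone --depth=1 https://github.com/confidential-containers/operator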