This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

flaky volume e2e test when using docker + CNI #679

Closed
darkowlzz opened this issue Aug 31, 2020 · 2 comments · Fixed by #773
Labels
area/testing Issues related to improving testing

Comments

@darkowlzz
Contributor

The volume e2e test frequently fails with Docker + CNI:

=== RUN   TestVolumeWithDockerAndCNI
    TestVolumeWithDockerAndCNI: run_volume_test.go:120: assertion failed: error is not nil: exit status 1: vm stop:
        ["/home/travis/build/weaveworks/ignite/bin/ignite" "--runtime=docker" "--network-plugin=cni" "stop" "e2e_test_volume_docker_and_cni"]
        time="2020-08-31T04:45:29Z" level=info msg="Removing the container with ID \"ignite-d54e77eeaa408deb\" from the \"cni\" network"
        time="2020-08-31T04:45:31Z" level=fatal msg="failed to stop container for VM \"d54e77eeaa408deb\": Error response from daemon: No such container: ignite-d54e77eeaa408deb"

It usually passes when the test is restarted.

Docker + CNI has never worked on my development machine (Ubuntu 18.04), but it works on TravisCI for all of the other e2e tests.

darkowlzz added the area/testing label on Aug 31, 2020
@bboreham
Contributor

There is a similar failure in TestVMLifecycleWithDockerAndDockerBridge: https://travis-ci.com/github/weaveworks/ignite/builds/210384312

@bboreham
Contributor

bboreham commented Jan 14, 2021

I dug into this a bit on SemaphoreCI; I can't get the test to fail reliably on my own machine.
It appears to be a race inside StopContainer(), whereby the container is removed before the stop operation completes.
Container removal is expected, since Ignite sets AutoRemove (unless debug is on).

Evidence: I ran docker events in the background and manually executed the ignite commands from the test:

semaphore@semaphore-vm:~/ignite$ sudo bin/ignite stop e2e-test-vm-lifecycle-docker-and-cni
INFO[0000] Removing the container with ID "ignite-fb4ced5044c57fb1" from the "cni" network 
2021-01-14T18:41:42.873403421Z container kill 691a608a71393439c6c38663730f5d86e9d1c8a538ceaff80a8b898319f9ee21 (ignite.name=e2e-test-vm-lifecycle-docker-and-cni, image=docker.io/weaveworks/ignite:dev, name=ignite-fb4ced5044c57fb1, signal=15)
2021-01-14T18:41:43.848607968Z container die 691a608a71393439c6c38663730f5d86e9d1c8a538ceaff80a8b898319f9ee21 (exitCode=0, ignite.name=e2e-test-vm-lifecycle-docker-and-cni, image=docker.io/weaveworks/ignite:dev, name=ignite-fb4ced5044c57fb1)
2021-01-14T18:41:43.872832782Z network disconnect 198f2c7dac9ba506d6c4360e9b229d0d4b0e5c726554d8ac29fba99c5e30fe88 (container=691a608a71393439c6c38663730f5d86e9d1c8a538ceaff80a8b898319f9ee21, name=none, type=null)
2021-01-14T18:41:43.905673616Z container stop 691a608a71393439c6c38663730f5d86e9d1c8a538ceaff80a8b898319f9ee21 (ignite.name=e2e-test-vm-lifecycle-docker-and-cni, image=docker.io/weaveworks/ignite:dev, name=ignite-fb4ced5044c57fb1)
2021-01-14T18:41:43.905722684Z container destroy 691a608a71393439c6c38663730f5d86e9d1c8a538ceaff80a8b898319f9ee21 (ignite.name=e2e-test-vm-lifecycle-docker-and-cni, image=docker.io/weaveworks/ignite:dev, name=ignite-fb4ced5044c57fb1)
FATA[0001] failed to stop container for VM "fb4ced5044c57fb1": Error response from daemon: No such container: ignite-fb4ced5044c57fb1 

Here's a repeat that didn't fail; this time the destroy event comes after the return from Docker:

semaphore@semaphore-vm:~/ignite$ sudo /home/semaphore/ignite/bin/ignite stop e2e-test-vm-lifecycle-docker-and-cni
INFO[0000] Removing the container with ID "ignite-63e1b2fa6d955e84" from the "cni" network 
2021-01-14T18:44:08.914491738Z container kill 868705b66959b68ca673f0ad5a00a2870092c68ef83af856340526d1fade043d (ignite.name=e2e-test-vm-lifecycle-docker-and-cni, image=docker.io/weaveworks/ignite:dev, name=ignite-63e1b2fa6d955e84, signal=15)
2021-01-14T18:44:09.791012870Z container die 868705b66959b68ca673f0ad5a00a2870092c68ef83af856340526d1fade043d (exitCode=0, ignite.name=e2e-test-vm-lifecycle-docker-and-cni, image=docker.io/weaveworks/ignite:dev, name=ignite-63e1b2fa6d955e84)
2021-01-14T18:44:09.812864133Z network disconnect 198f2c7dac9ba506d6c4360e9b229d0d4b0e5c726554d8ac29fba99c5e30fe88 (container=868705b66959b68ca673f0ad5a00a2870092c68ef83af856340526d1fade043d, name=none, type=null)
2021-01-14T18:44:09.851682217Z container stop 868705b66959b68ca673f0ad5a00a2870092c68ef83af856340526d1fade043d (ignite.name=e2e-test-vm-lifecycle-docker-and-cni, image=docker.io/weaveworks/ignite:dev, name=ignite-63e1b2fa6d955e84)
INFO[0001] Stopped VM with name "e2e-test-vm-lifecycle-docker-and-cni" and ID "63e1b2fa6d955e84" 
2021-01-14T18:44:09.853365276Z container destroy 868705b66959b68ca673f0ad5a00a2870092c68ef83af856340526d1fade043d (ignite.name=e2e-test-vm-lifecycle-docker-and-cni, image=docker.io/weaveworks/ignite:dev, name=ignite-63e1b2fa6d955e84)

So we could perhaps work around this by no longer using AutoRemove and implementing the removal inside Ignite instead.

bboreham added a commit that referenced this issue Jan 15, 2021
Work around #679, but leaves
dead containers lying around after test.
bboreham added a commit that referenced this issue Jan 15, 2021
Work around #679,
which appears to be caused by a race condition inside Docker.

(This change leaves a few dead containers lying around after the test).