This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Unit and end-to-end tests are flakey #2672

Closed
2opremio opened this issue Dec 10, 2019 · 34 comments

Labels: build (About the build or test scaffolding)

@2opremio
Contributor

They fail from time to time. e.g. https://circleci.com/gh/fluxcd/flux/8957

@hiddeco @squaremo @stefanprodan please add more cases here (if you see them) so that I can fix them.

2opremio self-assigned this Dec 10, 2019
@sa-spag
Contributor

sa-spag commented Dec 10, 2019

I had a case on #2668: https://circleci.com/gh/fluxcd/flux/8904.

@2opremio
Contributor Author

Thanks

2opremio changed the title from "End-to-end tests are flakey" to "Unit and end-to-end tests are flakey" Dec 10, 2019
@2opremio
Contributor Author

2opremio commented Dec 10, 2019

It turns out unit tests are also flakey. Three instances: https://circleci.com/gh/fluxcd/flux/8957 , https://circleci.com/gh/fluxcd/flux/8964 and https://circleci.com/gh/fluxcd/flux/8966

I believe this started happening as a result of adding two more cores to CircleCI (see #2647 )

@squaremo
Member

https://circleci.com/gh/fluxcd/flux/8984 is a PR rebased on #2674 and still fails to pass unit tests.

@2opremio
Contributor Author

> https://circleci.com/gh/fluxcd/flux/8984 is a PR rebased on #2674 and still fails to pass unit tests.

Without running them in parallel at least we get a simpler backtrace ...

backtrace:

```
goroutine 274 [running]:
testing.(*M).startAlarm.func1()
/usr/local/go/src/testing/testing.go:1377 +0x11c
created by time.goFunc
/usr/local/go/src/time/sleep.go:168 +0x52

goroutine 1 [chan receive]:
testing.(*T).Run(0xc000167000, 0x1cc8ae5, 0x1c, 0x1d507f0, 0x1)
/usr/local/go/src/testing/testing.go:961 +0x68a
testing.runTests.func1(0xc000167000)
/usr/local/go/src/testing/testing.go:1202 +0xa7
testing.tRunner(0xc000167000, 0xc0004d9d40)
/usr/local/go/src/testing/testing.go:909 +0x19a
testing.runTests(0xc0003cb480, 0x2aad4e0, 0x14, 0x14, 0x0)
/usr/local/go/src/testing/testing.go:1200 +0x522
testing.(*M).Run(0xc0003c9e80, 0x0)
/usr/local/go/src/testing/testing.go:1117 +0x300
main.main()
_testmain.go:82 +0x224

goroutine 19 [chan receive]:
k8s.io/klog.(*loggingT).flushDaemon(0x2abc300)
/home/circleci/go/pkg/mod/k8s.io/klog@v0.3.3/klog.go:990 +0xae
created by k8s.io/klog.init.0
/home/circleci/go/pkg/mod/k8s.io/klog@v0.3.3/klog.go:404 +0x9b

goroutine 193 [chan receive]:
testing.(*T).Run(0xc0004f3600, 0x1cb70df, 0xb, 0xc000242ea0, 0xc00012d001)
/usr/local/go/src/testing/testing.go:961 +0x68a
github.com/fluxcd/flux/pkg/cluster/kubernetes.TestWorkloadContainerUpdates(0xc0004f3600)
/home/circleci/go/src/github.com/fluxcd/flux/pkg/cluster/kubernetes/update_test.go:78 +0xad6
testing.tRunner(0xc0004f3600, 0x1d507f0)
/usr/local/go/src/testing/testing.go:909 +0x19a
created by testing.(*T).Run
/usr/local/go/src/testing/testing.go:960 +0x652

goroutine 220 [syscall]:
syscall.Syscall6(0xf7, 0x1, 0x66b1, 0xc0006e55f0, 0x1000004, 0x0, 0x0, 0xc0006e5618, 0x10, 0xc00008ea80)
/usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5
os.(*Process).blockUntilWaitable(0xc000363680, 0x3, 0xc0001f9b80, 0xc0000be398)
/usr/local/go/src/os/wait_waitid.go:31 +0xbf
os.(*Process).wait(0xc000363680, 0xc0000eca50, 0x4274a0, 0xc0000eca40)
/usr/local/go/src/os/exec_unix.go:22 +0x6e
os.(*Process).Wait(...)
/usr/local/go/src/os/exec.go:125
os/exec.(*Cmd).Wait(0xc0000ec9a0, 0x0, 0x0)
/usr/local/go/src/os/exec/exec.go:501 +0xf3
os/exec.(*Cmd).Run(0xc0000ec9a0, 0x28, 0xc0006381c0)
/usr/local/go/src/os/exec/exec.go:341 +0x87
github.com/fluxcd/flux/pkg/cluster/kubernetes.execKubeyaml(0xc0006d4000, 0x345, 0x380, 0xc0006381c0, 0xb, 0xe, 0x7, 0xe, 0xc000072970, 0x54cdc7, ...)
/home/circleci/go/src/github.com/fluxcd/flux/pkg/cluster/kubernetes/kubeyaml.go:44 +0x26a
github.com/fluxcd/flux/pkg/cluster/kubernetes.KubeYAML.Image(0xc0006d4000, 0x345, 0x380, 0x1cc7d2f, 0x5, 0x1cc7d35, 0xa, 0x1cc7d40, 0xa, 0x1cb624d, ...)
/home/circleci/go/src/github.com/fluxcd/flux/pkg/cluster/kubernetes/kubeyaml.go:19 +0x2ca
github.com/fluxcd/flux/pkg/cluster/kubernetes.updateWorkloadContainer(0xc0006d4000, 0x345, 0x380, 0x1f9c8c0, 0xc0006b8b40, 0x1cb624d, 0xa, 0x1cdf8ea, 0x7, 0xc000363620, ...)
/home/circleci/go/src/github.com/fluxcd/flux/pkg/cluster/kubernetes/update.go:22 +0x452
github.com/fluxcd/flux/pkg/cluster/kubernetes.testUpdateWorkloadContainer(0xc000167100, 0x1cb70df, 0xb, 0x1cc7d2f, 0x1b, 0x2a9b840, 0x1, 0x1, 0x1cdf8ea, 0x2c, ...)
/home/circleci/go/src/github.com/fluxcd/flux/pkg/cluster/kubernetes/update_test.go:30 +0x3dc
github.com/fluxcd/flux/pkg/cluster/kubernetes.TestWorkloadContainerUpdates.func1(0xc000167100)
/home/circleci/go/src/github.com/fluxcd/flux/pkg/cluster/kubernetes/update_test.go:81 +0x1b1
testing.tRunner(0xc000167100, 0xc000242ea0)
/usr/local/go/src/testing/testing.go:909 +0x19a
created by testing.(*T).Run
/usr/local/go/src/testing/testing.go:960 +0x652

goroutine 223 [IO wait]:
internal/poll.runtime_pollWait(0x7f71b45fcbc8, 0x72, 0x1f9d1e0)
/usr/local/go/src/runtime/netpoll.go:184 +0x55
internal/poll.(*pollDesc).wait(0xc00033f518, 0x72, 0x201, 0x200, 0xffffffffffffffff)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0xe4
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc00033f500, 0xc00055f200, 0x200, 0x200, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:169 +0x253
os.(*File).read(...)
/usr/local/go/src/os/file_unix.go:259
os.(*File).Read(0xc0000be388, 0xc00055f200, 0x200, 0x200, 0x7f71b45fd920, 0xc00033f168, 0x0)
/usr/local/go/src/os/file.go:116 +0xa7
bytes.(*Buffer).ReadFrom(0xc0006b8ba0, 0x1f9c100, 0xc0000be388, 0x7f71b45fd920, 0xc0006b8ba0, 0xc0003ecf01)
/usr/local/go/src/bytes/buffer.go:204 +0x159
io.copyBuffer(0x1f9a940, 0xc0006b8ba0, 0x1f9c100, 0xc0000be388, 0x0, 0x0, 0x0, 0xc0003ecf90, 0x42da25, 0xc00033f2c0)
/usr/local/go/src/io/io.go:388 +0x3fb
io.Copy(...)
/usr/local/go/src/io/io.go:364
os/exec.(*Cmd).writerDescriptor.func1(0x489a01, 0x0)
/usr/local/go/src/os/exec/exec.go:311 +0x7b
os/exec.(*Cmd).Start.func1(0xc0000ec9a0, 0xc0000d39c0)
/usr/local/go/src/os/exec/exec.go:435 +0x35
created by os/exec.(*Cmd).Start
/usr/local/go/src/os/exec/exec.go:434 +0xa8e

goroutine 222 [IO wait]:
internal/poll.runtime_pollWait(0x7f71b45fcd68, 0x72, 0x1f9d1e0)
/usr/local/go/src/runtime/netpoll.go:184 +0x55
internal/poll.(*pollDesc).wait(0xc00033f458, 0x72, 0x201, 0x200, 0xffffffffffffffff)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0xe4
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc00033f440, 0xc00055f400, 0x200, 0x200, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:169 +0x253
os.(*File).read(...)
/usr/local/go/src/os/file_unix.go:259
os.(*File).Read(0xc0000be368, 0xc00055f400, 0x200, 0x200, 0x7f71b45fd920, 0xc00033f228, 0x0)
/usr/local/go/src/os/file.go:116 +0xa7
bytes.(*Buffer).ReadFrom(0xc0006b8b70, 0x1f9c100, 0xc0000be368, 0x7f71b45fd920, 0xc0006b8b70, 0xc0000fef01)
/usr/local/go/src/bytes/buffer.go:204 +0x159
io.copyBuffer(0x1f9a940, 0xc0006b8b70, 0x1f9c100, 0xc0000be368, 0x0, 0x0, 0x0, 0xc0000fef90, 0x42da25, 0xc00033f2c0)
/usr/local/go/src/io/io.go:388 +0x3fb
io.Copy(...)
/usr/local/go/src/io/io.go:364
os/exec.(*Cmd).writerDescriptor.func1(0x489a01, 0x0)
/usr/local/go/src/os/exec/exec.go:311 +0x7b
os/exec.(*Cmd).Start.func1(0xc0000ec9a0, 0xc0000d3980)
/usr/local/go/src/os/exec/exec.go:435 +0x35
created by os/exec.(*Cmd).Start

```

I think that either the docker command is getting stuck, or we are somehow exceeding the timeout now that the parallelization is disabled. I would go for the first one, since the test did pass at least once.

@2opremio
Contributor Author

2opremio commented Dec 10, 2019

I think it may be docker getting stuck (or the test taking too long now that it's not parallelized)

@2opremio
Contributor Author

2opremio commented Dec 10, 2019

Of course, as soon as I add debug printouts it doesn't fail anymore, grrrrr https://circleci.com/gh/fluxcd/flux/9004

@2opremio
Contributor Author

I managed to make it fail https://circleci.com/gh/fluxcd/flux/9008

Running: common case
calling updateWorkloadContainer(manifest, extra:deployment/pr-assigner, pr-assigner, quay.io/weaveworks/pr-assigner:master-1234567)
Took: 1.1 seconds
Running: new version like number
calling updateWorkloadContainer(manifest, default:deployment/fluxy, fluxy, weaveworks/fluxy:1234567)
Took: 1.1 seconds
Running: old version like number
calling updateWorkloadContainer(manifest, default:deployment/fluxy, fluxy, weaveworks/fluxy:master-a000001)
Took: 1.0 seconds
Running: name label out of order
calling updateWorkloadContainer(manifest, monitoring:deployment/grafana, grafana, quay.io/weaveworks/grafana:master-37aaf67)
Took: 1.0 seconds
Running: version (tag) with dots
calling updateWorkloadContainer(manifest, sock-shop:deployment/front-end, front-end, weaveworksdemos/front-end:7f511af2d21fd601b86b3bed7baa6adfa9c8c669)
Took: 1.0 seconds
Running: minimal dockerhub image name
calling updateWorkloadContainer(manifest, default:deployment/nginx, nginx, nginx:1.10-alpine)
Took: 1.1 seconds
Running: reordered keys
calling updateWorkloadContainer(manifest, default:deployment/nginx, nginx, nginx:1.10-alpine)
calling updateWorkloadContainer(manifest, default:deployment/nginx, nginx2, nginx:1.10-alpine)
Took: 2.3 seconds
Running: from prod
calling updateWorkloadContainer(manifest, default:deployment/authfe, logging, quay.io/weaveworks/logging:master-123456)
Took: 1.1 seconds
Running: single quotes
calling updateWorkloadContainer(manifest, default:deployment/weave, weave, weaveworks/weave-kube:2.2.1)
Took: 1.0 seconds
Running: in multidoc
calling updateWorkloadContainer(manifest, hello:deployment/helloworld, helloworld, quay.io/weaveworks/helloworld:master-a000001)
Took: 1.0 seconds
Running: in kubernetes List resource
calling updateWorkloadContainer(manifest, hello:deployment/helloworld, helloworld, quay.io/weaveworks/helloworld:master-a000001)
Took: 1.2 seconds
Running: FluxHelmRelease (v1alpha2; simple image encoding)
calling updateWorkloadContainer(manifest, maria:fluxhelmrelease/mariadb, chart-image, bitnami/mariadb:10.1.33)
Took: 1.0 seconds
Running: FluxHelmRelease (v1alpha2; multi image encoding)
calling updateWorkloadContainer(manifest, maria:fluxhelmrelease/mariadb, mariadb, bitnami/mariadb:10.1.33)
Took: 1.0 seconds
Running: HelmRelease (v1beta1; image with port number)
calling updateWorkloadContainer(manifest, maria:helmrelease/mariadb, mariadb, localhost:5000/mariadb:10.1.33)
Took: 1.0 seconds
Running: HelmRelease (v1; with image map)
Took: 1.0 seconds
Running: initContainer
calling updateWorkloadContainer(manifest, default:deployment/weave, weave, weaveworks/weave-kube:2.2.1)
panic: test timed out after 1m0s

I am pretty sure docker gets stuck when executing kubeyaml. Why, I still don't know.
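(As a diagnostic sketch, not something from this thread: one quick way to tell whether the docker daemon itself is wedged on the CI node, rather than the kubeyaml invocation specifically, is to bound a trivial container run with timeout.)

```bash
# Diagnostic sketch (illustrative only): a trivial container run bounded by
# `timeout`; an exit code of 124 means it hung for more than 20 seconds.
timeout 20s docker run --rm busybox true
echo "docker run exit code: $?"
```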

@2opremio
Contributor Author

2opremio commented Dec 10, 2019

Here is another failure instance: https://circleci.com/gh/fluxcd/flux/9013. In this case it fails in a different test case:

Running: common case
calling updateWorkloadContainer(manifest, extra:deployment/pr-assigner, pr-assigner, quay.io/weaveworks/pr-assigner:master-1234567)
Took: 1.2 seconds
Running: new version like number
calling updateWorkloadContainer(manifest, default:deployment/fluxy, fluxy, weaveworks/fluxy:1234567)
Took: 1.1 seconds
Running: old version like number
calling updateWorkloadContainer(manifest, default:deployment/fluxy, fluxy, weaveworks/fluxy:master-a000001)
Took: 1.1 seconds
Running: name label out of order
calling updateWorkloadContainer(manifest, monitoring:deployment/grafana, grafana, quay.io/weaveworks/grafana:master-37aaf67)
Took: 1.0 seconds
Running: version (tag) with dots
calling updateWorkloadContainer(manifest, sock-shop:deployment/front-end, front-end, weaveworksdemos/front-end:7f511af2d21fd601b86b3bed7baa6adfa9c8c669)
Took: 1.1 seconds
Running: minimal dockerhub image name
calling updateWorkloadContainer(manifest, default:deployment/nginx, nginx, nginx:1.10-alpine)
Took: 1.2 seconds
Running: reordered keys
calling updateWorkloadContainer(manifest, default:deployment/nginx, nginx, nginx:1.10-alpine)
calling updateWorkloadContainer(manifest, default:deployment/nginx, nginx2, nginx:1.10-alpine)
Took: 2.0 seconds
Running: from prod
calling updateWorkloadContainer(manifest, default:deployment/authfe, logging, quay.io/weaveworks/logging:master-123456)
Took: 1.0 seconds
Running: single quotes
calling updateWorkloadContainer(manifest, default:deployment/weave, weave, weaveworks/weave-kube:2.2.1)
Took: 1.0 seconds
Running: in multidoc
calling updateWorkloadContainer(manifest, hello:deployment/helloworld, helloworld, quay.io/weaveworks/helloworld:master-a000001)
Took: 0.9 seconds
Running: in kubernetes List resource
calling updateWorkloadContainer(manifest, hello:deployment/helloworld, helloworld, quay.io/weaveworks/helloworld:master-a000001)
Took: 1.1 seconds
Running: FluxHelmRelease (v1alpha2; simple image encoding)
calling updateWorkloadContainer(manifest, maria:fluxhelmrelease/mariadb, chart-image, bitnami/mariadb:10.1.33)
Took: 1.0 seconds
Running: FluxHelmRelease (v1alpha2; multi image encoding)
calling updateWorkloadContainer(manifest, maria:fluxhelmrelease/mariadb, mariadb, bitnami/mariadb:10.1.33)
Took: 1.0 seconds
Running: HelmRelease (v1beta1; image with port number)
calling updateWorkloadContainer(manifest, maria:helmrelease/mariadb, mariadb, localhost:5000/mariadb:10.1.33)
Took: 1.0 seconds
Running: HelmRelease (v1; with image map)
panic: test timed out after 1m0s

@2opremio
Contributor Author

2opremio commented Dec 11, 2019

OK. I think I know what's happening (and I feel a bit stupid about not having figured it out before).

TL;DR: we were being too conservative with the -timeout go test flag, and the flag doesn't work like I expected.

We set a timeout of 60s when running the tests. I had assumed that the timeout applied per test (i.e. per t.Run() invocation or Test* function). The documentation says otherwise though:

    -timeout d
        If a test binary runs longer than duration d, panic.
        If d is 0, the timeout is disabled.
        The default is 10 minutes (10m).

It's not completely clear to me what a binary means (go test builds one test binary per package, so the timeout applies to each package's entire test run), but in the Makefile we pass all the flux Go packages to go test:

test: test/bin/helm test/bin/kubectl test/bin/kustomize $(GENERATED_TEMPLATES_FILE)
	PATH="${PWD}/bin:${PWD}/test/bin:${PATH}" go test ${TEST_FLAGS} $(shell go list ./... | grep -v "^github.com/fluxcd/flux/vendor" | sort -u)

For instance, this fails

$ cd flux
$ PATH=$PATH:$PWD/bin/ time go test  -v -race -tags integration -timeout 60s github.com/fluxcd/flux/pkg/cluster/kubernetes/
[...]
=== RUN   TestWorkloadContainerUpdates/HelmRelease_(v1;_with_image_map)
panic: test timed out after 1m0s
[...]
FAIL	github.com/fluxcd/flux/pkg/cluster/kubernetes	60.078s
FAIL

But if we filter on TestWorkloadContainerUpdates specifically, it passes.

For instance, this passes:

$ cd flux
$ PATH=$PATH:$PWD/bin/ time go test  -v -run TestWorkloadContainerUpdates -race -tags integration -timeout 60s github.com/fluxcd/flux/pkg/cluster/kubernetes/
[...]
--- PASS: TestWorkloadContainerUpdates (25.41s)
[...]
ok  	github.com/fluxcd/flux/pkg/cluster/kubernetes	(cached)

So, I think this started happening when new tests were added to pkg/cluster/kubernetes/ in 18dd15e.

Also, this didn't surface all that much because of caching (the test was only re-run when the package was modified, and if you got lucky it ran fast enough to get cached again).

Since it's hard to control the granularity of -timeout, I will be generous with it.

Jeez, this took way longer than it should have.
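For illustration (a sketch, not the change actually made): since -timeout applies to each test binary, i.e. to each package's whole test run, per-package granularity could be obtained by invoking go test separately for the slow package with a larger timeout. The package split and durations below are illustrative.

```bash
# Sketch only: give the slow package a generous timeout of its own and keep a
# tighter one for everything else.
SLOW_PKG="github.com/fluxcd/flux/pkg/cluster/kubernetes"

go test -race -tags integration -timeout 5m "$SLOW_PKG"
go test -race -tags integration -timeout 60s \
  $(go list ./... | grep -v "^github.com/fluxcd/flux/vendor" | grep -v "^${SLOW_PKG}\$")
```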

@2opremio
Contributor Author

2opremio commented Dec 11, 2019

Also (if the root cause didn't change after #2674 ), this means that parallelization wasn't really doing much because the tests of pkg/cluster/kubernetes take ~62 seconds sequentially.

@2opremio
Contributor Author

2opremio commented Dec 11, 2019

And ... here is another instance of e2e tests failing https://circleci.com/gh/fluxcd/flux/9070

In this case, and for some reason, gitsrv cannot be reached on port 22.

@2opremio
Contributor Author

2opremio commented Dec 11, 2019

and gitsrv failed to start again (in master): https://circleci.com/gh/fluxcd/flux/9090

@hiddeco
Member

hiddeco commented Dec 11, 2019

> In this case, and for some reason, gitsrv cannot be reached on port 22.

I have seen CI builds on other platforms where port 22 was occupied by some other (debugging) process or tool. Maybe it is wise to switch to a non-default port to not run into other (unexpected) processes?

@2opremio
Contributor Author

> I have seen CI builds on other platforms where port 22 was occupied by some other (debugging) process or tool. Maybe it is wise to switch to a non-default port to not run into other (unexpected) processes?

As we discussed offline, it's only port 22 in the pod; the port on the client side (kubectl port-forward) is randomized.
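For context, a sketch of what the client side looks like (namespace and resource name are illustrative): leaving the local half of the port mapping empty makes kubectl pick a free random local port, so only port 22 inside the pod is fixed.

```bash
# Sketch (names illustrative): kubectl chooses a random free local port when
# the local side of the mapping is left empty and prints e.g.
# "Forwarding from 127.0.0.1:40123 -> 22".
kubectl -n "$E2E_NAMESPACE" port-forward deploy/gitsrv :22 > port-forward.log 2>&1 &
sleep 2
LOCAL_PORT=$(grep -oE '127\.0\.0\.1:[0-9]+' port-forward.log | head -n1 | cut -d: -f2)
echo "gitsrv reachable on localhost:${LOCAL_PORT}"
```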

@2opremio
Contributor Author

Here is another e2e test failing: https://circleci.com/gh/fluxcd/flux/9093 (from #2684). In this case I am not even sure what's wrong.

@hiddeco
Member

hiddeco commented Dec 11, 2019

> Here is another e2e test failing circleci.com/gh/fluxcd/flux/9093

The problem here seems to be:

# '[ "$head_hash" = "$sync_tag_hash" ]' failed

Looking at the failing test, it seems that our checks are insufficient and there is a (small) possibility that we compare an empty string to $head_hash.

  # This does not ensure the tag was pushed by Flux,
  # only that the sync was applied.
  poll_until_equals "podinfo image" "stefanprodan/podinfo:3.1.5" "kubectl get pod -n demo -l app=podinfo -o\"jsonpath={['items'][0]['spec']['containers'][0]['image']}\""
  # So it is possible that this does not result in a rev string.
  git pull -f --tags
  sync_tag_hash=$(git rev-list -n 1 flux)
  [ "$head_hash" = "$sync_tag_hash" ]

@2opremio
Contributor Author

Good catch. It should be easy to turn that into a poll_until_equals
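A sketch of what that could look like, reusing the poll_until_equals helper as it is used above (the helper's exact retry and timeout behaviour is assumed):

```bash
# Sketch: retry until the flux sync tag catches up with HEAD, instead of
# comparing once right after the image check.
poll_until_equals "sync tag" "$head_hash" \
  "git pull -f --tags > /dev/null 2>&1; git rev-list -n 1 flux"
```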

@2opremio
Contributor Author

There is also this unit test failure, which may be a legitimate problem: https://circleci.com/gh/fluxcd/flux/9093

@hiddeco
Member

hiddeco commented Dec 11, 2019

I think you mean the unit test failure in https://circleci.com/gh/fluxcd/flux/9109? My theory is that this is due to some CircleCI nodes having a better connection than others, resulting in a (temporary) rate limit from DockerHub because we are hitting their registry too hard.

I think this can simply be avoided by mocking the registry using registry:2.x and e.g. docker load to dump dummy images into it from tarballs.
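A sketch of that idea (image names, port and tarball path are illustrative): run a throwaway registry:2 container, seed it with a known image, and point the tests at localhost:5000 instead of Docker Hub.

```bash
# Sketch only: local registry seeded with a dummy image.
docker run -d --name test-registry -p 5000:5000 registry:2

# Either load a pre-built tarball (path illustrative)...
# docker load -i testdata/images/nginx-1.10-alpine.tar
# ...or pull once and re-tag:
docker pull nginx:1.10-alpine
docker tag nginx:1.10-alpine localhost:5000/nginx:1.10-alpine
docker push localhost:5000/nginx:1.10-alpine
```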

hiddeco added the build (About the build or test scaffolding) label Dec 11, 2019
@2opremio
Contributor Author

> I think you mean the unit test failure in https://circleci.com/gh/fluxcd/flux/9109?

Sorry, wrong link. I was referring to https://circleci.com/gh/fluxcd/flux/9158

@2opremio
Contributor Author

After #2688 it seems like the e2e tests are way more stable. I ran them ~30 times in a row at https://circleci.com/gh/fluxcd/workflows/flux/tree/reproduce-flakey-tests and they all passed, except for one occurrence of Kind failing to create a cluster, which I think we will need to live with.

We still have the unit test failures from https://circleci.com/gh/fluxcd/flux/9158 and https://circleci.com/gh/fluxcd/flux/9109, but they seldom occur.

I am going to step away from this issue for a while, but let's keep adding flakey tests here.

@hiddeco
Member

hiddeco commented Dec 17, 2019

Another one just happened on master https://circleci.com/gh/fluxcd/flux/9314

@2opremio
Contributor Author

2opremio commented Jan 9, 2020

Another failure, due to kind failing to create a cluster https://circleci.com/gh/fluxcd/flux/9444

@2opremio
Contributor Author

Another failure https://circleci.com/gh/fluxcd/flux/9462

@hiddeco
Member

hiddeco commented Jan 13, 2020

> Another failure https://circleci.com/gh/fluxcd/flux/9462

Same failure, different build: https://circleci.com/gh/fluxcd/flux/9465

@2opremio
Contributor Author

2opremio commented Jan 27, 2020

another failure from Kind: https://circleci.com/gh/fluxcd/flux/9822

ERROR: failed to create cluster: failed to generate kubeadm config content: failed to get kubernetes version from node: failed to get file: command "docker exec --privileged flux-e2e-2-control-plane cat /kind/version" failed with error: exit status 1
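One possible mitigation (a sketch; not what was done here, where the failure was instead reported upstream) is to retry cluster creation, since the failure is transient. The cluster name below is illustrative.

```bash
# Sketch: retry transient kind failures a few times; the caller should still
# verify the cluster exists after the loop.
for attempt in 1 2 3; do
  kind create cluster --name flux-e2e && break
  echo "kind create cluster failed (attempt ${attempt}), retrying" >&2
  kind delete cluster --name flux-e2e || true
  sleep 10
done
```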

@2opremio
Contributor Author

I created an upstream issue for this at kubernetes-sigs/kind#1288

@2opremio
Contributor Author

2opremio commented Feb 3, 2020

Flakey execution of the image release e2e test: https://circleci.com/gh/fluxcd/flux/9930

@2opremio
Contributor Author

2opremio commented Feb 3, 2020

I actually don't think it was a flakey test since it failed 3 times in a row. There must be a bug in that PR

EDIT: it wasn't

@2opremio
Contributor Author

2opremio commented Feb 4, 2020

Flakey policy update unit test: https://circleci.com/gh/fluxcd/flux/9984

@hiddeco
Member

hiddeco commented Feb 5, 2020

Policy update unit test, and panic in releaser unit test due to registry timeout: https://circleci.com/gh/fluxcd/flux/9998

@2opremio
Contributor Author

2opremio commented Feb 6, 2020

Kind initialization error https://circleci.com/gh/fluxcd/flux/10092

I will report it upstream

@kingdonb
Member

I have not seen any flaky tests so far, except for the one I narrowly managed to avoid merging in with the last release (my first Flux Daemon release, 1.21.2).

Thank you for documenting this. I will reopen (or, more likely, start a new issue so you are not bothered with it) if it turns out that there are still flaky tests while Flux v1 is in maintenance.
