Skip to content

Commit

Permalink
(Partially) mitigate flaky tests
Browse files Browse the repository at this point in the history
Signed-off-by: perdasilva <perdasilva@redhat.com>

Signed-off-by: perdasilva <perdasilva@redhat.com>
  • Loading branch information
perdasilva committed Feb 16, 2022
1 parent 6f59a3b commit 80e95f1
Show file tree
Hide file tree
Showing 17 changed files with 134 additions and 34 deletions.
23 changes: 23 additions & 0 deletions .github/ISSUE_TEMPLATE/flaky-test.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
---
name: Flaky Test Report
about: If an unrelated test is failing for mysterious reasons
title: ''
labels: 'kind/flake'
assignees: ''

---

## Flaky Test Report

<!--
Note: Make sure to first check the prerequisites that can be found in the main README file!
Thanks for filing an issue! Before hitting the button, please provide the following information:
-->

**Failure Log Link**
[Failure Log](put link here)
You can get the link by clicking on the line number in the job logs and copying the url from your browser.

**Relevant Failure Log**
<!--- Paste any relevant failure logs from the job -->

2 changes: 2 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@ Note: Make sure your branch is rebased to the latest upstream master.
- [ ] Sufficient end-to-end test coverage
- [ ] Docs updated or added to `/doc`
- [ ] Commit messages sensible and descriptive
- [ ] Tests marked as `[FLAKE]` are truly flaky
- [ ] Tests that remove the `[FLAKE]` tag are no longer flaky


<!--
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/e2e-tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ jobs:
- uses: actions/setup-go@v2
with:
go-version: '~1.17'
- run: make e2e-local E2E_TEST_CHUNK=${{ matrix.parallel-id }} E2E_TEST_NUM_CHUNKS=${{ strategy.job-total }} E2E_NODES=2 ARTIFACTS_DIR=./artifacts/
- run: make e2e-local E2E_TEST_CHUNK=${{ matrix.parallel-id }} E2E_TEST_NUM_CHUNKS=${{ strategy.job-total }} E2E_NODES=2 ARTIFACTS_DIR=./artifacts/ SKIP='\[FLAKY\]'
- name: Archive Test Artifacts # test results, failed or not, are always uploaded.
if: ${{ always() }}
uses: actions/upload-artifact@v2
Expand Down
24 changes: 24 additions & 0 deletions .github/workflows/flaky-e2e.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
name: flaky-e2e-tests
on:
schedule:
- cron: '30 5,17 * * *' # run this every day at 5:30 and 17:30 UTC (00:30 and 12:30 ET)
push:
branches:
- master
pull_request:
workflow_dispatch:
jobs:
flaky-e2e-tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v1
- uses: actions/setup-go@v2
with:
go-version: '~1.17'
- run: make e2e-local E2E_NODES=1 TEST='\[FLAKY\]' ARTIFACTS_DIR=./artifacts/
- name: Archive Test Artifacts # test results, failed or not, are always uploaded.
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: e2e-test-output-${{(github.event.pull_request.head.sha||github.sha)}}-${{ github.run_id }}
path: ${{ github.workspace }}/bin/artifacts/*
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,7 @@ E2E_TEST_NUM_CHUNKS ?= 4
ifneq (all,$(E2E_TEST_CHUNK))
TEST := $(shell go run ./test/e2e/split/... -chunks $(E2E_TEST_NUM_CHUNKS) -print-chunk $(E2E_TEST_CHUNK) ./test/e2e)
endif
E2E_OPTS ?= $(if $(E2E_SEED),-seed '$(E2E_SEED)') $(if $(TEST),-focus '$(TEST)') -flakeAttempts $(E2E_FLAKE_ATTEMPTS) -nodes $(E2E_NODES) -timeout $(E2E_TIMEOUT) -v -randomizeSuites -race -trace -progress
E2E_OPTS ?= $(if $(E2E_SEED),-seed '$(E2E_SEED)') $(if $(SKIP), -skip '$(SKIP)') $(if $(TEST),-focus '$(TEST)') -flakeAttempts $(E2E_FLAKE_ATTEMPTS) -nodes $(E2E_NODES) -timeout $(E2E_TIMEOUT) -v -randomizeSuites -race -trace -progress
E2E_INSTALL_NS ?= operator-lifecycle-manager
E2E_TEST_NS ?= operators

Expand Down
5 changes: 5 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,10 @@ Check out the [contributor documentation][contributor-documentation]. Also, see

See [reporting bugs][bug_guide] for details about reporting any issues.

## Reporting flaky tests

See [reporting flaky tests][flaky_test_guide] for details about reporting flaky tests.

## License

Operator Lifecycle Manager is under Apache 2.0 license. See the [LICENSE][license_file] file for details.
Expand All @@ -179,6 +183,7 @@ Operator Lifecycle Manager is under Apache 2.0 license. See the [LICENSE][licens
[proposals_docs]: ./doc/contributors/design-proposals
[license_file]:./LICENSE
[bug_guide]:./doc/dev/reporting_bugs.md
[flaky_test_guide]:./doc/dev/reporting_flakes.md
[git_tool]:https://git-scm.com/downloads
[go_tool]:https://golang.org/dl/
[docker_tool]:https://docs.docker.com/install/
Expand Down
21 changes: 21 additions & 0 deletions doc/dev/reporting_flakes.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# Reporting flakes

If you are struggling to get your PR through because unrelated e2e or unit tests are randomly failing, it's likely
you are being plagued by a flaky test 😱, a test that wasn't constructed as carefully as it should have been as is
failing even when it should be succeeding. When this happens, check our [issues](https://github.com/operator-framework/operator-lifecycle-manager/issues)
to see if it has been filed before. Search also in the `closed issues`. If you find one, re-open it if necessary.
Otherwise, [file](https://github.com/operator-framework/operator-lifecycle-manager/issues/new) a flaky test issue.

Once you have an issue link, you can disable the flaky test by adding the `[FLAKE]` tag to the test name and linking the issue in the code.

Example:

```
// issue: https://github.com/operator-framework/operator-lifecycle-manager/issues/2635
It("[FLAKE] updates multiple intermediates", func() {
...
```

You may be asked by the reviewer to supply evidence that the test is indeed flaky and not an unfortunate side effect of
your contribution. We'll endeavor to make this an easy as process as possible to merge your contribution in as quickly
and safely as possible.
6 changes: 4 additions & 2 deletions test/e2e/catalog_e2e_test.go
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
// +build !bare
//go:build !bare
// +build !bare

package e2e

Expand Down Expand Up @@ -870,7 +871,8 @@ var _ = Describe("Catalog represents a store of bundles which OLM can use to ins
Expect(v).Should(Equal(version.OperatorVersion{Version: busyboxVersion}), "latest version of operator not installed: catalog source update failed")
})

It("Dependency has correct replaces field", func() {
// issue: https://github.com/operator-framework/operator-lifecycle-manager/issues/2642
It("[FLAKE] Dependency has correct replaces field", func() {
// Create a CatalogSource that contains the busybox v1 and busybox-dependency v1 images
// Create a Subscription for busybox v1, which has a dependency on busybox-dependency v1.
// Wait for the busybox and busybox2 Subscriptions to succeed
Expand Down
6 changes: 4 additions & 2 deletions test/e2e/crd_e2e_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,8 @@ import (
var _ = Describe("CRD Versions", func() {
AfterEach(func() { TearDown(testNamespace) }, float64(30))

It("creates v1 CRDs with a v1 schema successfully", func() {
// issue: https://github.com/operator-framework/operator-lifecycle-manager/issues/2640
It("[FLAKE] creates v1 CRDs with a v1 schema successfully", func() {
By("v1 crds with a valid openapiv3 schema should be created successfully by OLM")
c := newKubeClient()
crc := newCRClient()
Expand Down Expand Up @@ -94,7 +95,8 @@ var _ = Describe("CRD Versions", func() {
Expect(fetchedInstallPlan.Status.Phase).To(Equal(operatorsv1alpha1.InstallPlanPhaseComplete))
})

It("blocks a CRD upgrade that could cause data loss", func() {
// issue:https://github.com/operator-framework/operator-lifecycle-manager/issues/2638
It("[FLAKE] blocks a CRD upgrade that could cause data loss", func() {
By("checking the storage versions in the existing CRD status and the spec of the new CRD")

c := newKubeClient()
Expand Down
30 changes: 18 additions & 12 deletions test/e2e/csv_e2e_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,8 +19,8 @@ import (
"k8s.io/apimachinery/pkg/api/equality"
k8serrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/api/meta"
"k8s.io/apimachinery/pkg/fields"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/fields"
k8slabels "k8s.io/apimachinery/pkg/labels"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/selection"
Expand Down Expand Up @@ -64,10 +64,10 @@ var _ = Describe("ClusterServiceVersion", func() {

When("a CustomResourceDefinition was installed alongside a ClusterServiceVersion", func() {
var (
ns corev1.Namespace
crd apiextensionsv1.CustomResourceDefinition
og operatorsv1.OperatorGroup
apiname string
ns corev1.Namespace
crd apiextensionsv1.CustomResourceDefinition
og operatorsv1.OperatorGroup
apiname string
apifullname string
)

Expand Down Expand Up @@ -144,7 +144,8 @@ var _ = Describe("ClusterServiceVersion", func() {
}).Should(WithTransform(k8serrors.IsNotFound, BeTrue()))
})

It("can satisfy an associated ClusterServiceVersion's ownership requirement", func() {
// issue: https://github.com/operator-framework/operator-lifecycle-manager/issues/2646
It("[FLAKE] can satisfy an associated ClusterServiceVersion's ownership requirement", func() {
associated := operatorsv1alpha1.ClusterServiceVersion{
ObjectMeta: metav1.ObjectMeta{
Name: "associated-csv",
Expand Down Expand Up @@ -261,7 +262,8 @@ var _ = Describe("ClusterServiceVersion", func() {
}).Should(Succeed())
})

It("can satisfy an unassociated ClusterServiceVersion's non-ownership requirement", func() {
// issue:https://github.com/operator-framework/operator-lifecycle-manager/issues/2639
It("[FLAKE] can satisfy an unassociated ClusterServiceVersion's non-ownership requirement", func() {
unassociated := operatorsv1alpha1.ClusterServiceVersion{
ObjectMeta: metav1.ObjectMeta{
Name: "unassociated-csv",
Expand Down Expand Up @@ -4206,7 +4208,7 @@ var _ = Describe("ClusterServiceVersion", func() {

var _ = Describe("Disabling copied CSVs", func() {
var (
ns corev1.Namespace
ns corev1.Namespace
csv operatorsv1alpha1.ClusterServiceVersion
)

Expand Down Expand Up @@ -4255,7 +4257,8 @@ var _ = Describe("Disabling copied CSVs", func() {
})

When("an operator is installed in AllNamespace mode", func() {
It("should have Copied CSVs in all other namespaces", func() {
// issue: https://github.com/operator-framework/operator-lifecycle-manager/issues/2643
It("[FLAKE] should have Copied CSVs in all other namespaces", func() {
Eventually(func() error {
requirement, err := k8slabels.NewRequirement(operatorsv1alpha1.CopiedLabelKey, selection.Equals, []string{csv.GetNamespace()})
if err != nil {
Expand Down Expand Up @@ -4335,7 +4338,8 @@ var _ = Describe("Disabling copied CSVs", func() {
}).Should(Succeed())
})

It("should be reflected in the olmConfig.Status.Condition array that the expected number of copied CSVs exist", func() {
// issue: https://github.com/operator-framework/operator-lifecycle-manager/issues/2634
It("[FLAKE] should be reflected in the olmConfig.Status.Condition array that the expected number of copied CSVs exist", func() {
Eventually(func() error {
var olmConfig operatorsv1.OLMConfig
if err := ctx.Ctx().Client().Get(context.TODO(), apitypes.NamespacedName{Name: "cluster"}, &olmConfig); err != nil {
Expand Down Expand Up @@ -4391,7 +4395,8 @@ var _ = Describe("Disabling copied CSVs", func() {
}).Should(Succeed())
})

It("should have copied CSVs in all other Namespaces", func() {
// issue: https://github.com/operator-framework/operator-lifecycle-manager/issues/2634
It("[FLAKE] should have copied CSVs in all other Namespaces", func() {
Eventually(func() error {
// find copied csvs...
requirement, err := k8slabels.NewRequirement(operatorsv1alpha1.CopiedLabelKey, selection.Equals, []string{csv.GetNamespace()})
Expand Down Expand Up @@ -4420,7 +4425,8 @@ var _ = Describe("Disabling copied CSVs", func() {
}).Should(Succeed())
})

It("should be reflected in the olmConfig.Status.Condition array that the expected number of copied CSVs exist", func() {
// issue: https://github.com/operator-framework/operator-lifecycle-manager/issues/2641
It("[FLAKE] should be reflected in the olmConfig.Status.Condition array that the expected number of copied CSVs exist", func() {
Eventually(func() error {
var olmConfig operatorsv1.OLMConfig
if err := ctx.Ctx().Client().Get(context.TODO(), apitypes.NamespacedName{Name: "cluster"}, &olmConfig); err != nil {
Expand Down
3 changes: 2 additions & 1 deletion test/e2e/gc_e2e_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -646,7 +646,8 @@ var _ = Describe("Garbage collection for dependent resources", func() {
}).Should(BeNil())
})

It("should have removed the old configmap and put the new configmap in place", func() {
// flake issue: https://github.com/operator-framework/operator-lifecycle-manager/issues/2626
It("[FLAKE] should have removed the old configmap and put the new configmap in place", func() {
Eventually(func() bool {
_, err := kubeClient.GetConfigMap(testNamespace, configmapName)
return k8serrors.IsNotFound(err)
Expand Down
10 changes: 6 additions & 4 deletions test/e2e/installplan_e2e_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -834,7 +834,7 @@ var _ = Describe("Install Plan", func() {
}()

// Create the catalog source
mainCatalogSourceName := genName("mock-ocs-main-" + strings.ToLower(CurrentGinkgoTestDescription().TestText) + "-")
mainCatalogSourceName := genName("mock-ocs-main-" + strings.ToLower(K8sSafeCurrentTestDescription()) + "-")
_, cleanupCatalogSource := createInternalCatalogSource(c, crc, mainCatalogSourceName, testNamespace, mainManifests, []apiextensions.CustomResourceDefinition{dependentCRD, mainCRD}, []operatorsv1alpha1.ClusterServiceVersion{dependentBetaCSV, dependentStableCSV, mainStableCSV, mainBetaCSV})
defer cleanupCatalogSource()

Expand Down Expand Up @@ -3006,7 +3006,8 @@ var _ = Describe("Install Plan", func() {

// This It spec verifies that, in cases where there are multiple options to fulfil a dependency
// across multiple catalogs, we only generate one installplan with one set of resolved resources.
It("consistent generation", func() {
//issue: https://github.com/operator-framework/operator-lifecycle-manager/issues/2633
It("[FLAKE] consistent generation", func() {

// Configure catalogs:
// - one catalog with a package that has a dependency
Expand Down Expand Up @@ -3190,15 +3191,16 @@ var _ = Describe("Install Plan", func() {
Expect(err).NotTo(HaveOccurred())
})

It("should clear clear up the condition in the InstallPlan status that contains an error message when a valid OperatorGroup is created", func() {
// issue: https://github.com/operator-framework/operator-lifecycle-manager/issues/2636
It("[FLAKE] should clear up the condition in the InstallPlan status that contains an error message when a valid OperatorGroup is created", func() {

// first wait for a condition with a message exists
cond := operatorsv1alpha1.InstallPlanCondition{Type: operatorsv1alpha1.InstallPlanInstalled, Status: corev1.ConditionFalse, Reason: operatorsv1alpha1.InstallPlanReasonInstallCheckFailed,
Message: "no operator group found that is managing this namespace"}

Eventually(func() bool {
fetchedInstallPlan, err := fetchInstallPlanWithNamespace(GinkgoT(), crc, installPlanName, ns.GetName(), buildInstallPlanPhaseCheckFunc(operatorsv1alpha1.InstallPlanPhaseInstalling))
if err != nil || fetchedInstallPlan == nil{
if err != nil || fetchedInstallPlan == nil {
return false
}
if fetchedInstallPlan.Status.Phase != operatorsv1alpha1.InstallPlanPhaseInstalling {
Expand Down
14 changes: 8 additions & 6 deletions test/e2e/operator_groups_e2e_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1419,7 +1419,7 @@ var _ = Describe("Operator Group", func() {
aCSV := newCSV(csvName, opGroupNamespace, "", semver.MustParse("0.0.0"), []apiextensions.CustomResourceDefinition{mainCRD}, nil, &namedStrategy)

// Use the It spec name as label after stripping whitespaces
aCSV.Labels = map[string]string{"label": strings.Replace(CurrentGinkgoTestDescription().TestText, " ", "", -1)}
aCSV.Labels = map[string]string{"label": K8sSafeCurrentTestDescription()}
createdCSV, err := crc.OperatorsV1alpha1().ClusterServiceVersions(opGroupNamespace).Create(context.TODO(), &aCSV, metav1.CreateOptions{})
require.NoError(GinkgoT(), err)

Expand Down Expand Up @@ -1498,7 +1498,7 @@ var _ = Describe("Operator Group", func() {
})
require.NoError(GinkgoT(), err)

csvList, err := crc.OperatorsV1alpha1().ClusterServiceVersions(corev1.NamespaceAll).List(context.TODO(), metav1.ListOptions{LabelSelector: fmt.Sprintf("label=%s", strings.Replace(CurrentGinkgoTestDescription().TestText, " ", "", -1))})
csvList, err := crc.OperatorsV1alpha1().ClusterServiceVersions(corev1.NamespaceAll).List(context.TODO(), metav1.ListOptions{LabelSelector: fmt.Sprintf("label=%s", K8sSafeCurrentTestDescription())})
require.NoError(GinkgoT(), err)
GinkgoT().Logf("Found CSV count of %v", len(csvList.Items))
GinkgoT().Logf("Create other namespace %s", otherNamespaceName)
Expand Down Expand Up @@ -1795,7 +1795,8 @@ var _ = Describe("Operator Group", func() {

// Versions of OLM at 0.14.1 and older had a bug that would place the wrong namespace annotation on copied CSVs,
// preventing them from being GCd. This ensures that any leftover CSVs in that state are properly cleared up.
It("cleanup csvs with bad owner operator groups", func() {
// issue: https://github.com/operator-framework/operator-lifecycle-manager/issues/2644
It("[FLAKE] cleanup csvs with bad owner operator groups", func() {

c := newKubeClient()
crc := newCRClient()
Expand Down Expand Up @@ -1899,7 +1900,7 @@ var _ = Describe("Operator Group", func() {
aCSV := newCSV(csvName, opGroupNamespace, "", semver.MustParse("0.0.0"), []apiextensions.CustomResourceDefinition{mainCRD}, nil, &namedStrategy)

// Use the It spec name as label after stripping whitespaces
aCSV.Labels = map[string]string{"label": strings.Replace(CurrentGinkgoTestDescription().TestText, " ", "", -1)}
aCSV.Labels = map[string]string{"label": K8sSafeCurrentTestDescription()}
createdCSV, err := crc.OperatorsV1alpha1().ClusterServiceVersions(opGroupNamespace).Create(context.TODO(), &aCSV, metav1.CreateOptions{})
require.NoError(GinkgoT(), err)

Expand Down Expand Up @@ -1978,7 +1979,7 @@ var _ = Describe("Operator Group", func() {
})
require.NoError(GinkgoT(), err)

csvList, err := crc.OperatorsV1alpha1().ClusterServiceVersions(corev1.NamespaceAll).List(context.TODO(), metav1.ListOptions{LabelSelector: fmt.Sprintf("label=%s", strings.Replace(CurrentGinkgoTestDescription().TestText, " ", "", -1))})
csvList, err := crc.OperatorsV1alpha1().ClusterServiceVersions(corev1.NamespaceAll).List(context.TODO(), metav1.ListOptions{LabelSelector: fmt.Sprintf("label=%s", K8sSafeCurrentTestDescription())})
require.NoError(GinkgoT(), err)
GinkgoT().Logf("Found CSV count of %v", len(csvList.Items))
GinkgoT().Logf("Create other namespace %s", otherNamespaceName)
Expand Down Expand Up @@ -2288,7 +2289,8 @@ var _ = Describe("Operator Group", func() {
Expect(err).ToNot(HaveOccurred())
})

It("OLM applies labels to Namespaces that are associated with an OperatorGroup", func() {
// issue: https://github.com/operator-framework/operator-lifecycle-manager/issues/2637
It("[FLAKE] OLM applies labels to Namespaces that are associated with an OperatorGroup", func() {
ogLabel, err := getOGLabelKey(operatorGroup)
Expect(err).ToNot(HaveOccurred())

Expand Down
3 changes: 2 additions & 1 deletion test/e2e/operator_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,8 @@ var _ = Describe("Operator API", func() {
// 14. Ensure the reference to ns-a is eventually removed from o's status.components.refs field
// 15. Delete o
// 16. Ensure o is not re-created
It("should surface components in its status", func() {
// issue: https://github.com/operator-framework/operator-lifecycle-manager/issues/2628
It("[FLAKE] should surface components in its status", func() {
o := &operatorsv1.Operator{}
o.SetName(genName("o-"))

Expand Down
6 changes: 4 additions & 2 deletions test/e2e/subscription_e2e_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -508,7 +508,8 @@ var _ = Describe("Subscription", func() {
require.Len(GinkgoT(), ips.Items, 2)
})

It("updates multiple intermediates", func() {
// issue: https://github.com/operator-framework/operator-lifecycle-manager/issues/2635
It("[FLAKE] updates multiple intermediates", func() {

crd := newCRD("ins")

Expand Down Expand Up @@ -1022,7 +1023,8 @@ var _ = Describe("Subscription", func() {
// - Delete the referenced InstallPlan
// - Wait for sub to have status condition SubscriptionInstallPlanMissing true
// - Ensure original non-InstallPlan status conditions remain after InstallPlan transitions
It("can reconcile InstallPlan status", func() {
// issue: https://github.com/operator-framework/operator-lifecycle-manager/issues/2645
It("[FLAKE] can reconcile InstallPlan status", func() {
c := newKubeClient()
crc := newCRClient()

Expand Down
Loading

0 comments on commit 80e95f1

Please sign in to comment.