"Metrics are generated for OLM managed resources Given an OperatorGroup that supports all namespaces when a CSV is created when the OLM pod restarts CSV metric is preserved" #2441

timflannagan · 2021-11-09T17:43:11Z

Metrics are generated for OLM managed resources
/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/metrics_e2e_test.go:32
  Given an OperatorGroup that supports all namespaces
  /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/metrics_e2e_test.go:45
    when a CSV is created
    /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/metrics_e2e_test.go:121
      when the OLM pod restarts
      /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/metrics_e2e_test.go:147
        CSV metric is preserved [It]
        /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/metrics_e2e_test.go:151

        Expected
            <[]e2e.Metric | len:2, cap:2>: [
                {
                    Family: "csv_upgrade_count",
                    Labels: nil,
                    Value: 0,
                },
                {Family: "csv_count", Labels: nil, Value: 0},
            ]
        to contain element matching
            <*e2e.LikeMetricMatcher | 0xc002120420>: {
                Predicates: [
                    {
                        f: 0x2be9f80,
                        name: "WithFamily(csv_succeeded)",
                    },
                    {
                        f: 0x2bea000,
                        name: "WithLabel(name=csv-test-g48g2-stable)",
                    },
                    {f: 0x2bea240, name: "WithValue(1)"},
                ],
            }

        /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/metrics_e2e_test.go:152

        Full Stack Trace
        github.com/operator-framework/operator-lifecycle-manager/test/e2e.glob..func12.2.3.4.2()
        	/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/metrics_e2e_test.go:152 +0x32b
        github.com/operator-framework/operator-lifecycle-manager/test/e2e.TestEndToEnd(0xc000381380)
        	/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/e2e_test.go:59 +0x296
        testing.tRunner(0xc000381380, 0x343c030)
        	/opt/hostedtoolcache/go/1.16.10/x64/src/testing/testing.go:1193 +0xef
        created by testing.(*T).Run
        	/opt/hostedtoolcache/go/1.16.10/x64/src/testing/testing.go:1238 +0x2b3

The text was updated successfully, but these errors were encountered:

timflannagan · 2021-11-09T17:43:29Z

Link to an action run that contains this failed test: https://github.com/operator-framework/operator-lifecycle-manager/runs/4154202639?check_suite_focus=true.

timflannagan · 2021-11-09T17:46:41Z

Ran into this error in #2438.

timflannagan · 2021-11-09T17:48:12Z

Note: last time we had debugged this failure, it appears to be a problem with the actual test assertions being made vs. functionality that isn't working correctly in OLM. We may need to update the test logic for checking when a deployment is considered "ready" after recycling the catalog-operator to test this behavior as there may be inadequate checks for readiness, leading to the test code checking for metric presence before the new catalog-operator pod has established a metrics endpoint, leading to false positives.

timflannagan · 2021-11-09T17:51:58Z

It looks like there's already an existing issue for this that didn't come up when I had search for it (and I didn't link it properly in the overall CI tracking issue): #2390 (comment). Going to close out the pre-existing one, as this one now has more context available and action runs associated with the error.

timflannagan · 2021-11-09T17:54:10Z

Copy-pasting that issue's description to consolidate issues:

#2216 had fixed a bug where the csv_succeeded was lost between deployment pod restarts. In those changes, a new metric e2e test was created that restarts (e.g. scales down/scales back up) to test whether the metric was retained after pod restarts:

			When("the OLM pod restarts", func() {
				BeforeEach(func() {
					restartDeploymentWithLabel(c, "app=olm-operator")
				})
				It("CSV metric is preserved", func() {
					Expect(getMetricsFromPod(c, getPodWithLabel(c, "app=olm-operator"))).To(
						ContainElement(LikeMetric(WithFamily("csv_succeeded"), WithName(csv.Name), WithValue(1))),
					)
				})
			})

It looks like the restartDeploymentWithLabel(...) doesn't have enough safeguards to verify that the restarted deployment is ready and available, leading to issues running this e2e test on more bloated clusters, as we're attempting to grab the metric before the metric endpoint has been setup and ready to serve traffic.

timflannagan · 2021-12-17T18:03:30Z

It looks like I can't assign @akihikokuroda to this ticket for whatever reason - #2530 is the PR attempting to fix this e2e flake.

timflannagan mentioned this issue Nov 9, 2021

OLM CI Tracking #2401

Open

43 tasks

timflannagan added the kind/bug Categorizes issue or PR as related to a bug. label Nov 9, 2021

timflannagan added kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. and removed kind/bug Categorizes issue or PR as related to a bug. labels Nov 9, 2021

njhale self-assigned this Nov 11, 2021

njhale added the triaged Issue has been considered by a member of the OLM community label Nov 11, 2021

timflannagan assigned timflannagan and unassigned njhale and timflannagan Dec 17, 2021

timflannagan linked a pull request Dec 17, 2021 that will close this issue

fix e2e CSV metric is preserved failure #2530

Merged

5 tasks

openshift-merge-robot closed this as completed in #2530 Dec 20, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

"Metrics are generated for OLM managed resources Given an OperatorGroup that supports all namespaces when a CSV is created when the OLM pod restarts CSV metric is preserved" #2441

"Metrics are generated for OLM managed resources Given an OperatorGroup that supports all namespaces when a CSV is created when the OLM pod restarts CSV metric is preserved" #2441

timflannagan commented Nov 9, 2021 •

edited

Loading

timflannagan commented Nov 9, 2021

timflannagan commented Nov 9, 2021 •

edited

Loading

timflannagan commented Nov 9, 2021

timflannagan commented Nov 9, 2021

timflannagan commented Nov 9, 2021

timflannagan commented Dec 17, 2021

"Metrics are generated for OLM managed resources Given an OperatorGroup that supports all namespaces when a CSV is created when the OLM pod restarts CSV metric is preserved" #2441

"Metrics are generated for OLM managed resources Given an OperatorGroup that supports all namespaces when a CSV is created when the OLM pod restarts CSV metric is preserved" #2441

Comments

timflannagan commented Nov 9, 2021 • edited Loading

timflannagan commented Nov 9, 2021

timflannagan commented Nov 9, 2021 • edited Loading

timflannagan commented Nov 9, 2021

timflannagan commented Nov 9, 2021

timflannagan commented Nov 9, 2021

timflannagan commented Dec 17, 2021

timflannagan commented Nov 9, 2021 •

edited

Loading

timflannagan commented Nov 9, 2021 •

edited

Loading