
[ResponseOps][Alerting] Flaky test x-pack/test/alerting_api_integration/security_and_spaces/group2/tests/telemetry/index·ts #161096

Merged
merged 13 commits into elastic:main on Jul 31, 2023

Conversation

doakalexi
Contributor

@doakalexi doakalexi commented Jul 3, 2023

Resolves #140973
Resolves #136153

Summary

Fixes a telemetry bug and a flaky telemetry test

Checklist

https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/2717 x 250
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/2718 x 250

@doakalexi doakalexi changed the title Fixing flaky test [ResponseOps][Alerting] Failing test: X-Pack Alerting API Integration Tests.x-pack/test/alerting_api_integration/security_and_spaces/group2/tests/telemetry/index·ts Jul 25, 2023
@doakalexi doakalexi added Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.10.0 release_note:skip Skip the PR/issue when compiling release notes labels Jul 25, 2023
@doakalexi doakalexi marked this pull request as ready for review July 26, 2023 11:31
@doakalexi doakalexi requested a review from a team as a code owner July 26, 2023 11:31
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@doakalexi doakalexi changed the title [ResponseOps][Alerting] Failing test: X-Pack Alerting API Integration Tests.x-pack/test/alerting_api_integration/security_and_spaces/group2/tests/telemetry/index·ts [ResponseOps][Alerting] Flaky test x-pack/test/alerting_api_integration/security_and_spaces/group2/tests/telemetry/index·ts Jul 26, 2023
Member

@pmuellr pmuellr left a comment


Generally LGTM. I'd like to get the equal -> gte change in the getEventLog() call - that feels like it could someday go flaky.

The others I'm not so sure about, mainly because I'm missing some context. Could you add some comments to the PR covering:

  • why the originating issues were flaky
  • why they aren't flaky anymore
  • it seems like we removed some rules being tested - I'm curious why
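As a concrete illustration of the equal -> gte change being requested, here is a minimal sketch, assuming the getEventLog() helper used in these integration tests takes its usual parameters (getService, spaceId, type, id, provider, and an actions map of expected event counts); the exact call site in this PR may differ:

// Hypothetical sketch: accept "at least one" execute event instead of exactly one,
// so an extra rule execution cannot fail the assertion.
await getEventLog({
  getService,
  spaceId: space.id,
  type: 'alert',
  id: ruleId,
  provider: 'alerting',
  actions: new Map([['execute', { gte: 1 }]]), // previously { equal: 1 }
});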

@@ -299,12 +262,10 @@ export default function createAlertingAndActionsTelemetryTests({ getService }: F

  // number of rule executions - just checking for non-zero as we can't set an exact number
  // each rule should have had a chance to execute once
- expect(telemetry.count_rules_executions_per_day >= 21).to.be(true);
+ expect(telemetry.count_rules_executions_per_day >= 0).to.be(true);
Member

Unfortunate to have to leave that at potentially zero! Looking at the code, I think we'd have to call getEventLog() on every rule, to make sure it has at least one execute, before requesting the telemetry. Then we could use the number of rules instead of zero.
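A rough sketch of that suggestion, under the same assumptions about getEventLog() as above and with a hypothetical ruleIds array collecting the created rules: wait for at least one execute event per rule before requesting telemetry, then assert against the number of rules instead of zero.

// Hypothetical: make sure every rule has executed at least once
// before the telemetry task state is fetched.
for (const ruleId of ruleIds) {
  await getEventLog({
    getService,
    spaceId: space.id,
    type: 'alert',
    id: ruleId,
    provider: 'alerting',
    actions: new Map([['execute', { gte: 1 }]]),
  });
}
// ...run the telemetry task and read its state as the test already does...
expect(telemetry.count_rules_executions_per_day >= ruleIds.length).to.be(true);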

Contributor Author

@doakalexi doakalexi Jul 27, 2023

I can look into this too and see if I can use the number of rules!

Contributor Author

Resolved in this commit a273697

Contributor Author

Changing this to non-zero caused the test to be flaky again; I'd like to just leave it as > 0

Member

heh, no problem!

@doakalexi
Contributor Author

doakalexi commented Jul 27, 2023

  • why the originating issues were flaky

The original test was flaky because of this block of code, which is used for both the actions and alerting telemetry:

await retry.try(async () => {
  const telemetryTask = await es.get<TaskManagerDoc>({
    id: `task:Actions-actions_telemetry`,
    index: '.kibana_task_manager',
  });
  expect(telemetryTask!._source!.task?.status).to.be('idle');
  const taskState = telemetryTask!._source!.task?.state;
  expect(taskState).not.to.be(undefined);
  actionsTelemetry = JSON.parse(taskState!);
  expect(actionsTelemetry.runs).to.equal(2);
  expect(actionsTelemetry.count_total).to.equal(20);
});

The number of actionsTelemetry.runs was not always 2, and I believe expecting an exact run count also made expect(telemetryTask!._source!.task?.status).to.be('idle') flaky: sometimes the status would be 'running' instead of 'idle'. That is why I removed the check for exactly 2 runs and only verify that there has been at least one run.
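A minimal sketch of the relaxed check described above, based on the quoted block (the identifiers are taken from that block; the committed code may differ):

await retry.try(async () => {
  const telemetryTask = await es.get<TaskManagerDoc>({
    id: `task:Actions-actions_telemetry`,
    index: '.kibana_task_manager',
  });
  const taskState = telemetryTask!._source!.task?.state;
  expect(taskState).not.to.be(undefined);
  actionsTelemetry = JSON.parse(taskState!);
  // at least one run, rather than exactly 2, and no assertion on task status
  expect(actionsTelemetry.runs >= 1).to.be(true);
  expect(actionsTelemetry.count_total).to.equal(20);
});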

  • why they aren't flaky anymore

I simplified the test, added the object remover, and updated the code block above to remove the flaky parts I mentioned.
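For readers unfamiliar with the cleanup pattern mentioned here, a rough sketch of how an ObjectRemover is typically wired into these tests (the helper names and arguments are assumptions, not the exact code from this PR):

const objectRemover = new ObjectRemover(supertest);

// register each created rule so it is deleted when the test finishes
objectRemover.add(space.id, createdRule.id, 'rule', 'alerting');

after(async () => {
  await objectRemover.removeAll();
});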

  • it seems like we removed some rules being tested - I'm curious why

I removed some rules to help clean up the test; I thought it was hard to understand, and some of the rules seemed to be verifying redundant things. I kept only the minimal set of rules needed to verify the telemetry results.

@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Member

@pmuellr pmuellr left a comment


LGTM, thanks for the changes (and the attempts at changes!)

@doakalexi doakalexi merged commit d5761dc into elastic:main Jul 31, 2023
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Jul 31, 2023
ThomThomson pushed a commit to ThomThomson/kibana that referenced this pull request Aug 1, 2023
…on/security_and_spaces/group2/tests/telemetry/index·ts (elastic#161096)

Resolves elastic#140973
Resolves elastic#136153

## Summary

Fixes a telemetry bug and a flaky telemetry test

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios


https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/2717 x 250
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/2718 x 250