
[Alerting] Performance testing tool for alerting needs the ability to create/clean ecctl Kibana deployments #121457

Closed
YulNaumenko opened this issue Dec 16, 2021 · 14 comments
Labels
impact:needs-assessment (Product and/or Engineering needs to evaluate the impact of the change)
Team:Operations (Team label for Operations Team)
Team:ResponseOps (Label for the ResponseOps team, formerly the Cases and Alerting teams)

Comments

@YulNaumenko
Contributor

YulNaumenko commented Dec 16, 2021

Describe the feature:
Based on the requirements for the kbn-alert-load performance testing tool, the following configuration/deployment abilities are needed:

  1. Create a special test account at https://cloud.elastic.co
  2. Create an API key at the cloud site for use with ecctl
  3. Deploy Kibana via ecctl (https://www.elastic.co/guide/en/ecctl/current/ecctl-installing.html) when the testing job is run
  4. Configure ecctl for this installation with ecctl init, using the proper API key (see the sketch after this list)
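Taken together, the create/clean cycle for a test run could be driven from Node roughly as sketched below. This is only a sketch: it assumes ecctl is installed and authenticated (via ecctl init or, for CI, an EC_API_KEY environment variable), and flags such as --file, --output json, and --force should be verified against the ecctl docs linked above.

```ts
import { execFileSync } from 'child_process';

// Thin wrapper around the ecctl CLI. Flag names here are assumptions to
// double-check against `ecctl --help`.
function ecctl(...args: string[]): string {
  return execFileSync('ecctl', [...args, '--output', 'json'], { encoding: 'utf8' });
}

// Create a deployment for a test run from a deployment definition file.
// The create response is assumed to contain a top-level `id`.
function createTestDeployment(definitionFile: string): string {
  return JSON.parse(ecctl('deployment', 'create', '--file', definitionFile)).id;
}

// Clean the deployment up once the run is done (or has failed).
function cleanTestDeployment(id: string): void {
  // --force is assumed to skip the interactive confirmation prompt.
  ecctl('deployment', 'shutdown', id, '--force');
}
```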

Describe a specific use case for the feature:
We are planning to make this tool part of x-pack/tests/performance/alerting. The initial idea was to use Buildkite to run that performance testing job every night.
In addition, we want to set up a Slack channel to post failures of this job (ideally, once issue #1 is done, the only failures would be regression failures where the performance of a build fails to meet the specified metric). We also need to ensure that all deployments are cleaned up at the end of the job, regardless of whether it succeeds or fails.

cc: @ymao1 and @pmuellr

@YulNaumenko added the Team:Operations and Team:ResponseOps labels Dec 16, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-operations (Team:Operations)

@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@ymao1
Contributor

ymao1 commented Dec 16, 2021

Thanks for opening the issue @YulNaumenko! The requirements for this may change as we evaluate whether we can leverage the QA environment in ML for automated performance testing. I will update this issue as we learn more, but for now, no immediate action is needed.

@tylersmalley
Contributor

Thanks for opening the issue. A few questions:

  • Is there a timeline you're looking for to have this up and running?
  • Is the expectation that it uses a snapshot of Kibana in the Cloud-First Testing region?

@pmuellr
Member

pmuellr commented Dec 16, 2021

So many QA environments to (potentially) choose from! :-) :elastic-heart:

Some thoughts on this, since I haven't given it a whole lot of thought before.

If we want to use kbn-alert-load, or something similar to it, then what we'll actually want is just a plain old node runtime env. We could even pull the source for whatever we're going to run, do the yarn install there, and then run the node app. That node app would use ecctl (or make the equivalent http calls to the cloud api endpoint) to deploy the stacks for testing.
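For the "equivalent http calls" variant, here is a minimal sketch, assuming Node 18+ (global fetch), an API key in EC_API_KEY, and the public Cloud API at https://api.elastic-cloud.com/api/v1; endpoint paths and payload shape should be confirmed against the Cloud API docs before relying on this.

```ts
// Assumes Node 18+ (global fetch) and an API key in EC_API_KEY.
const API = 'https://api.elastic-cloud.com/api/v1';
const headers = {
  Authorization: `ApiKey ${process.env.EC_API_KEY}`,
  'Content-Type': 'application/json',
};

// Create a deployment from a prepared deployment definition and return its id.
async function createDeployment(definition: unknown): Promise<string> {
  const res = await fetch(`${API}/deployments`, {
    method: 'POST',
    headers,
    body: JSON.stringify(definition),
  });
  if (!res.ok) throw new Error(`deployment create failed: ${res.status}`);
  return (await res.json()).id;
}

// Shut the test deployment down when the run is finished.
async function shutdownDeployment(id: string): Promise<void> {
  const res = await fetch(`${API}/deployments/${id}/_shutdown`, { method: 'POST', headers });
  if (!res.ok) throw new Error(`deployment shutdown failed: ${res.status}`);
}
```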

In order to double-check that we don't leave orphan deployments (if the node app dies), I think we'd want a job that just checks for old deployments and complains somewhere we can see it, so we can delete them manually for now. That job could run maybe once an hour.
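Something like the sketch below could serve as that hourly sweeper. It assumes test deployments follow a naming convention (the alert-load- prefix here is hypothetical) and that `ecctl deployment list --output json` returns a `deployments` array with `id` and `name` fields; both assumptions should be verified.

```ts
import { execFileSync } from 'child_process';

const TEST_PREFIX = 'alert-load-'; // hypothetical naming convention for test deployments

// List deployments that look like they were created by the load-testing tool.
function listTestDeployments(): Array<{ id: string; name: string }> {
  const raw = execFileSync('ecctl', ['deployment', 'list', '--output', 'json'], {
    encoding: 'utf8',
  });
  const { deployments = [] } = JSON.parse(raw);
  return deployments.filter((d: { name: string }) => d.name.startsWith(TEST_PREFIX));
}

// Run on a schedule: complain somewhere visible if anything is left over,
// and let a human delete it manually for now.
const leftovers = listTestDeployments();
if (leftovers.length > 0) {
  console.error(
    'possible orphan deployments:',
    leftovers.map((d) => `${d.name} (${d.id})`).join(', ')
  );
  process.exitCode = 1;
}
```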

We would certainly want to be testing against a snapshot of some kind. In recent history I've only tested BC versions in the Belgium GCP region, and I've never had good luck testing snapshots, but maybe that was just bad timing. And we'll also be testing against older versions.

If we can work with snapshots, what's the frequency of those?

@pmuellr
Member

pmuellr commented Dec 16, 2021

> If we want to use kbn-alert-load, or something similar to it, then what we'll actually want is just a plain old node runtime env. We could even pull the source for whatever we're going to run, do the yarn install there, and then run the node app. That node app would use ecctl (or make the equivalent http calls to the cloud api endpoint) to deploy the stacks for testing.

Since we're talking about where to move kbn-alert-load to (right now it's here: https://github.com/pmuellr/kbn-alert-load), I happened to think that it may be more cumbersome to have this code in the Kibana repo, compared to having it in its own small repo. Especially for purposes like this, where we'd have to have a Kibana build or check Kibana out of git just to run the tool :-)

@tylersmalley
Contributor

Thanks for the additional information.

Currently, @jbudz is working on rolling out manual Cloud-First Testing to PRs, which I believe we could leverage pieces of for this. With that, adding the label ci:deploy-cloud will build the Cloud Docker image and deploy it to the gcp-us-west2 region. Subsequent pushes to the branch will rebuild and update the existing deployment. The deployment will be deleted after a specified period with no updates (~7-14 days), or when the PR is merged/closed.

This is what I am thinking for the daily pipeline based on my understanding of your needs:

  • Ensure there is no cluster named alert-load-testing; if one exists, delete it (a sketch of this check follows the list).
  • Build HEAD of main and deploy it to gcp-us-west2 with the name alert-load-testing.
  • Download/check out the kbn-alert-load tool and run it against the Cloud cluster (location of the tool subject to change).
  • Post any failures to a Slack channel (yet to be determined).
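For the first step, here is a rough sketch of the "delete it if it already exists" check, again assuming `ecctl deployment list --output json` returns a `deployments` array with `id`/`name` fields and that a --force flag skips the shutdown confirmation (both to be verified):

```ts
import { execFileSync } from 'child_process';

const DEPLOYMENT_NAME = 'alert-load-testing';

// Remove any leftover deployment from a previous run before deploying HEAD of main.
const { deployments = [] } = JSON.parse(
  execFileSync('ecctl', ['deployment', 'list', '--output', 'json'], { encoding: 'utf8' })
);
for (const d of deployments as Array<{ id: string; name: string }>) {
  if (d.name === DEPLOYMENT_NAME) {
    console.log(`deleting stale deployment ${d.name} (${d.id})`);
    execFileSync('ecctl', ['deployment', 'shutdown', d.id, '--force']);
  }
}
```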

@ymao1
Contributor

ymao1 commented Dec 17, 2021

@tylersmalley That sounds great and lines up with what we were hoping to have as well! We would want to run different scenarios with different rule configurations and possibly different Kibana configurations. Do you think it would make more sense to have a separate daily pipeline for each scenario (each deploying its own cluster), or to try to serialize the scenarios into one pipeline?

@tylersmalley
Contributor

tylersmalley commented Dec 17, 2021

@ymao1 do you have an idea of how long the tests would run for each configuration? If it's not long, I would say try to serialize them, as there is overhead to bringing up the cluster (~10 minutes). But if each is maybe more than 5-10 minutes, I think creating a separate pipeline would make sense. It's really not difficult to do in Buildkite. The discussion is mostly around resources and overall job time.

@kobelb added the needs-team label Jan 31, 2022
@botelastic bot removed the needs-team label Jan 31, 2022
@exalate-issue-sync bot added the impact:needs-assessment and loe:small labels Feb 16, 2022
@tylersmalley removed the loe:small and impact:needs-assessment labels Mar 16, 2022
@tylersmalley
Contributor

tylersmalley commented Mar 22, 2022

With Cloud-First Testing now live, where do you stand on your need for this? Do you need assistance creating a Buildkite pipeline?

@exalate-issue-sync bot added the impact:needs-assessment label Mar 22, 2022
@pmuellr
Member

pmuellr commented Mar 22, 2022

@EricDavisX - this is a slightly old issue we opened to look at automating some kbn-alert-load runs. I'm thinking the requirements outlined in the top comment are being / will be handled by the work you're currently doing? So we can probably close this?

@EricDavisX
Contributor

Hello - yes, I think we can close this. Here's an update on where we are tracking the pending work to support Rules-oriented performance tests, and where the jobs are now:

  • We've enhanced the Jenkins run to always delete the ecctl deployments; note that the standalone kbn-alert-load tool still suffers from that ailment.
  • Other enhancements are on the way and are being tracked in #119845.
  • There is an internal 'qa' issue as well; if anyone wants it, I can DM it.

@pmuellr
Member

pmuellr commented Mar 23, 2022

> We've enhanced the Jenkins run to always delete the ecctl deployments; note that the standalone kbn-alert-load tool still suffers from that ailment.

Probably not a bad idea to bake this into kbn-alert-load at this point. I think we'd want a signal handler to catch ctrl-c and similar terminations, which would then delete the deployments on an "unclean" exit as well (or at least most "unclean" exits).

I think the reason I had this in there was so I could debug the deployments if there were some issues, while I was developing this.

We could think of adding this as an option (--keep-deployments-on-error, or such), to support cases like that in the future, but probably YAGNI.
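A minimal sketch of what baking this into kbn-alert-load could look like, assuming the tool tracks the ids of the deployments it created (deleteDeployment here just shells out to ecctl; the exact command and flags should be verified):

```ts
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

// Deployment ids created during this run, registered as they are created.
const createdDeployments = new Set<string>();

async function deleteDeployment(id: string): Promise<void> {
  // `deployment shutdown` plus a --force flag to skip confirmation (to verify).
  await execFileAsync('ecctl', ['deployment', 'shutdown', id, '--force']);
}

// Best-effort cleanup; a future --keep-deployments-on-error option could skip it.
async function cleanup(keepDeployments = false): Promise<void> {
  if (keepDeployments) return;
  for (const id of createdDeployments) {
    try {
      await deleteDeployment(id);
    } catch (err) {
      console.error(`failed to delete deployment ${id}`, err);
    }
  }
}

// Catch ctrl-c and similar terminations so "unclean" exits also clean up.
for (const signal of ['SIGINT', 'SIGTERM'] as const) {
  process.on(signal, () => {
    void cleanup().finally(() => process.exit(1));
  });
}
```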

@EricDavisX
Contributor

It's a good idea - I put a ticket into the kbn-alert-load project for it, Patrick. Thanks.
