[Meta] Kibana platform performance #63848

Closed
17 of 29 tasks
mshustov opened this issue Apr 17, 2020 · 15 comments
Labels
discuss Meta performance Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:Operations Team label for Operations Team

Comments

@mshustov
Contributor

mshustov commented Apr 17, 2020

Introducing the Kibana platform changed the way Kibana applications are built, loaded and run.
We didn't gather performance metrics before or during the migration, and we now find ourselves in a position where customers have already started experiencing degraded page load times.
To improve the current situation we can split our work into different categories:

Page loading time

All plugins are built as separate packages in the Kibana platform. This increased both the size of each bundle downloaded at startup and the number of concurrent requests to the server.
Sub-tasks:

To catch problems at an early stage in the future, we are going to start tracking performance metrics during development (CI metric report).
I'm wondering if we can collaborate with Elastic Cloud / Telemetry / Pulse teams on creating a centralized performance dashboard for production load time metrics.
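A rough sketch of what a CI report or dashboard could compute for the two effects described above (bytes downloaded and concurrent requests). It assumes the input is shaped like the browser's Resource Timing entries, e.g. what `performance.getEntriesByType('resource')` returns; the `ResourceSample`, `LoadSummary` and `summarizeResources` names are hypothetical and not part of Kibana.

```typescript
// Minimal subset of the fields exposed by PerformanceResourceTiming entries.
interface ResourceSample {
  name: string;
  transferSize: number; // bytes transferred over the network
  startTime: number; // ms since navigation start
  responseEnd: number; // ms since navigation start
}

interface LoadSummary {
  requestCount: number;
  totalBytes: number;
  peakConcurrency: number;
}

// Hypothetical helper: totals the transferred bytes and finds the largest
// number of requests that were in flight at the same time.
function summarizeResources(samples: ResourceSample[]): LoadSummary {
  const events: Array<[number, number]> = [];
  let totalBytes = 0;
  for (const sample of samples) {
    totalBytes += sample.transferSize;
    events.push([sample.startTime, 1]);
    events.push([sample.responseEnd, -1]);
  }
  // Sort by time; on ties, process request ends before request starts.
  events.sort((a, b) => a[0] - b[0] || a[1] - b[1]);

  let inFlight = 0;
  let peakConcurrency = 0;
  for (const event of events) {
    inFlight += event[1];
    peakConcurrency = Math.max(peakConcurrency, inFlight);
  }
  return { requestCount: samples.length, totalBytes, peakConcurrency };
}
```

Comparing these three numbers between builds would show whether a change added requests, added bytes, or both.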

Runtime performance

This falls into 2 sub-categories:

Memory

The Kibana platform was created with SPA mode in mind. This means the lifetime of a Kibana app session is much longer, since the page is reloaded less frequently. This puts increased demands on memory leak control. Kibana must remain operable when one application is running for a long time and when the user switches between several applications. We should automate such a check on CI. @elastic/kibana-qa do you have a setup for this type of testing? I saw some dashboards for similar metrics in #59454
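One possible shape for such an automated check, as a sketch only: sample the heap size after each repetition of a scenario (for example after every app switch) and fit a line through the samples. A persistently positive slope suggests memory that is never released. Where the samples come from is left open here; the function names and the threshold are assumptions for illustration.

```typescript
// Hypothetical helper: least-squares slope of heap size over iterations,
// i.e. the average number of bytes gained per iteration.
function heapGrowthPerIteration(heapSamples: number[]): number {
  const n = heapSamples.length;
  if (n < 2) return 0;

  const meanX = (n - 1) / 2;
  let meanY = 0;
  for (const value of heapSamples) meanY += value / n;

  let numerator = 0;
  let denominator = 0;
  for (let i = 0; i < n; i++) {
    numerator += (i - meanX) * (heapSamples[i] - meanY);
    denominator += (i - meanX) * (i - meanX);
  }
  return numerator / denominator;
}

// Flags a run when the heap grows faster than the allowed bytes per iteration.
// The threshold is an assumption and would need tuning per scenario.
function looksLikeLeak(heapSamples: number[], maxBytesPerIteration: number): boolean {
  return heapGrowthPerIteration(heapSamples) > maxBytesPerIteration;
}
```

A slope is less sensitive to a single garbage collection pause than comparing only the first and last sample, which is why it is used in this sketch.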

CPU

That's tricky and might require setting up APM for Kibana.

Sub-tasks:

@mshustov mshustov added discuss Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:Operations Team label for Operations Team Meta performance labels Apr 17, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-operations (Team:Operations)

@elasticmachine
Contributor

Pinging @elastic/kibana-platform (Team:Platform)

@LeeDr

LeeDr commented Apr 17, 2020

I think we need at least 2 types of performance tests for Kibana.

  1. UI performance: How long does it take for the first page to load? How long does it take to switch between apps? How long does it take for this dashboard to load?
    The easiest way to get started on this is to track the duration of all our existing UI tests. @stacey-gammon did it in "Create functional test suite for performance benchmarking" #54626, and @wayneseymour has been looking into adding that to the code coverage job. Different tests do many different things, so in themselves they may not seem like meaningful metrics, but they could at least show changes from one build or release to another. There are some efforts going on towards this right now: @brianseeders is gathering it from Kibana CI builds in "[FTR] Add test suite metrics tracking/output" #62515.
    We could very easily enhance the existing UI tests with tests designed specifically to measure initial page load time, app switching, etc. Anything we decide we need.
    We should also use either Monitoring, Metricbeat, or both to gather stats on both the Kibana server and the browser memory while tests are running to look for problems. @marius-dr gained some experience doing this while investigating Firefox memory usage.
  2. Load tests: Kibana has a growing set of APIs for saved objects, user/role management, batch reindexing, task manager, etc. We need a test framework where all these can be tested vigorously, simulating multiple users and high loads. We might use tools like JMeter, Gatling, Apache AB, Horde, etc. for this. @dmlemeshko is just getting started looking into these tools.

  3. APM: Yes. Getting Kibana instrumented for APM might be the hardest part, but might also have a big payback. I don't know of any current efforts on this.
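As an illustration of "show changes from one build or release to another", here is a minimal sketch that compares per-suite durations from two builds and reports the suites that got noticeably slower. The data shape, the `findRegressions` name and the 20% default tolerance are assumptions, not what the linked PRs implement.

```typescript
// Suite name -> duration in milliseconds for one build.
type SuiteDurations = { [suite: string]: number };

interface Regression {
  suite: string;
  baselineMs: number;
  currentMs: number;
  ratio: number; // currentMs / baselineMs
}

// Hypothetical helper: lists suites whose duration grew by more than
// `tolerance` (0.2 = 20%) relative to the baseline build.
function findRegressions(
  baseline: SuiteDurations,
  current: SuiteDurations,
  tolerance: number = 0.2
): Regression[] {
  const regressions: Regression[] = [];
  for (const suite of Object.keys(current)) {
    const baselineMs = baseline[suite];
    // Suites without a usable baseline (new tests) cannot be compared.
    if (baselineMs === undefined || baselineMs <= 0) continue;
    const currentMs = current[suite];
    const ratio = currentMs / baselineMs;
    if (ratio > 1 + tolerance) {
      regressions.push({ suite, baselineMs, currentMs, ratio });
    }
  }
  return regressions;
}
```

Because individual tests do many different things, the ratio against the same suite's own baseline is more meaningful than any absolute duration.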

@TinaHeiligers
Contributor

@restrry The APM team is more appropriate for advice on setting up APM for Kibana. AFAIK, there's already an option to do so.

@afharo
Member

afharo commented Apr 17, 2020

I think we should have APM running across the Kibana Platform for our perf tests. It could show us good candidates for improvement.

That said, would it make sense to have a Pulse channel with a minimum set of performance stats? We can't (and don't want to) go as deep as APM in the analysis, but something that goes into telemetry to let us understand the best and recommended hardware based on our users' experiences? i.e.: We can learn that users with hardware X perceived an improved behaviour over those ones running on hardware Y or provider Z.

@mshustov
Contributor Author

We could very easily enhance the existing UI tests with tests designed specifically to measure initial page load time, app switching, etc. Anything we decide we need.

@LeeDr We are working on #62263 and are interested in the time between a request being initiated and the page with an app being rendered.
Is it already technically possible to collect that information for tests?
Would it be possible to add a test suite / script / whatever for developers to run locally to measure how their changes impacted loading performance?

@LeeDr

LeeDr commented Apr 20, 2020

@restrry it's possible to measure that with some caveats.

  1. With our current plan, we were just going to collect the duration of functional UI tests. Most of the tests do more than just open an app and wait for it to render so those times will include the app loading but also other steps. But we certainly could make a set of "performance" tests that just wait for the app to load.
  2. We need to understand how we "know" that an app has finished rendering. We can wait for a loading indicator to be hidden, and/or for one or more elements to appear on a page within the app. It would be nice to know when the last element of an app has loaded. But we don't want to use a technique that will be flaky or a maintenance issue.
  3. These measurements will only be accurate within about 1/2 second since that's how long we typically wait between retries. Even WebDriver uses a polling mechanism when looking for elements with some small delay between attempts. But I think loading an app is going to be several seconds so this level of accuracy may be OK. We may see significant differences when running tests locally vs on Jenkins. And even on Jenkins we may see significant variability between runs of the same build.
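The accuracy limit in point 3 can be made concrete with a small model. Assuming checks are instantaneous and happen at fixed multiples of the retry delay (a simplification of how a real retry loop or WebDriver polling behaves), the measured time is the actual time rounded up to the next poll. The function name is hypothetical.

```typescript
// Hypothetical model: the load time a polling-based test would report when
// the app is actually ready after `actualMs` and the test re-checks every
// `retryDelayMs`. The overestimate is always smaller than one retry delay.
function observedLoadTime(actualMs: number, retryDelayMs: number): number {
  if (retryDelayMs <= 0) {
    throw new Error("retryDelayMs must be positive");
  }
  return Math.ceil(actualMs / retryDelayMs) * retryDelayMs;
}
```

Under this model an app that is ready after 3210 ms is reported as 3500 ms with a 500 ms retry delay, and as 3250 ms with a 50 ms delay.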

@mshustov
Contributor Author

But we certainly could make a set of "performance" tests that just wait for the app to load.

We can write a specific test-case loading only a pre-defined page x times to minimize the accidental impact of different applications and external environments. How do we push data into external storage for future analysis on kibana-stats?
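A sketch of how the x repeated loads could be reduced to a single number before pushing it anywhere. It uses the median so that one slow run caused by the environment does not dominate; the `RunSummary` shape and the function name are assumptions for illustration.

```typescript
interface RunSummary {
  runs: number;
  medianMs: number;
  minMs: number;
  maxMs: number;
}

// Hypothetical helper: summarizes the load durations of repeated runs of
// the same page.
function summarizeRuns(durationsMs: number[]): RunSummary {
  if (durationsMs.length === 0) {
    throw new Error("need at least one run");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = durationsMs.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const medianMs =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return {
    runs: sorted.length,
    medianMs,
    minMs: sorted[0],
    maxMs: sorted[sorted.length - 1],
  };
}
```

Keeping min and max next to the median also gives a quick view of how noisy the environment was for that build.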

We need to understand how we "know" that an app has finished rendering. We can wait for a loading indicator to be hidden, and/or for one or more elements to appear on a page within the app

I believe we can treat the loading indicator being hidden as a proper signal that all the resources were loaded and parsed. After this moment it's up to the app logic to perform background requests to load data. So I'd say those are application-specific, and we shouldn't take them into account as part of the current task.

These measurements will only be accurate within about 1/2 second

That's not really good enough. Is it okay if we make the retry delay configurable? @dmlemeshko

And even on Jenkins we may see significant variability between runs of the same build.

Yes, that will require additional work (if it's even possible) to run the tests on the same hardware in an isolated env. Not sure we have time to do it properly right now. IMO, running the tests on Jenkins several times is an acceptable solution at the moment.

@mshustov
Contributor Author

@LeeDr I remember that some aspects of the perf testing have been discussed on GAH. From the summary email:

- server load testing for concurrent requests, concurrent users ( coming )
- endurance testing for browser memory leaks ( coming )

However, I don't see any issues linked to the QA team roadmap https://github.com/elastic/kibana-team/issues/103. Would you mind adding the issues to the roadmap and to the perf meta issue?

@LeeDr

LeeDr commented Jul 16, 2020

@dmlemeshko is working on the first one but still comparing a couple of different tools to find what will work best long-term for Kibana. We'll get an issue created soon to document the plan and update the status.

  • server load testing for concurrent requests, concurrent users ( coming )

We haven't started anything on this one yet (Memory endurance testing (Marius);

  • endurance testing for browser memory leaks ( coming )

@mshustov
Contributor Author

mshustov commented Jul 19, 2020

Since the 7.7 release and onward, the Reporting plugin has increasingly had a harder time completing reports in reasonable amount of time, especially on machines with busy CPU or low RAM resources.
We added documentation to inform the reader that 1GB is not enough RAM for the Kibana instance to work with Reporting.
One of the biggest factors that still lead to slow report generation time is:
Bundle sizes increasing release by release as more features are added. Different App teams need to help here by moving more UI code to be lazy loaded on-demand, instead of loading everything up-front.

from #71753

@marius-dr
Member

We haven't started anything on this one yet (Memory endurance testing (Marius);

  • endurance testing for browser memory leaks ( coming )

I have some plans for this, will start joining the performance group sync every time.
I've been normally running it over the weekends on my desktop PC, mainly on BC builds for 7.x versions (not for minors). What I didn't manage to figure out is how to keep the tests relevant with new features but also comparable with each other. Initial thoughts would be to create suites of tests that cover scenarios/"user stories" and go from there. I'll put some updates in the kibana-qa issue for it.

@lizozom
Contributor

lizozom commented Oct 14, 2021

@mshustov I think this and this should address measuring page load time.

@tylersmalley
Contributor

@suchcodemuchwow is also working to capture page load time currently in an isolated environment simulating a real-world user.

@lizozom lizozom changed the title Kibana platform performance [Meta] Kibana platform performance Nov 10, 2021
@exalate-issue-sync exalate-issue-sync bot added impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:small Small Level of Effort labels Feb 16, 2022
@tylersmalley tylersmalley removed loe:small Small Level of Effort impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. EnableJiraSync labels Mar 16, 2022
@lizozom
Contributor

lizozom commented Jul 20, 2022

Closing for now, let's reopen if needed.

@lizozom lizozom closed this as completed Jul 20, 2022

8 participants