
[Index Management] TESTING Added logger to fetch indices route #126169

Closed

Conversation

yuliacech
Contributor

@yuliacech yuliacech commented Feb 22, 2022

This PR adds a logger to the Index Management "list/reload indices" route to see where the loading time is spent when a large list of indices is retrieved. The code is not intended to be merged; the main goal of this PR is to test indices list performance on Cloud (see #126242).

The logger is added at the following "checkpoints" (a rough sketch is shown after the list):

  1. Before and after the Get All Indices request to ES completes
  2. Before and after the Get Indices Stats request to ES completes
  3. Before and after each index data enricher:
  • ILM Explain lifecycle request
  • Rollup job capabilities request
  • CCR follower indices request
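
A rough sketch of what these checkpoints look like (the function name, log messages, and exact calls here are illustrative, not the actual route code):

import type { ElasticsearchClient, Logger } from 'src/core/server';

// Illustrative only: time each ES call and log the elapsed milliseconds.
async function fetchIndicesWithTimings(client: ElasticsearchClient, logger: Logger) {
  let start = Date.now();
  logger.info('Fetching all indices');
  const indices = await client.indices.get({ index: '*' });
  logger.info(`Fetched all indices in ${Date.now() - start}ms`);

  start = Date.now();
  logger.info('Fetching indices stats');
  const stats = await client.indices.stats({ index: '*' });
  logger.info(`Fetched indices stats in ${Date.now() - start}ms`);

  // The same before/after pattern wraps each index data enricher
  // (ILM explain, rollup capabilities, CCR follower info).
  return { indices, stats };
}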

@yuliacech yuliacech changed the title [Index Management] Added logger for Indices list requests [Index Management] Added logger for Indices list requests TEST Feb 23, 2022
@yuliacech yuliacech changed the title [Index Management] Added logger for Indices list requests TEST [Index Management] TESTING Added logger to fetch indices route Feb 23, 2022
@yuliacech yuliacech marked this pull request as ready for review February 23, 2022 14:21
@yuliacech yuliacech requested a review from a team as a code owner February 23, 2022 14:21
@sebelga
Contributor

sebelga commented Feb 23, 2022

Not sure how precise the Logger is; what I had in mind was to use console.time with labels (https://www.geeksforgeeks.org/node-js-console-time-method/).
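
For illustration, something like the following (the client call here is just a stand-in for whatever the route already does):

console.time('getIndices');
const indices = await client.indices.get({ index: '*' });
console.timeEnd('getIndices'); // prints e.g. "getIndices: 1234.567ms"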

@tylersmalley
Contributor

Not sure if it's related to the changes here - but the Kibana instance in the cloud deployment ran out of memory and was restarted.

@tylersmalley
Contributor

I am actually thinking it's most likely related to this change, as it's happened three more times since and we have yet to experience it on any other deployments.

@yuliacech
Contributor Author

Thanks a lot for checking on this deployment, @tylersmalley!
I added the logger to log some info only when the indices list is loaded in Kibana, so I'm wondering if Kibana running out of memory could be related to something else, for example monitoring being enabled on the deployment 5 days ago.
Also, I added 1000 small indices (only 1 doc each) to test indices list performance; could this be related as well?

@yuliacech
Contributor Author

@elasticmachine merge upstream

@jbudz
Member

jbudz commented Feb 28, 2022

Logs are mostly filled with Elasticsearch GC. Mind if we scale the cluster up? I'm hoping we can get stack monitoring working again to keep a closer eye on memory.

@yuliacech
Contributor Author

Sure, that would be great, @jbudz! I currently have 1000 indices, but I'm planning to run my test with a larger number of indices: 2000, 3000, and so on, maybe up to 5000-10000. Do you think the deployment is having problems because of that?

@jbudz
Member

jbudz commented Feb 28, 2022

It's hard to say with the current logs. I see random spikes that look pretty consistent with user access from a browser (4 hours ago, for example) and browser refreshes, so I'm wondering if that's possible. Adding more shards could definitely help us narrow that down.

Bumping ES to 4GB - will report back once it's back up.

@jbudz
Member

jbudz commented Feb 28, 2022

Okay it's back up - I'll keep a tab open to monitor.

@kibana-ci
Collaborator

kibana-ci commented Feb 28, 2022

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] Jest Tests #6 / [Index management API Routes] fetch indices lib function data stream index
  • [job] [logs] Jest Tests #6 / [Index management API Routes] fetch indices lib function frozen index
  • [job] [logs] Jest Tests #6 / [Index management API Routes] fetch indices lib function hidden index
  • [job] [logs] Jest Tests #6 / [Index management API Routes] fetch indices lib function index missing in stats call
  • [job] [logs] Jest Tests #6 / [Index management API Routes] fetch indices lib function index with aliases
  • [job] [logs] Jest Tests #6 / [Index management API Routes] fetch indices lib function regular index

Metrics [docs]

Unknown metric groups

ESLint disabled in files

id before after diff
apm 15 14 -1

ESLint disabled line counts

id before after diff
apm 85 82 -3

Total ESLint disabled count

id before after diff
apm 100 96 -4

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@yuliacech
Contributor Author

FYI, I'm currently adding another 1000 indices to the deployment to test performance with 2000 indices.

@yuliacech
Contributor Author

@jbudz I'm at about 5000 small indices in the deployment and would like to get to 10,000 to complete my testing, but I've started getting this error when creating new indices:

{"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [1920271724/1.7gb], which is larger than the limit of [1860802969/1.7gb], real usage: [1920271544/1.7gb], new bytes reserved: [180/180b], usages [fielddata=914/914b, request=24019000/22.9mb, inflight_requests=180/180b, model_inference=0/0b, eql_sequence=0/0b]","bytes_wanted":1920271724,"bytes_limit":1860802969,"durability":"TRANSIENT"}],"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [1920271724/1.7gb], which is larger than the limit of [1860802969/1.7gb], real usage: [1920271544/1.7gb], new bytes reserved: [180/180b], usages [fielddata=914/914b, request=24019000/22.9mb, inflight_requests=180/180b, model_inference=0/0b, eql_sequence=0/0b]","bytes_wanted":1920271724,"bytes_limit":1860802969,"durability":"TRANSIENT"},"status":429}

Do you know what that might be related to, and whether it's possible to reconfigure the deployment to handle this?

@jbudz
Member

jbudz commented Mar 3, 2022

I just bumped the cluster to 8 GB of RAM. It's definitely the number of shards/indices that's causing things to slow down. Given we're the only client at the moment, I expect Kibana isn't very friendly to heavily sharded deployments.

It could be a lot of things: alerting, monitoring loading a list of all indices (1 MB+ per XHR request, auto-reloading every 10 seconds), and so on.

This is probably something that should be added to our performance working group - cc @tylersmalley @danielmitterdorfer. Recap: a 4 GB cluster with 1000-5000 one-document indices is going OOM with ~1 active Kibana user.

@yuliacech
Contributor Author

Thank you @jbudz!
Here is also the script that I use to add the indices:

#!/bin/bash

# Creates single-doc test indices numbered from START to COUNT against HOST.
USERNAME=${USERNAME:-elastic}
PASSWORD=${PASSWORD:-password}
COUNT=${COUNT:-1}
START=${START:-1}
HOST=${HOST:-"https://kibana-pr-126169.es.us-west2.gcp.elastic-cloud.com:9243"}

# Raise the shard limit so the cluster accepts thousands of single-shard indices.
curl -X PUT -u "$USERNAME:$PASSWORD" "$HOST/_cluster/settings" -H "Content-Type: application/json" -d '{ "persistent": { "cluster.max_shards_per_node": "6000" } }'

for i in $(seq "$START" "$COUNT")
do
  echo
  echo "test_index_$i - create index"
  # Create the index with no replicas to keep the shard count down.
  curl -X PUT -u "$USERNAME:$PASSWORD" "$HOST/test_index_$i" -H "Content-Type: application/json" -d'
  {
    "settings": {
      "index": {
        "number_of_replicas": 0
      }
    }
  }
  '
  echo
  echo "test_index_$i - add doc"
  # Index a single document so the index is not empty.
  curl -X PUT -u "$USERNAME:$PASSWORD" "$HOST/test_index_$i/_doc/1" -H "Content-Type: application/json" -d'
  {
    "timestamp": 21347237412
  }
  '
done
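
For reference, the script is driven by the env vars it reads, so it can be invoked as, for example, COUNT=2000 START=1001 PASSWORD=<password> ./create_test_indices.sh (the file name is arbitrary), which creates test_index_1001 through test_index_2000.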

@danielmitterdorfer
Member

This is probably something that should be added to our performance working group - cc @tylersmalley @danielmitterdorfer. Recap: a 4 GB cluster with 1000-5000 one-document indices is going OOM with ~1 active Kibana user.

Improving the efficiency of Elasticsearch with many indices/shards is being actively tackled by the Elasticsearch distributed team at the moment. See, for example, the blog post Three ways we've improved Elasticsearch scalability for recent improvements in 7.16. Note that these improvements do not change our recommended number of shards per GB of RAM yet, though.

For this targeted one-off test, I could see two options:

  1. We increase the cluster size
  2. Instead of testing against a real Elasticsearch cluster, we mock the responses (a rough sketch follows below)
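
For option 2, a rough sketch of what mocking could look like in a Jest test (fakeClient and the simplified response shapes are made up for illustration, not the actual test code):

// Fabricate a large number of indices instead of calling a real cluster.
const fakeIndices = Object.fromEntries(
  Array.from({ length: 10000 }, (_, i) => [`test_index_${i}`, { aliases: {}, settings: {} }])
);

const fakeClient = {
  indices: {
    get: jest.fn().mockResolvedValue(fakeIndices),
    stats: jest.fn().mockResolvedValue({ indices: {} }),
  },
};

// The fetch-indices handler would then be exercised with fakeClient in place
// of the real scoped Elasticsearch client.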

@yuliacech
Contributor Author

I have now tested with about 8000 indices and I think the results give us some good insights into what we can work on to improve indices list performance. I will add all the findings to #126242.
This PR can be closed and the deployment deleted. Thanks a lot for the support, @jbudz!
@danielmitterdorfer Yes, the scenario I was testing is not currently recommended for deployments; this PR was to research the limitations of Index Management in the context of the "many shards" project. I hope we can improve performance and handle many indices in Kibana in the future.

@yuliacech yuliacech closed this Mar 4, 2022
@tylersmalley tylersmalley added ci:cloud-deploy Create or update a Cloud deployment and removed ci:deploy-cloud labels Aug 17, 2022
@yuliacech yuliacech deleted the indices_list_performance_logger branch February 15, 2024 12:03