Downsampling performance analysis and improvement #90226

Closed
salvatore-campagna opened this issue Sep 22, 2022 · 5 comments

@salvatore-campagna
Contributor

salvatore-campagna commented Sep 22, 2022

Description

We would like to measure the performance of downsampling operations using Rally. For this purpose we need to add a new challenge to the existing tsdb Rally track. The new challenge will measure latency for a limited set of downsampling operations using different values of the fixed_interval parameter. As part of the analysis we need to collect JFR recordings and flame graphs so that we can spot areas of the code that can be improved.

Right now the tsdb track uses a dataset of more than 116M documents with a total JSON file size of more than 120 GB, which results in a 32.5 GB index. The plan is to measure downsampling latency with a single-threaded implementation, a single-node Elasticsearch cluster, and a single shard.
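For illustration, the measurement described above can be sketched outside Rally as follows. This is not the Rally challenge definition. It assumes the downsample endpoint has the shape `POST /<source>/_downsample/<target>` with a `fixed_interval` body; the index names and the `send` callable are hypothetical placeholders for whatever HTTP client is used.

```python
import json
import time
from typing import Callable, Dict, Sequence, Tuple

Request = Tuple[str, str, str]  # (HTTP method, path, JSON body)


def build_downsample_request(source: str, target: str, fixed_interval: str) -> Request:
    """Build one downsample request.

    Assumption: the endpoint is POST /<source>/_downsample/<target>.
    """
    path = f"/{source}/_downsample/{target}"
    body = json.dumps({"fixed_interval": fixed_interval})
    return "POST", path, body


def measure_latency(send: Callable[[str, str, str], None], request: Request) -> float:
    """Time a single blocking call to `send`, in seconds."""
    start = time.perf_counter()
    send(*request)
    return time.perf_counter() - start


def run_challenge(
    send: Callable[[str, str, str], None],
    source: str,
    intervals: Sequence[str] = ("1h", "1d"),
) -> Dict[str, float]:
    """Run one downsample operation per interval, sequentially, and record latencies."""
    latencies = {}
    for interval in intervals:
        # Hypothetical target naming scheme, one target index per interval.
        target = f"{source}-downsample-{interval}"
        request = build_downsample_request(source, target, interval)
        latencies[f"downsample-{interval}"] = measure_latency(send, request)
    return latencies
```

Passing a real `send` function that issues the HTTP call against the single-node cluster would give one latency per `fixed_interval` value; the operations run one after another, matching the single-threaded setup.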

@salvatore-campagna salvatore-campagna added >enhancement :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Sep 22, 2022
@salvatore-campagna salvatore-campagna self-assigned this Sep 22, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@salvatore-campagna
Contributor Author

salvatore-campagna commented Sep 22, 2022

Attaching the flame graph collected while running two downsample operations and the JFR recording collected for the whole challenge:

  • downsample-1h: using fixed_interval: 1h
  • downsample-1d: using fixed_interval: 1d

downsample-flamegraph-d6e36b58-52ff-495f-b64d-a765c368f7ad.html.zip

profile-d6e36b58-52ff-495f-b64d-a765c368f7ad.jfr.zip

@salvatore-campagna
Contributor Author

salvatore-campagna commented Sep 22, 2022

After analyzing both the JFR recording and the flame graph, I see two improvements worth working on:

  1. making sure we don't decode keyword fields (BytesRef to UTF) and just use the BytesRef (see RollupShardIndexer)
  2. making sure we get rid of hash map access and iteration while collecting fields (see RollupShardIndexer)
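The two ideas above can be illustrated with a language-neutral sketch. This is not the code in RollupShardIndexer (which is Java and works on Lucene's BytesRef); Python `bytes` stands in for BytesRef, and every class and function name below is hypothetical.

```python
from typing import Dict, Iterable, List, Mapping, Sequence


class LastValueCollector:
    """Hypothetical per-field collector that keeps the latest value seen."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value = None

    def collect(self, value: float) -> None:
        self.value = value


def count_key_changes_decoding(keys: Sequence[bytes]) -> int:
    """Baseline: decode every keyword to a string before comparing."""
    changes, previous = 0, None
    for raw in keys:
        decoded = raw.decode("utf-8")  # extra decode and allocation per document
        if decoded != previous:
            changes += 1
            previous = decoded
    return changes


def count_key_changes_raw(keys: Sequence[bytes]) -> int:
    """Improvement 1: compare the raw bytes directly, never decode."""
    changes, previous = 0, None
    for raw in keys:
        if raw != previous:
            changes += 1
            previous = raw
    return changes


def collect_with_map(
    collectors: Dict[str, LastValueCollector],
    docs: Iterable[Mapping[str, float]],
) -> None:
    """Baseline: one hash lookup per field per document."""
    for doc in docs:
        for field, value in doc.items():
            collectors[field].collect(value)


def collect_with_array(
    collectors: List[LastValueCollector],
    rows: Iterable[Sequence[float]],
) -> None:
    """Improvement 2: field order is resolved once; values arrive positionally."""
    for row in rows:
        for collector, value in zip(collectors, row):
            collector.collect(value)
```

Both variants of each pair produce the same result; the second variant of each simply does less work per document, which matters when the loop runs over more than 100M documents.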

Other than that, the time is spent reading doc values, which is expected.

NOTE: after merging PR #90088 we see consistent and significant latency improvements. The latest tests show that both downsampling operations (1h and 1d fixed intervals) take around 30 minutes to complete.

@salvatore-campagna
Contributor Author

salvatore-campagna commented Sep 22, 2022

Downsampling the same source index using a 1m fixed interval took about 1.5 hours and produced an index with about 7M documents. Attaching the JFR recording and flame graph.

profile-9c51f1f9-816b-4fa7-8739-aa140dd6d3e6.jfr.zip

downsample-flamegraph-9c51f1f9-816b-4fa7-8739-aa140dd6d3e6.html.zip
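As a rough back-of-the-envelope check, the figures quoted in this issue translate into throughput as sketched below. The inputs are the rounded numbers from the comments ("more than 116M" documents, "about" 1.5 hours, "around" 30 minutes, "about 7M" output documents), so the results are only approximate.

```python
# Rounded figures quoted in this issue; all of them are approximate.
SOURCE_DOCS = 116_000_000      # "more than 116M documents"
TARGET_DOCS_1M = 7_000_000     # "about 7M documents" with a 1m fixed interval


def docs_per_second(doc_count: int, minutes: float) -> float:
    """Average source documents processed per second over the whole run."""
    return doc_count / (minutes * 60)


def reduction_ratio(source_docs: int, target_docs: int) -> float:
    """How many source documents collapse into one downsampled document."""
    return source_docs / target_docs


rate_1m = docs_per_second(SOURCE_DOCS, 90)     # 1m interval, about 1.5 hours
rate_1h_1d = docs_per_second(SOURCE_DOCS, 30)  # 1h and 1d intervals, around 30 minutes
ratio_1m = reduction_ratio(SOURCE_DOCS, TARGET_DOCS_1M)

print(f"1m interval: ~{rate_1m:,.0f} docs/s, ~{ratio_1m:.1f}x fewer documents")
print(f"1h/1d intervals: ~{rate_1h_1d:,.0f} docs/s")
```

With these inputs the 1m run works out to roughly 21,000 source documents per second and the 1h/1d runs to roughly 64,000, with the 1m target index holding about one document for every 16 to 17 source documents.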

@salvatore-campagna
Contributor Author

Closing now that PR #92494 has been merged.

@craigtaverner craigtaverner changed the title from "Downsamplig performance analysis and improvement" to "Downsampling performance analysis and improvement" Feb 10, 2023