
Add a new index setting to skip recovery source when synthetic source is enabled #114618

Merged · 55 commits · Dec 10, 2024

Conversation

jimczi
Contributor

@jimczi jimczi commented Oct 11, 2024

This draft PR proposes a method to skip storing the recovery source when LogsDB is enabled.
The recovery source is used only for peer recovery, so two alternative Translog.Snapshot implementations are introduced to handle its removal:

  • Modified LuceneChangesSnapshot: Reads from a synthetic source when the recovery source is not enabled.
  • New LuceneBatchChangesSnapshot: Retrieves the (synthetic) source of multiple documents in batch (with a configurable batchSize).

The new LuceneBatchChangesSnapshot holds batchSize operations (including source and metadata fields) in memory simultaneously, unlike the current LuceneChangesSnapshot, which keeps only one operation at a time.
The batch size is configurable to limit the memory usage.
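Conceptually, the batching strategy can be sketched as follows. This is a minimal illustrative sketch, not the PR's actual API: the class and method names are placeholders, and a plain `Iterator` stands in for reading documents by sequence number and synthesizing their source from a Lucene index.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Illustrative sketch of the batched-snapshot idea: instead of materializing
// one operation per next() call, refill an in-memory buffer of up to
// batchSize operations at a time, bounding memory while amortizing the cost
// of loading (synthetic) sources.
class BatchedSnapshot<T> {
    private final Iterator<T> underlying; // stands in for reading docs by seqno
    private final int batchSize;
    private final Deque<T> buffer = new ArrayDeque<>();

    BatchedSnapshot(Iterator<T> underlying, int batchSize) {
        this.underlying = underlying;
        this.batchSize = batchSize;
    }

    // Returns the next operation, refilling the buffer in batches;
    // returns null once the underlying iterator is exhausted.
    T next() {
        if (buffer.isEmpty()) {
            // Refill: hold at most batchSize operations in memory at once.
            for (int i = 0; i < batchSize && underlying.hasNext(); i++) {
                buffer.addLast(underlying.next());
            }
        }
        return buffer.pollFirst();
    }
}
```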

Benchmark Overview

A benchmark comparing these two methods is added to the PR. It includes sources extracted from three Elastic integrations:

  • Kafka logs (logs-kafka-log)
  • Endpoint events process logs (logs-endpoint-events-process)
  • Endpoint events security logs (logs-endpoint-events-security)

The benchmark retrieves 10,000 documents for recovery through the Translog.Snapshot. Document throughput per millisecond is recorded for the following recovery strategies:

  • Default: Default mode where the source is stored.
  • LogsDB: LogsDB mode, which uses a synthetic source.
  • LogsDB Batch: Synthetic source retrieved in batches of varying sizes (16, 64, 512, 1024 documents).

The benchmark is executed with two scenarios:

  • Sequential: Document IDs are perfectly contiguous and sorted by sequence numbers. This scenario is extremely rare due to concurrent indexing and index sorting.
  • Non-Sequential: The typical case, where document IDs are scattered and not ordered by sequence numbers. This is the most relevant for performance evaluation.

While sequential tests are included for completeness, these are not expected in real-world scenarios, especially in LogsDB mode where index sorting is used. Therefore, only the non-sequential results are considered relevant.
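The two scenarios can be modeled simply: in the sequential case document IDs line up with sequence numbers, while in the non-sequential case they are scattered. A hypothetical sketch (not the benchmark's actual code) of how the two document-ID orderings could be generated:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative only: model the two benchmark scenarios. "Sequential" assigns
// doc IDs in sequence-number order; "non-sequential" shuffles them, as happens
// in practice with concurrent indexing and index sorting.
class Scenarios {
    static List<Integer> docIds(int count, boolean sequential, long seed) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(i);
        }
        if (!sequential) {
            Collections.shuffle(ids, new Random(seed));
        }
        return ids;
    }
}
```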

Non-Sequential Benchmark Results

Kafka Logs

| Strategy     | Batch Size | Throughput (ops/ms) | ± Error | Sequential |
|--------------|-----------:|--------------------:|--------:|------------|
| Default      | 1024       | 22.527              | 0.923   | No         |
| LogsDB       | 1024       | 7.555               | 0.584   | No         |
| LogsDB Batch | 16         | 13.417              | 0.531   | No         |
| LogsDB Batch | 64         | 13.800              | 3.530   | No         |
| LogsDB Batch | 512        | 15.669              | 2.068   | No         |
| LogsDB Batch | 1024       | 16.474              | 2.231   | No         |

The default mode shows a much higher throughput (22.5 ops/ms) compared to LogsDB (7.5 ops/ms). However, using batch sizes in the LogsDB mode significantly improves throughput, with the highest results occurring at batch sizes of 512 and 1024.


Endpoint Events Process Logs

| Strategy     | Batch Size | Throughput (ops/ms) | ± Error | Sequential |
|--------------|-----------:|--------------------:|--------:|------------|
| Default      | 1024       | 8.303               | 1.428   | No         |
| LogsDB       | 1024       | 2.932               | 0.338   | No         |
| LogsDB Batch | 16         | 8.141               | 1.140   | No         |
| LogsDB Batch | 64         | 9.828               | 2.041   | No         |
| LogsDB Batch | 512        | 11.845              | 1.224   | No         |
| LogsDB Batch | 1024       | 12.262              | 0.684   | No         |

As with Kafka logs, the default mode for endpoint events process logs performs better than plain LogsDB. Batching improves throughput significantly, reaching 12.2 ops/ms at a batch size of 1024.

Endpoint Events Security Logs

| Strategy     | Batch Size | Throughput (ops/ms) | ± Error | Sequential |
|--------------|-----------:|--------------------:|--------:|------------|
| Default      | 1024       | 13.179              | 2.300   | No         |
| LogsDB       | 1024       | 7.803               | 0.326   | No         |
| LogsDB Batch | 16         | 12.546              | 1.565   | No         |
| LogsDB Batch | 64         | 13.868              | 1.461   | No         |
| LogsDB Batch | 512        | 14.920              | 1.440   | No         |
| LogsDB Batch | 1024       | 15.382              | 2.623   | No         |

Endpoint events security logs follow a similar pattern as the other logs. Default mode performs best, but batching closes the gap significantly.


Appendix: Sequential Benchmark Results

Sequential execution should rarely occur in practice, particularly in LogsDB mode where index sorting is typically enabled. For completeness, the sequential results are reported but are not relevant to the typical use case.

Kafka Logs (Sequential)

| Strategy     | Batch Size | Throughput (ops/ms) | ± Error | Sequential |
|--------------|-----------:|--------------------:|--------:|------------|
| Default      | 1024       | 1355.937            | 89.627  | Yes        |
| LogsDB       | 1024       | 76.660              | 9.750   | Yes        |
| LogsDB Batch | 16         | 47.799              | 2.930   | Yes        |
| LogsDB Batch | 64         | 70.981              | 4.869   | Yes        |
| LogsDB Batch | 512        | 73.849              | 6.045   | Yes        |
| LogsDB Batch | 1024       | 76.441              | 1.078   | Yes        |

Endpoint Events Process Logs (Sequential)

| Strategy     | Batch Size | Throughput (ops/ms) | ± Error | Sequential |
|--------------|-----------:|--------------------:|--------:|------------|
| Default      | 1024       | 15.319              | 4.449   | Yes        |
| LogsDB       | 1024       | 3.168               | 0.159   | Yes        |
| LogsDB Batch | 16         | 17.040              | 0.745   | Yes        |
| LogsDB Batch | 64         | 16.611              | 2.001   | Yes        |
| LogsDB Batch | 512        | 14.190              | 1.647   | Yes        |
| LogsDB Batch | 1024       | 14.222              | 0.882   | Yes        |

Endpoint Events Security Logs (Sequential)

| Strategy     | Batch Size | Throughput (ops/ms) | ± Error | Sequential |
|--------------|-----------:|--------------------:|--------:|------------|
| Default      | 1024       | 996.106             | 162.857 | Yes        |
| LogsDB       | 1024       | 77.866              | 6.433   | Yes        |
| LogsDB Batch | 16         | 48.881              | 4.430   | Yes        |
| LogsDB Batch | 64         | 69.328              | 7.182   | Yes        |
| LogsDB Batch | 512        | 77.442              | 16.594  | Yes        |
| LogsDB Batch | 1024       | 78.615              | 2.341   | Yes        |

Relates to #116726

@elasticsearchmachine
Collaborator

Hi @jimczi, I've created a changelog YAML for you.

@jimczi
Contributor Author

jimczi commented Oct 11, 2024

@martijnvg, here are the complete results we discussed offline. I've disabled the recovery source only for logsdb to allow for flexibility if needed. This is just a draft, so nothing is integrated yet. We begin with a benchmark to evaluate the various options.

@martijnvg
Member

Nice work @jimczi! I need some time to take a good look at this and see how we can move it forward.

@martijnvg martijnvg left a comment

I took a look Jim and this looks very exciting!

Some thoughts:

  • It would be good to get feedback from the @elastic/es-distributed-indexing team on this draft PR.
  • The micro-benchmark is a good start to assess how reading translog operations from the Lucene index is affected. The next step is to see how the performance of shard recovery and cross-cluster replication is affected by this change. I'm not sure what benchmarking tools we have for this; maybe we can start by benchmarking the shard changes action?
  • The current micro-benchmark results suggest to me that not storing and indexing the recovery source, at the cost of (sometimes) slower reading of translog operations, is a reasonable tradeoff.
  • If we want to proceed with a change like this, we probably want to enable it behind a feature flag first.
  • I think we should always have a setting that acts as an escape hatch and falls back to the LuceneChangesSnapshot implementation.

@dnhatn
Member

dnhatn commented Oct 18, 2024

@jimczi I benchmarked this change using the elastic/logs track, and the results are great. I think we don't need a high default batch_size; 256 or lower should be sufficient. The approach in the PR looks promising. Could we update it to be ready for review? Thank you!

@jimczi jimczi marked this pull request as ready for review November 6, 2024 13:11
@jimczi
Contributor Author

jimczi commented Nov 6, 2024

@dnhatn @martijnvg The PR is now ready for a more thorough review.

The latest changes incorporate the new behavior as an index setting. We’re still discussing whether this should be the default for synthetic sources or just for logsdb, so I haven’t finalized that aspect in the code yet.

When the new setting is enabled (only at index creation), synthetic sources are retrieved in batches using the new LuceneSyntheticSourceChangesSnapshot. The uncompressed source size is recorded during ingestion and used during recovery to limit memory usage when retrieving a batch of synthetic sources. The source size field is then pruned during merges, similar to the recovery source.

The maximum memory size is currently set to 4MB, but we could adjust it based on the available JVM memory on the node if needed.
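The combination of a batch-size cap and a byte budget based on the recorded uncompressed source sizes could be planned roughly as follows. This is a hypothetical sketch under stated assumptions (the names and the exact flushing policy are illustrative, not the PR's actual implementation):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: group operations into batches bounded by both a
// maximum operation count and a maximum total uncompressed source size
// (the PR mentions a 4MB budget). An operation larger than the byte budget
// still gets its own single-element batch rather than being dropped.
class BatchPlanner {
    static List<List<Integer>> plan(int[] sourceSizes, int maxOps, long maxBytes) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentBytes = 0;
        for (int i = 0; i < sourceSizes.length; i++) {
            // Flush the current batch if adding this op would exceed either limit.
            if (!current.isEmpty()
                    && (current.size() >= maxOps || currentBytes + sourceSizes[i] > maxBytes)) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(i); // record the operation's index in this batch
            currentBytes += sourceSizes[i];
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```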

@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@jimczi
Contributor Author

jimczi commented Dec 9, 2024

@elasticmachine run Elasticsearch Serverless Checks

@jimczi jimczi merged commit d213efd into elastic:main Dec 10, 2024
16 checks passed
@jimczi jimczi deleted the lucene_changes_synthetic_snapshot branch December 10, 2024 23:05
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Dec 11, 2024
… is enabled (elastic#114618)

This change adds a new undocumented index setting that allows using the synthetic source for recovery and CCR without storing a recovery source.
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Dec 11, 2024
The new method with the overloaded chunk size should be used instead.
Relates elastic#114618
jimczi added a commit that referenced this pull request Dec 11, 2024
The new method with the overloaded chunk size should be used instead.
Relates #114618
Labels
:Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement :StorageEngine/Logs You know, for Logs Team:Distributed Indexing Meta label for Distributed Indexing team Team:StorageEngine v8.18.0 v9.0.0