
Add a new index setting to skip recovery source when synthetic source is enabled #114618

Merged · 55 commits · Dec 10, 2024

Conversation

jimczi
Contributor

@jimczi jimczi commented Oct 11, 2024

This draft PR proposes a method to skip storing the recovery source when LogsDB is enabled.
The recovery source is used only for peer recovery, so two alternative Translog.Snapshot implementations are introduced to handle its removal:

  • Modified LuceneChangesSnapshot: Reads from a synthetic source when the recovery source is not enabled.
  • New LuceneBatchChangesSnapshot: Retrieves the (synthetic) source of multiple documents in batch (with a configurable batchSize).

The new LuceneBatchChangesSnapshot holds batchSize operations (including source and metadata fields) in memory simultaneously, unlike the current LuceneChangesSnapshot, which keeps only one operation at a time.
The batch size is configurable to limit the memory usage.
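Conceptually, the batching strategy can be sketched as follows. This is a minimal illustrative sketch, not the PR's actual API: the class and method names are placeholders, and a plain `Iterator` stands in for reading documents by sequence number and synthesizing their source from a Lucene index.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Illustrative sketch of the batched-snapshot idea: instead of materializing
// one operation per next() call, refill an in-memory buffer of up to
// batchSize operations at a time, bounding memory while amortizing the cost
// of loading (synthetic) sources.
class BatchedSnapshot<T> {
    private final Iterator<T> underlying; // stands in for reading docs by seqno
    private final int batchSize;
    private final Deque<T> buffer = new ArrayDeque<>();

    BatchedSnapshot(Iterator<T> underlying, int batchSize) {
        this.underlying = underlying;
        this.batchSize = batchSize;
    }

    // Returns the next operation, refilling the buffer in batches;
    // returns null once the underlying iterator is exhausted.
    T next() {
        if (buffer.isEmpty()) {
            // Refill: hold at most batchSize operations in memory at once.
            for (int i = 0; i < batchSize && underlying.hasNext(); i++) {
                buffer.addLast(underlying.next());
            }
        }
        return buffer.pollFirst();
    }
}
```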

Benchmark Overview

A benchmark comparing these two methods is added to the PR. It includes sources extracted from three Elastic integrations:

  • Kafka logs (logs-kafka-log)
  • Endpoint events process logs (logs-endpoint-events-process)
  • Endpoint events security logs (logs-endpoint-events-security)

The benchmark retrieves 10,000 documents for recovery through the Translog.Snapshot. Document throughput per millisecond is recorded for the following recovery strategies:

  • Default: Default mode where the source is stored.
  • LogsDB: LogsDB mode, which uses a synthetic source.
  • LogsDB Batch: Synthetic source retrieved in batches of varying sizes (16, 64, 512, 1024 documents).

The benchmark is executed with two scenarios:

  • Sequential: Document IDs are perfectly contiguous and sorted by sequence numbers. This scenario is extremely rare due to concurrent indexing and index sorting.
  • Non-Sequential: The typical case, where document IDs are scattered and not ordered by sequence numbers. This is the most relevant for performance evaluation.

While sequential tests are included for completeness, these are not expected in real-world scenarios, especially in LogsDB mode where index sorting is used. Therefore, only the non-sequential results are considered relevant.
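The two scenarios can be modeled simply: in the sequential case document IDs line up with sequence numbers, while in the non-sequential case they are scattered. A hypothetical sketch (not the benchmark's actual code) of how the two document-ID orderings could be generated:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative only: model the two benchmark scenarios. "Sequential" assigns
// doc IDs in sequence-number order; "non-sequential" shuffles them, as happens
// in practice with concurrent indexing and index sorting.
class Scenarios {
    static List<Integer> docIds(int count, boolean sequential, long seed) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(i);
        }
        if (!sequential) {
            Collections.shuffle(ids, new Random(seed));
        }
        return ids;
    }
}
```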

Non-Sequential Benchmark Results

Kafka Logs

| Strategy     | Batch Size | Throughput (ops/ms) | ± Error | Sequential |
|--------------|-----------:|--------------------:|--------:|------------|
| Default      | 1024       | 22.527              | 0.923   | No         |
| LogsDB       | 1024       | 7.555               | 0.584   | No         |
| LogsDB Batch | 16         | 13.417              | 0.531   | No         |
| LogsDB Batch | 64         | 13.800              | 3.530   | No         |
| LogsDB Batch | 512        | 15.669              | 2.068   | No         |
| LogsDB Batch | 1024       | 16.474              | 2.231   | No         |

The default mode shows a much higher throughput (22.5 ops/ms) compared to LogsDB (7.5 ops/ms). However, using batch sizes in the LogsDB mode significantly improves throughput, with the highest results occurring at batch sizes of 512 and 1024.


Endpoint Events Process Logs

| Strategy     | Batch Size | Throughput (ops/ms) | ± Error | Sequential |
|--------------|-----------:|--------------------:|--------:|------------|
| Default      | 1024       | 8.303               | 1.428   | No         |
| LogsDB       | 1024       | 2.932               | 0.338   | No         |
| LogsDB Batch | 16         | 8.141               | 1.140   | No         |
| LogsDB Batch | 64         | 9.828               | 2.041   | No         |
| LogsDB Batch | 512        | 11.845              | 1.224   | No         |
| LogsDB Batch | 1024       | 12.262              | 0.684   | No         |

As with Kafka logs, the default mode for endpoint events process logs performs better than plain LogsDB. Batching improves throughput significantly, reaching 12.2 ops/ms at a batch size of 1024.

Endpoint Events Security Logs

| Strategy     | Batch Size | Throughput (ops/ms) | ± Error | Sequential |
|--------------|-----------:|--------------------:|--------:|------------|
| Default      | 1024       | 13.179              | 2.300   | No         |
| LogsDB       | 1024       | 7.803               | 0.326   | No         |
| LogsDB Batch | 16         | 12.546              | 1.565   | No         |
| LogsDB Batch | 64         | 13.868              | 1.461   | No         |
| LogsDB Batch | 512        | 14.920              | 1.440   | No         |
| LogsDB Batch | 1024       | 15.382              | 2.623   | No         |

Endpoint events security logs follow a similar pattern as the other logs. Default mode performs best, but batching closes the gap significantly.


Appendix: Sequential Benchmark Results

Sequential execution should rarely occur in practice, particularly in LogsDB mode where index sorting is typically enabled. For completeness, the sequential results are reported but are not relevant to the typical use case.

Kafka Logs (Sequential)

| Strategy     | Batch Size | Throughput (ops/ms) | ± Error | Sequential |
|--------------|-----------:|--------------------:|--------:|------------|
| Default      | 1024       | 1355.937            | 89.627  | Yes        |
| LogsDB       | 1024       | 76.660              | 9.750   | Yes        |
| LogsDB Batch | 16         | 47.799              | 2.930   | Yes        |
| LogsDB Batch | 64         | 70.981              | 4.869   | Yes        |
| LogsDB Batch | 512        | 73.849              | 6.045   | Yes        |
| LogsDB Batch | 1024       | 76.441              | 1.078   | Yes        |

Endpoint Events Process Logs (Sequential)

| Strategy     | Batch Size | Throughput (ops/ms) | ± Error | Sequential |
|--------------|-----------:|--------------------:|--------:|------------|
| Default      | 1024       | 15.319              | 4.449   | Yes        |
| LogsDB       | 1024       | 3.168               | 0.159   | Yes        |
| LogsDB Batch | 16         | 17.040              | 0.745   | Yes        |
| LogsDB Batch | 64         | 16.611              | 2.001   | Yes        |
| LogsDB Batch | 512        | 14.190              | 1.647   | Yes        |
| LogsDB Batch | 1024       | 14.222              | 0.882   | Yes        |

Endpoint Events Security Logs (Sequential)

| Strategy     | Batch Size | Throughput (ops/ms) | ± Error | Sequential |
|--------------|-----------:|--------------------:|--------:|------------|
| Default      | 1024       | 996.106             | 162.857 | Yes        |
| LogsDB       | 1024       | 77.866              | 6.433   | Yes        |
| LogsDB Batch | 16         | 48.881              | 4.430   | Yes        |
| LogsDB Batch | 64         | 69.328              | 7.182   | Yes        |
| LogsDB Batch | 512        | 77.442              | 16.594  | Yes        |
| LogsDB Batch | 1024       | 78.615              | 2.341   | Yes        |

Relates to #116726

@elasticsearchmachine
Collaborator

Hi @jimczi, I've created a changelog YAML for you.

@jimczi
Contributor Author

jimczi commented Oct 11, 2024

@martijnvg, here are the complete results we discussed offline. I've disabled the recovery source only for logsdb to allow for flexibility if needed. This is just a draft, so nothing is integrated yet. We begin with a benchmark to evaluate the various options.

@martijnvg
Member

Nice work @jimczi! I need some time to take a good look at this and see how we can move it forward.

@martijnvg martijnvg left a comment

I took a look Jim and this looks very exciting!

Some thoughts:

  • It would be good to get feedback from the @elastic/es-distributed-indexing team on this draft PR.
  • The micro-benchmark is a good start to assess how reading translog operations from the Lucene index is affected. The next step is to see how the performance of shard recovery and cross-cluster replication is affected by this change. I'm not sure what benchmarking tools we have for this; maybe we can start by benchmarking the shard changes action?
  • The current micro-benchmark results suggest to me that not storing and indexing the recovery source, at the cost of (sometimes) slower reading of translog operations, is a reasonable tradeoff.
  • If we want to proceed with a change like this, we probably want to enable it behind a feature flag first.
  • I think we should always have a setting that acts as an escape hatch and falls back to the LuceneChangesSnapshot implementation.

@dnhatn
Member

dnhatn commented Oct 18, 2024

@jimczi I benchmarked this change using the elastic/logs track, and the results are great. I think we don't need a high default batch_size; 256 or lower should be sufficient. The approach in the PR looks promising. Could we update it to be ready for review? Thank you!

@jimczi jimczi marked this pull request as ready for review November 6, 2024 13:11
@jimczi
Contributor Author

jimczi commented Nov 6, 2024

@dnhatn @martijnvg The PR is now ready for a more thorough review.

The latest changes incorporate the new behavior as an index setting. We’re still discussing whether this should be the default for synthetic sources or just for logsdb, so I haven’t finalized that aspect in the code yet.

When the new setting is enabled (only at index creation), synthetic sources are retrieved in batches using the new LuceneSyntheticSourceChangesSnapshot. The uncompressed source size is recorded during ingestion and used during recovery to limit memory usage when retrieving a batch of synthetic sources. The source size field is then pruned during merges, similar to the recovery source.

The maximum memory size is currently set to 4MB, but we could adjust it based on the available JVM memory on the node if needed.
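The combination of a batch-size cap and a byte budget based on the recorded uncompressed source sizes could be planned roughly as follows. This is a hypothetical sketch under stated assumptions (the names and the exact flushing policy are illustrative, not the PR's actual implementation):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: group operations into batches bounded by both a
// maximum operation count and a maximum total uncompressed source size
// (the PR mentions a 4MB budget). An operation larger than the byte budget
// still gets its own single-element batch rather than being dropped.
class BatchPlanner {
    static List<List<Integer>> plan(int[] sourceSizes, int maxOps, long maxBytes) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentBytes = 0;
        for (int i = 0; i < sourceSizes.length; i++) {
            // Flush the current batch if adding this op would exceed either limit.
            if (!current.isEmpty()
                    && (current.size() >= maxOps || currentBytes + sourceSizes[i] > maxBytes)) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(i); // record the operation's index in this batch
            currentBytes += sourceSizes[i];
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```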

@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@jimczi
Contributor Author

jimczi commented Dec 9, 2024

@elasticmachine run Elasticsearch Serverless Checks

@jimczi jimczi merged commit d213efd into elastic:main Dec 10, 2024
16 checks passed
@jimczi jimczi deleted the lucene_changes_synthetic_snapshot branch December 10, 2024 23:05
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Dec 11, 2024
… is enabled (elastic#114618)

This change adds a new undocumented index setting that allows using the synthetic source for recovery and CCR without storing a recovery source.
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Dec 11, 2024
The new method with the overloaded chunk size should be used instead.
Relates elastic#114618
jimczi added a commit that referenced this pull request Dec 11, 2024
The new method with the overloaded chunk size should be used instead.
Relates #114618
Labels
:Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement :StorageEngine/Logs You know, for Logs Team:Distributed Indexing Meta label for Distributed Indexing team Team:StorageEngine v8.18.0 v9.0.0