-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Force Merge: clarify execution and storage requirements #33882
Conversation
Because of the additional space required you I was wondering how the operations are executed: sync or async. Docs did not give me an answer. Using trial and error I found out it is executed synchronously. I thought the information to be useful for others too.
Pinging @elastic/es-core-infra |
hi @frederikbosch , thanks for your contribution! How did you come to the conclusion that we execute the operation sequentially, one index at a time? I had a look at how the force merge API works: the coordinating node resolves the indices included in the request, finds out which shards the operation has to be executed on, then looks at where such shards are allocated and sends one request per node to execute the shard level force-merge operation. Such requests are sent in an async fashion. This is not "sync one index at a time". It does look like on each node, the shard-level force-merge is executed synchronously, one shard at a time. |
@javanna Then my observation was not correct indeed. The indices I executed the operation on were all having one shard. Therefore I assumed that the operation was synchronous per index, while actually it is synchronous per shard. |
@javanna I updated the PR. My concern is the storage increase people have to take into account when running this operation. Since the operation is sync per shard, it occurs to me that it is an increase of approx 100% of the largest shard is what could be expected (assuming read-only indices). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment, thanks again @frederikbosch
@@ -55,7 +55,8 @@ POST /kimchy/_forcemerge?only_expunge_deletes=false&max_num_segments=100&flush=t | |||
=== Multi Index | |||
|
|||
The force merge API can be applied to more than one index with a single call, or | |||
even on `_all` the indices. | |||
even on `_all` the indices. Multi index operations are executed async for all nodes, | |||
but one shard at a time per node. This will cause storage to increase per node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the async part please, which is generally how Elasticsearch works? Also, I think it is valuable to add info about storage requirements, but if we do so we should be more specific. The worst case is when force_merge is executed with max_num_segments set to 1, in which case we should clarify that the storage temporarily goes up to potentially 100% of the size of the shard being merged. That means that running each shard per node sequentially is good when looking at storage requirements ;)
hi @frederikbosch let me know if you need help with this, if you don't have time to make the requested changes I can also make them so I can merge your PR. Let me know what you prefer ;) |
@javanna Yes sorry. At this moment I think I cannot give follow-up to the PR. If you could finish it, please go ahead. I think you are also better in formulating the exact behaviour. |
thanks @frederikbosch ! |
Thank you! |
* master: (24 commits) ingest: better support for conditionals with simulate?verbose (elastic#34155) [Rollup] Job deletion should be invoked on the allocated task (elastic#34574) [DOCS] .Security index is never auto created (elastic#34589) CCR: Requires soft-deletes on the follower (elastic#34725) re-enable bwc tests (elastic#34743) Empty GetAliases authorization fix (elastic#34444) INGEST: Document Processor Conditional (elastic#33388) [CCR] Add total fetch time leader stat (elastic#34577) SQL: Support pattern against compatible indices (elastic#34718) [CCR] Auto follow pattern APIs adjustments (elastic#34518) [Test] Remove dead code from ExceptionSerializationTests (elastic#34713) A small typo in migration-assistance doc (elastic#34704) ingest: processor stats (elastic#34724) SQL: Implement IN(value1, value2, ...) expression. (elastic#34581) Tests: Add checks to GeoDistanceQueryBuilderTests (elastic#34273) INGEST: Rename Pipeline Processor Param. (elastic#34733) Core: Move IndexNameExpressionResolver to java time (elastic#34507) [DOCS] Force Merge: clarify execution and storage requirements (elastic#33882) TESTING.asciidoc fix examples using forbidden annotation (elastic#34515) SQL: Implement `CONVERT`, an alternative to `CAST` (elastic#34660) ...
Because of the additional space required you I was wondering how operations are executed: sync or async. Docs did not give me an answer. Using trial and error I found out it is executed synchronously. I thought the information to be useful for others too.