Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force Merge: clarify execution and storage requirements #33882

Merged
merged 4 commits into from
Oct 23, 2018

Conversation

frederikbosch
Copy link
Contributor

@frederikbosch frederikbosch commented Sep 20, 2018

Because of the additional space required you I was wondering how operations are executed: sync or async. Docs did not give me an answer. Using trial and error I found out it is executed synchronously. I thought the information to be useful for others too.

Because of the additional space required you I was wondering how the operations are executed: sync or async. Docs did not give me an answer. Using trial and error I found out it is executed synchronously. I thought the information to be useful for others too.
@frederikbosch frederikbosch changed the title Multi index operations are executed synchronously. Force Merge: multi index operations are executed synchronously. Sep 20, 2018
@astefan astefan added the :Data Management/Indices APIs APIs to create and manage indices and templates label Sep 21, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@astefan astefan added :Data Management/Indices APIs APIs to create and manage indices and templates and removed :Data Management/Indices APIs APIs to create and manage indices and templates labels Sep 21, 2018
@javanna
Copy link
Member

javanna commented Oct 2, 2018

hi @frederikbosch , thanks for your contribution!

How did you come to the conclusion that we execute the operation sequentially, one index at a time?

I had a look at how the force merge API works: the coordinating node resolves the indices included in the request, finds out which shards the operation has to be executed on, then looks at where such shards are allocated and sends one request per node to execute the shard level force-merge operation. Such requests are sent in an async fashion. This is not "sync one index at a time". It does look like on each node, the shard-level force-merge is executed synchronously, one shard at a time.

@frederikbosch
Copy link
Contributor Author

@javanna Then my observation was not correct indeed. The indices I executed the operation on were all having one shard. Therefore I assumed that the operation was synchronous per index, while actually it is synchronous per shard.

@frederikbosch
Copy link
Contributor Author

@javanna I updated the PR. My concern is the storage increase people have to take into account when running this operation. Since the operation is sync per shard, it occurs to me that it is an increase of approx 100% of the largest shard is what could be expected (assuming read-only indices).

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment, thanks again @frederikbosch

@@ -55,7 +55,8 @@ POST /kimchy/_forcemerge?only_expunge_deletes=false&max_num_segments=100&flush=t
=== Multi Index

The force merge API can be applied to more than one index with a single call, or
even on `_all` the indices.
even on `_all` the indices. Multi index operations are executed async for all nodes,
but one shard at a time per node. This will cause storage to increase per node.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the async part please, which is generally how Elasticsearch works? Also, I think it is valuable to add info about storage requirements, but if we do so we should be more specific. The worst case is when force_merge is executed with max_num_segments set to 1, in which case we should clarify that the storage temporarily goes up to potentially 100% of the size of the shard being merged. That means that running each shard per node sequentially is good when looking at storage requirements ;)

@javanna
Copy link
Member

javanna commented Oct 23, 2018

hi @frederikbosch let me know if you need help with this, if you don't have time to make the requested changes I can also make them so I can merge your PR. Let me know what you prefer ;)

@frederikbosch
Copy link
Contributor Author

@javanna Yes sorry. At this moment I think I cannot give follow-up to the PR. If you could finish it, please go ahead. I think you are also better in formulating the exact behaviour.

@javanna javanna added >docs General docs changes v7.0.0 v6.5.0 labels Oct 23, 2018
@javanna javanna changed the title Force Merge: multi index operations are executed synchronously. Force Merge: clarify execution and storage requirements Oct 23, 2018
@javanna javanna merged commit 183c32d into elastic:master Oct 23, 2018
@javanna
Copy link
Member

javanna commented Oct 23, 2018

thanks @frederikbosch !

@frederikbosch
Copy link
Contributor Author

Thank you!

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Oct 23, 2018
* master: (24 commits)
  ingest: better support for conditionals with simulate?verbose (elastic#34155)
  [Rollup] Job deletion should be invoked on the allocated task (elastic#34574)
  [DOCS] .Security index is never auto created (elastic#34589)
  CCR: Requires soft-deletes on the follower (elastic#34725)
  re-enable bwc tests (elastic#34743)
  Empty GetAliases authorization fix (elastic#34444)
  INGEST: Document Processor Conditional (elastic#33388)
  [CCR] Add total fetch time leader stat (elastic#34577)
  SQL: Support pattern against compatible indices (elastic#34718)
  [CCR] Auto follow pattern APIs adjustments (elastic#34518)
  [Test] Remove dead code from ExceptionSerializationTests (elastic#34713)
  A small typo in migration-assistance doc (elastic#34704)
  ingest: processor stats (elastic#34724)
  SQL: Implement IN(value1, value2, ...) expression. (elastic#34581)
  Tests: Add checks to GeoDistanceQueryBuilderTests (elastic#34273)
  INGEST: Rename Pipeline Processor Param. (elastic#34733)
  Core: Move IndexNameExpressionResolver to java time (elastic#34507)
  [DOCS] Force Merge: clarify execution and storage requirements (elastic#33882)
  TESTING.asciidoc fix examples using forbidden annotation (elastic#34515)
  SQL: Implement `CONVERT`, an alternative to `CAST` (elastic#34660)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Indices APIs APIs to create and manage indices and templates >docs General docs changes v6.5.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants