Enhance shrink action in ILM to specify max single primary shard size #65714

Closed
gaobinlong opened this issue Dec 2, 2020 · 7 comments
Labels
:Data Management/ILM+SLM (Index and Snapshot lifecycle management)
:Distributed Indexing/Distributed (A catch all label for anything in the Distributed Area. Please avoid if you can.)
>feature
Team:Data Management (Meta label for data/management team)
Team:Distributed (Obsolete) (Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.)

Comments

@gaobinlong
Contributor

Elasticsearch version: master

Plugins installed: []

JVM version (java -version): 14.0.1

OS version (uname -a if on a Unix-like system): Mac OS 10.13.6

Description of the problem including expected versus actual behavior:

Today, the shrink action in ILM only lets us set number_of_shards to a fixed value, which must be a factor of the number of shards in the source index.
However, in our situation, the source indices range in size from 100GB to 1TB, and all of them contain 60 shards. We want to shrink each source index according to its size: for example, shrink the 100GB index to 2 shards and the 1TB index to 20 shards, so that the maximum primary shard size is 50GB.
So, can we add a parameter such as max_single_primary_size to the shrink action, so that the number of shards in the new shrunken index is calculated from the storage size of the source index?
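
The calculation being requested can be sketched roughly as follows. This is only an illustration of the arithmetic, not the Elasticsearch implementation: the function name `shrunken_shard_count` and its parameters are hypothetical, it assumes data is spread evenly across primary shards, and it treats 1TB as 1000GB so that the numbers match the example above.

```python
def shrunken_shard_count(source_shards: int,
                         total_primary_bytes: int,
                         max_primary_shard_bytes: int) -> int:
    """Pick the smallest factor of source_shards that keeps each
    primary shard at or below max_primary_shard_bytes.

    Hypothetical helper: assumes data is evenly distributed across shards.
    """
    if source_shards < 1:
        raise ValueError("source_shards must be at least 1")
    if max_primary_shard_bytes < 1:
        raise ValueError("max_primary_shard_bytes must be at least 1")
    if total_primary_bytes < 0:
        raise ValueError("total_primary_bytes must not be negative")

    # Integer ceiling division: the minimum number of shards needed by size.
    min_needed = max(1, -(-total_primary_bytes // max_primary_shard_bytes))

    # A shrink cannot produce more shards than the source index has.
    if min_needed >= source_shards:
        return source_shards

    # The target shard count must be a factor of the source shard count,
    # so walk upwards until a factor is found.
    for candidate in range(min_needed, source_shards + 1):
        if source_shards % candidate == 0:
            return candidate
    return source_shards


GB = 10**9  # decimal gigabytes, an assumption made for this example

small = shrunken_shard_count(60, 100 * GB, 50 * GB)
large = shrunken_shard_count(60, 1000 * GB, 50 * GB)
```

Because the result has to be a factor of 60, a size that needs 21 shards would round up to 30 rather than 21.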

@tlrx tlrx added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. team-discuss labels Dec 3, 2020
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Dec 3, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@tlrx
Member

tlrx commented Dec 14, 2020

We discussed this in the team and agreed that the easy path would be to use data streams with automated max_size rollover using ILM. But we also agreed that it is not always possible to use data streams and/or timestamped indices, especially if out-of-order documents are indexed or updated, and as such we think that the feature you are proposing could be useful to Elasticsearch users.

The existing resize action which is used for shrinking or splitting indices already uses the index stats before taking an action. For example, before shrinking an index it verifies if the resulting index will exceed the maximum number of documents per shard. I assume that we could use the index stats to compute an appropriate number of shards that would fit under a maximum limit set by a request parameter.
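
As a rough illustration of the kind of stats-based check mentioned above, the sketch below sums per-shard document counts into each target shard and rejects the shrink if any target would exceed the Lucene per-shard limit. The function name `check_shrink_doc_limit` is hypothetical, and the grouping of consecutive source shards into one target shard is an assumption about how shrink maps shards, not a statement of the actual resize code.

```python
# Lucene's per-index document limit (IndexWriter.MAX_DOCS): Integer.MAX_VALUE - 128.
MAX_DOCS_PER_SHARD = 2**31 - 1 - 128


def check_shrink_doc_limit(docs_per_source_shard, target_shards):
    """Return the document count of each target shard, or raise if any
    target shard would exceed MAX_DOCS_PER_SHARD.

    Hypothetical helper: assumes target shard t receives the consecutive
    source shards [t * factor, (t + 1) * factor).
    """
    source_shards = len(docs_per_source_shard)
    if target_shards < 1 or source_shards % target_shards != 0:
        raise ValueError(
            "target shard count must be a factor of the source shard count")

    factor = source_shards // target_shards
    totals = []
    for target in range(target_shards):
        start = target * factor
        total = sum(docs_per_source_shard[start:start + factor])
        if total > MAX_DOCS_PER_SHARD:
            raise ValueError(
                f"target shard {target} would hold {total} documents, "
                f"above the limit of {MAX_DOCS_PER_SHARD}")
        totals.append(total)
    return totals


totals = check_shrink_doc_limit([10, 20, 30, 40], 2)
```

A size-based limit could follow the same pattern, using store sizes from the index stats in place of document counts.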

Would you be interested in working on this?

@gaobinlong
Contributor Author

@tlrx, thanks for your reply. I'd be very glad to work on this feature. We currently have to use a Python script to shrink each index according to its storage size, so I think the feature is meaningful if we implement it in ILM.

@tlrx
Member

tlrx commented Dec 17, 2020

> so I think the feature is meaningful if we implement it in ILM.

I think the feature itself can be implemented in the resize action itself and exposed in ILM.

@jakelandis
Contributor

jakelandis commented Dec 17, 2020

This is somewhat related to #63026 (except that issue is for rollover) as well as #63872.

+1 to general direction.

please cc @elastic/es-core-features for any reviews.

@joegallo
Contributor

joegallo commented May 4, 2021

@gaobinlong is there anything else for this issue? It looks to me like #67705 and some of the follow ups should have it covered, right?

@gaobinlong
Contributor Author

Yes, this issue can be closed now.

@dakrone dakrone closed this as completed May 6, 2021