[Rollup] Managing index lifecycle #33065
Pinging @elastic/es-search-aggs
@colings86 and I chatted about this some. Enabling this via ILM is a good long-term goal. With that in mind, ILM would likely invoke this sort of behavior with the Rollover API. From Rollup's point of view, that would look the same as an external Rollover event, so if we enable that behavior we can A) let users do it manually today and B) be ready to work with ILM when it's ready. We'll need to loosen the restriction on multi-rollup-index search, but I think that should be easy enough as long as we can guarantee all indices in the query have identical jobs.

At rollover time, we have two concerns. The first is the orchestration of start/rollover/stop as described in the OP. The other issue is dealing with
I am using the rollup_index and it keeps getting bigger and bigger. The _rollup_search API should be able to accept an alias that covers all of the rollup indices.
I would love to see a rollup_index pattern like the one mentioned above.
Superseded by #48003. We're going to try to integrate Rollup directly into ILM (as an action) rather than trying to get ILM and the current Rollup indexer to "coordinate". It's easier for the user and simpler to manage: one configuration, and no need to keep the two tasks synced. It also fits nicely into how people naturally want to use Rollups, as part of their index lifecycle and overall retention scheme.
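Purely as a hypothetical sketch of that direction (the `rollup` action name and its fields below are invented for illustration; the actual design lives in #48003), the end state might look roughly like an ILM policy such as:

```
PUT _ilm/policy/sensor_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "30d" }
        }
      },
      "warm": {
        "actions": {
          // hypothetical action, not a real API: roll the index up as part of the lifecycle
          "rollup": {
            "config": {
              "groups": {
                "date_histogram": { "field": "timestamp", "fixed_interval": "1h" }
              },
              "metrics": [
                { "field": "temperature", "metrics": ["max", "avg"] }
              ]
            }
          }
        }
      }
    }
  }
}
```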
Today, a Rollup job stores its results in a single rollup index. There is currently no provision for handling jobs that generate such a large volume of data that even the rollup data needs multiple indices to scale.
There are a couple of routes we can take... it's not clear to me which is best. Current Rollup limitations make it tricky, too.
Wait for ILM
Easiest option... wait for ILM (#29823) to be merged and then revisit this conversation. Integrating with ILM somehow will likely provide a better experience than baking smaller pieces of this into Rollup.
Support external Rollover
Rollup doesn't play nicely with Rollover today because we try to create the destination rollup index (and if it exists, update the metadata). So if the user points their config at a Rollover alias, we throw an exception.
We could allow Rollup to point at aliases, which I think would let the user manually Rollover indices. There are some tricky bits to this though. Because Rollup uses deterministic IDs for disaster recovery after a checkpoint, the user would have to make sure a checkpoint has been fully committed before rolling over:
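Roughly, the manual sequence could look like the sketch below (`sensor_job` and the `rollup_sensor_write` alias are placeholder names, and the paths assume the 6.x `_xpack`-prefixed Rollup APIs):

```
# 1. Stop the job so no checkpoint is in flight
POST _xpack/rollup/job/sensor_job/_stop

# 2. Confirm the job reports "stopped" before touching the index
GET _xpack/rollup/job/sensor_job

# 3. Roll the write alias over to a fresh backing rollup index
POST /rollup_sensor_write/_rollover
{
  "conditions": {
    "max_docs": 50000000
  }
}

# 4. Restart the job so it writes into the new backing index
POST _xpack/rollup/job/sensor_job/_start
```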
It's not terrible, but not super user-friendly either.
Internally support Rollover
We could instead implement the Rollover functionality within Rollup itself. It'd be essentially the same thing, same procedure, just handled by Rollup, probably as another config option whose Rollover criteria we check when checkpointing or something.
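A hypothetical sketch of what that option could look like, assuming a job config in the existing 6.x shape; the `rollover` block is invented purely for illustration:

```
PUT _xpack/rollup/job/sensor_job
{
  "index_pattern": "sensor-*",
  "rollup_index": "rollup_sensor",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": { "field": "timestamp", "interval": "1h" }
  },
  "metrics": [
    { "field": "temperature", "metrics": ["max", "avg"] }
  ],
  // hypothetical setting: criteria checked at each checkpoint,
  // triggering an internal rollover of the destination index
  "rollover": {
    "max_docs": 50000000,
    "max_size": "50gb"
  }
}
```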
Destination date math/patterns
We could implement something like:
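A hypothetical illustration of the idea, with an invented date pattern in `rollup_index` that would be resolved from each rollup document's timestamp:

```
PUT _xpack/rollup/job/sensor_job
{
  "index_pattern": "sensor-*",
  // hypothetical syntax: destination resolved per rollup document, e.g. rollup_sensor-2018-08
  "rollup_index": "rollup_sensor-{yyyy-MM}",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": { "field": "timestamp", "interval": "1h" }
  }
}
```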
This would dynamically create destination indices according to the timestamp of each rollup document. Unlike Rollover, we don't have to worry about backtracking and replaying documents, because docs will deterministically land in their destination index too.
This does complicate job creation a little, since indices are generated on demand instead of up front, meaning we'd need to find a way to enrich those indices with the rollup metadata after they are created dynamically.
Big issue related to all approaches
The major problem with all of these approaches is that Rollup doesn't allow more than one rollup index in a RollupSearch request. This was mainly to limit internal complexity rather than a hard limit, and I think the restriction is less important now that `missing_bucket` is implemented. I think we could loosen this restriction as long as all indices involved in the search share the exact same set of rollup jobs; that way we don't have to worry about weird mixed-job incompatibilities.
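For illustration, loosening the restriction would let a request like the sketch below (placeholder index names) target several rollup indices, or an alias covering them, in one shot, as long as they were all produced by identical job configs:

```
# Querying several rollup indices in a single _rollup_search request
GET /rollup_sensor-000001,rollup_sensor-000002/_rollup_search
{
  "size": 0,
  "aggregations": {
    "max_temperature": {
      "max": { "field": "temperature" }
    }
  }
}
```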