Add an ILM action to rollup an index #48003

Closed
csoulios opened this issue Oct 14, 2019 · 14 comments · Fixed by #65633
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management >feature :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Data Management Meta label for data/management team

Comments

@csoulios
Contributor

Rollup should no longer be a continuous running job. Instead, it should be an action that can be triggered on ILM-managed indices. The action should iterate over all grouping tuples (dimensions) and calculate aggregate metrics (min/max/avg/value_count) generating one document per grouping tuple (dimension).

In the context of this issue we should implement the following:

  • Implement a "grouping tuple" field type to generate the tuple at index time for the document.
  • Implement a "rollup metric" field type to store aggregation results (min/max/count/avg etc). This field should provide the correct information to the requesting aggregator (e.g. an avg agg on rollup_metric will internally fetch sum + count).
  • Store document count somewhere on the document (docvalue field?)
  • Create an ILM rollup action
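
To make the idea above concrete, here is a minimal Python sketch of rolling documents up by grouping tuple. It is not the Elasticsearch implementation; the function names `rollup` and `avg`, the `doc_count` key, and the sample fields `host` and `cpu` are all assumptions made for illustration. It shows why the metric stores sum and value_count rather than a precomputed average.

```python
from collections import defaultdict


def rollup(docs, dimensions, metric_field):
    """Collapse raw documents into one summary document per grouping tuple."""
    groups = defaultdict(lambda: {"min": float("inf"), "max": float("-inf"),
                                  "sum": 0.0, "value_count": 0})
    for doc in docs:
        key = tuple(doc[d] for d in dimensions)  # the grouping tuple
        agg = groups[key]
        value = doc[metric_field]  # assumes every document carries the metric
        agg["min"] = min(agg["min"], value)
        agg["max"] = max(agg["max"], value)
        agg["sum"] += value
        agg["value_count"] += 1

    rolled = []
    for key, agg in groups.items():
        summary = dict(zip(dimensions, key))
        summary[metric_field] = agg
        # Under the assumption above, every source document contributed a value.
        summary["doc_count"] = agg["value_count"]
        rolled.append(summary)
    return rolled


def avg(rolled_docs, metric_field):
    """Average over rolled-up documents, derived from sum and value_count."""
    total = sum(d[metric_field]["sum"] for d in rolled_docs)
    count = sum(d[metric_field]["value_count"] for d in rolled_docs)
    return total / count if count else None


if __name__ == "__main__":
    raw = [
        {"host": "a", "cpu": 1.0},
        {"host": "a", "cpu": 3.0},
        {"host": "b", "cpu": 10.0},
    ]
    rolled = rollup(raw, ["host"], "cpu")
    print(rolled)
    print(avg(rolled, "cpu"))
```

With this sample data the average comes out as 14/3, the true average of the three raw values. Averaging the per-host averages (2 and 10) would give 6 instead, which is why the average has to be derived from sum and count.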
@csoulios csoulios added :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data >feature labels Oct 14, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Rollup)

@dakrone dakrone added the :Data Management/ILM+SLM Index and Snapshot lifecycle management label Oct 14, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

@TommySedin

Is there any news on this? We will soon have metrics indices that need to be rolled up, and I'd like to know whether I should build something based on the existing Rollup mechanism or wait for this. Or do you expect the ILM action to be "backwards compatible" with previous rollup indices?

@cmklar

cmklar commented Feb 23, 2020

Waiting for this enhancement, thanks!

@polyfractal
Contributor

@TommySedin it's still very early in the process, so I can't really give any concrete details about how it will end up looking. We're working on the search side of things first, trying to remove the _rollup_search endpoint before tackling the ILM portion mentioned in this ticket. So right now the ILM side is still a bit vague in terms of implementation.

We definitely want some kind of migration path though. It's not clear how that will turn out: it could be that we add logic so that both "old" and "new" rollup indices are handled internally by the search endpoint, or that we provide some way to migrate/reindex data to the new format that the ILM action uses, etc. But our goal is to not leave any currently rolled-up data "left behind".

That's about all I have, unfortunately. Like I said, it's still early in the process, so we don't have a lot of firm details yet.

@mcascallares

Will this feature allow rollup jobs to be modified? Imagine a scenario where new fields appear in the indices: would it be possible to dynamically add the new fields to the existing rollups without recreating the job and losing the historical values?

Thanks

@giladgal
Contributor

Our tentative plan is to have rollup as an action in ILM. The rollup configuration will stay pretty much the same, but it will be part of the ILM action. This means that to update the rollup, one will only need to update the rollup configuration in the ILM policy, and the next time the ILM policy is executed it will perform the rollup with the new parameters, e.g. the new fields.

A future enhancement could be to automatically add fields that are being added to the index, e.g. based on a wildcard or some similar rule.

This is the plan, but keep in mind that we are in the design phase, so the implementation may be different.
@mcascallares
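
As a rough illustration of the plan described above, the Python sketch below treats the rollup configuration as part of a policy document and adds new metric fields by editing only that policy. The final action schema is not specified in this thread, so the policy layout (`phases` → `actions` → `rollup` → `config` → `metrics`), the phase name `cold`, the helper name `with_rollup_fields`, and the field names are all hypothetical.

```python
import copy


def with_rollup_fields(policy, phase, extra_metric_fields):
    """Return a copy of `policy` whose rollup action also covers the extra fields.

    The dict layout used here is an assumed shape, not a confirmed ILM schema.
    """
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    metrics = updated["phases"][phase]["actions"]["rollup"]["config"]["metrics"]
    known = {m["field"] for m in metrics}
    for field in extra_metric_fields:
        if field not in known:
            metrics.append({"field": field, "metrics": ["min", "max", "avg"]})
    return updated


if __name__ == "__main__":
    # Hypothetical policy: one rollup action with a single metric field.
    policy = {
        "phases": {
            "cold": {
                "actions": {
                    "rollup": {
                        "config": {
                            "metrics": [
                                {"field": "cpu", "metrics": ["min", "max", "avg"]}
                            ]
                        }
                    }
                }
            }
        }
    }
    new_policy = with_rollup_fields(policy, "cold", ["memory"])
    print(new_policy["phases"]["cold"]["actions"]["rollup"]["config"]["metrics"])
```

Under this assumed model, indices rolled up after the policy update would include the new field, while already rolled-up indices would keep whatever fields they were built with.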

@rjernst rjernst added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Data Management Meta label for data/management team labels May 4, 2020
@simone-smith

Will there be an ILM phase for deleting old data in a rollup index? We would like to keep two years' worth of data in a rollup index but don't need to keep it any longer than that. Do you have any recommendations for making sure that the index size doesn't grow indefinitely?

@JathinSanghvi

For now, I was able to set up an index template that matches the rollup index and attaches an ILM policy to it. This ILM policy is dedicated to the rollup indices and has phases similar to a normal policy, so I was able to configure rollover for the rollup index and also set up a delete phase for it.
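
The exact requests are not given in this comment, so the following Python sketch only builds what such a setup might look like: an ILM policy with a rollover action and a delete phase, plus an index template that binds the policy to matching indices. The policy name, index pattern, alias, and thresholds are assumptions, and whether a rollup job can write through a rollover alias is not confirmed in this thread. The `_index_template` path assumes a version with composable templates (7.8 or later).

```python
import json


def build_rollup_retention_requests(index_pattern, policy_name, rollover_alias,
                                    max_size="50gb", delete_after="730d"):
    """Build (method, path, body) tuples; nothing is sent to a cluster."""
    policy_body = {
        "policy": {
            "phases": {
                # Roll over to a new index once the current one grows too large.
                "hot": {"actions": {"rollover": {"max_size": max_size}}},
                # Delete each index once it is older than the retention period.
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }
    template_body = {
        "index_patterns": [index_pattern],
        "template": {
            "settings": {
                "index.lifecycle.name": policy_name,
                "index.lifecycle.rollover_alias": rollover_alias,
            }
        },
    }
    return [
        ("PUT", f"_ilm/policy/{policy_name}", policy_body),
        ("PUT", f"_index_template/{policy_name}-template", template_body),
    ]


if __name__ == "__main__":
    # All names below are placeholders chosen for this example.
    requests = build_rollup_retention_requests(
        "metrics-rollup-*", "metrics-rollup-retention", "metrics-rollup")
    for method, path, body in requests:
        print(method, path)
        print(json.dumps(body, indent=2))
```

The default `delete_after` of 730 days approximates the two-year retention asked about earlier in the thread.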

@michaelpietzsch

> For now, I was able to set up an index template that matches the rollup index and attaches an ILM policy to it. This ILM policy is dedicated to the rollup indices and has phases similar to a normal policy, so I was able to configure rollover for the rollup index and also set up a delete phase for it.

Just out of interest, which version of Elasticsearch are you running?

@peterdkdp

> > For now, I was able to set up an index template that matches the rollup index and attaches an ILM policy to it. This ILM policy is dedicated to the rollup indices and has phases similar to a normal policy, so I was able to configure rollover for the rollup index and also set up a delete phase for it.
>
> Just out of interest, which version of Elasticsearch are you running?

Currently upgrading to 7.9.2

@michaelpietzsch

@peterdkdp do you have it running as well? Just to make sure:

  • Create Index Lifecycle Policy
  • Create Index Template to bind ILM Policy (Possibly with rollover)
  • Create Index ... automated via Rollup Job.

?

My main goal is to split the indices by size to keep them usable in the long term.

@peterdkdp

@michaelpietzsch my apologies, I was replying to your reply. I thought you got it working.
I'm also still waiting on the implementation of rollups with ILM.
Let's check with @JathinSanghvi if he has a procedure on how he did it.

@michaelpietzsch

In the meantime I've created something Curator-based.

@talevy talevy self-assigned this Nov 24, 2020
talevy added a commit to talevy/elasticsearch that referenced this issue Dec 8, 2020
this commit introduces a new Rollup ILM Action that allows indices
to be rolled up according to a specific rollup config. The
action also allows for the new rolled up index to be associated with
a different policy than the original/source index. Optionally,
the original index can be deleted.

Relates elastic#42720.

Closes elastic#48003.
talevy added a commit that referenced this issue Jan 29, 2021
this commit introduces a new Rollup ILM Action that allows indices
to be rolled up according to a specific rollup config. The
action also allows for the new rolled up index to be associated with
a different policy than the original/source index.

Relates #42720.

Closes #48003.
talevy added a commit to talevy/elasticsearch that referenced this issue Jan 29, 2021
this commit introduces a new Rollup ILM Action that allows indices
to be rolled up according to a specific rollup config. The
action also allows for the new rolled up index to be associated with
a different policy than the original/source index.

Relates elastic#42720.

Closes elastic#48003.