Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid making the index read-only in the Force merge action for ILM (Closes #43426) #81162

Merged
merged 46 commits into from
Jun 9, 2022
Merged

Avoid making the index read-only in the Force merge action for ILM (Closes #43426) #81162

merged 46 commits into from
Jun 9, 2022

Conversation

jcbages
Copy link
Contributor

@jcbages jcbages commented Nov 30, 2021

  • Added a new NoopStep that we can use as a placeholder for a step that we had in a previous version but now we want to remove it. If the just go and erase the step then the indices that were in that step will get into a stuck state. Instead, we're using the same old step key but with this NoopStep which does nothing but the transition to the next step.

  • Modify the ForceMergeLifecycleAction in ILM so it doesn't make the index read-only. For older indices already in the "convert index to read-only" step before the upgrade, we'll use the NoopStep mentioned before in order to ignore the step and just transition to the next one.

Solves: #43426

@elasticsearchmachine elasticsearchmachine added v8.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Nov 30, 2021
@jbaiera jbaiera added the :Data Management/ILM+SLM Index and Snapshot lifecycle management label Nov 30, 2021
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Nov 30, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@jakelandis
Copy link
Contributor

@elasticmachine OK to test

@dakrone
Copy link
Member

dakrone commented Nov 30, 2021

This is slightly different than what we discussed in #53289 (comment), which is that we would remove making the index read-only automatically, leaving it up to the user to add the readonly action if they want an index to be read-only.

@jcbages
Copy link
Contributor Author

jcbages commented Nov 30, 2021

Oh I see, then the idea is that force merge doesn't add readonly at all but instead people can add the readonly from the ILM hot/warm phase directly right? I can change the PR to reflect that!

@dakrone
Copy link
Member

dakrone commented Dec 1, 2021

Oh I see, then the idea is that force merge doesn't add readonly at all but instead people can add the readonly from the ILM hot/warm phase directly right? I can change the PR to reflect that!

Yep, that's correct!

@jcbages
Copy link
Contributor Author

jcbages commented Dec 1, 2021

@dakrone I made the corresponding changes and the checks seem good. Let me know what you think and I was wondering what's the earliest version in which we could get this, thanks!

@dakrone
Copy link
Member

dakrone commented Dec 1, 2021

@elasticmachine update branch

@dakrone
Copy link
Member

dakrone commented Dec 1, 2021

@elasticmachine ok to test

@dakrone
Copy link
Member

dakrone commented Dec 2, 2021

Sorry for the delay on this, I've been doing some manual testing and an issue that I suspected we might run into does indeed happen to be the case.

When we remove a step from an action (forcemerge in this case), a policy that happens to be on that step will enter an unrecoverable error when that index executes on the version where that step doesn't exist. I tested this locally by moving the index into the readonly step inside of forcemerge, and then upgrading to your branch where the readonly step no longer exists. The index spins in the non-error state (it just never progresses) with:

[2021-12-01T17:18:50,974][TRACE][o.e.x.i.IndexLifecycleService] [runTask-0] job triggered: ilm, 1638404330973, 1638404330974
[2021-12-01T17:18:50,974][TRACE][o.e.x.i.IndexLifecycleRunner] [runTask-0] [myindex-000001] retrieved current step key: {"phase":"warm","action":"forcemerge","name":"readonly"}
[2021-12-01T17:18:50,974][TRACE][o.e.x.i.PolicyStepsRegistry] [runTask-0] parsed steps for policy [fmpolicy] in phase [warm], definition: [{"policy":"fmpolicy","phase_definition":{"min_age":"0ms","actions":{"forcemerge":{"max_num_segments":1}}},"version":1,"modified_date_in_millis":1638395616594}], steps: [[{"phase":"warm","action":"migrate","name":"branch-check-skip-action"} => null, {"phase":"warm","action":"migrate","name":"migrate"} => {"phase":"warm","action":"migrate","name":"check-migration"}, {"phase":"warm","action":"migrate","name":"check-migration"} => {"phase":"warm","action":"forcemerge","name":"branch-forcemerge-check-prerequisites"}, {"phase":"warm","action":"forcemerge","name":"branch-forcemerge-check-prerequisites"} => null, {"phase":"warm","action":"forcemerge","name":"check-not-write-index"} => {"phase":"warm","action":"forcemerge","name":"forcemerge"}, {"phase":"warm","action":"forcemerge","name":"forcemerge"} => {"phase":"warm","action":"forcemerge","name":"segment-count"}, {"phase":"warm","action":"forcemerge","name":"segment-count"} => {"phase":"warm","action":"complete","name":"complete"}, {"phase":"warm","action":"complete","name":"complete"} => {"phase":"delete","action":"delete","name":"wait-for-shard-history-leases"}]]
[2021-12-01T17:18:50,975][ERROR][o.e.x.i.IndexLifecycleRunner] [runTask-0] current step [{"phase":"warm","action":"forcemerge","name":"readonly"}] for index [myindex-000001] with policy [fmpolicy] is not recognized

I'd like to figure out how we should handle things like this, because we can easily fall into this when we make subtractive changes (additive changes are no problem) to the steps executed for an action.
/cc @andreidan @joegallo perhaps you have some ideas for a way we can work around this, I think it'd be good to have a generic way to handle "current step ... is not recognized".

@jcbages
Copy link
Contributor Author

jcbages commented Dec 2, 2021

@dakrone oh wow great catch! One idea that I can think of is to add the readonly step in the forcemerge action but not have any other step "point" to it (so the check-index-write and the readonly step would point to the same next step). Maybe some annotation or log of "deprecated" step could be added and then in let's say version 9 the step can be fully removed 🤔, wdyt?

@dakrone
Copy link
Member

dakrone commented Dec 6, 2021

One thing we could think about is substituting it for a "noop" step with the same stepkey as the previous step, this would allow us to remove its action, but still have it "execute" by doing nothing for those particular steps that were removed. I'll experiment a little bit with this. I think combining it with the un-referenced idea you have @jcbages might work also.

@Jamalarm
Copy link

Hi, is this still in progress? We are very interested in this change as the enforced read-only prevents us from using ILM for our use case.

@jcbages
Copy link
Contributor Author

jcbages commented Jan 25, 2022

Hi @dakrone, wondering if you were able to test the idea you mention or if you want me to try it and report results so we can make the required changes for merging this

@dakrone
Copy link
Member

dakrone commented Jan 28, 2022

Hey @jcbages, sorry for the delay on this, thanks for being patient! I spent some time brainstorming with @andreidan about how to fix this. I think we can do it, but instead of removing the step, we'll keep the step (and its stepkey) in the ForceMergeAction, but instead of an UpdateSettingsStep, we can introduce a new NoopStep to replace it (with the same step key) that doesn't actually do anything. We can then make this step available for older indices and skipped for new ones.

Does that make sense? Is this something you're still interested in working on, or would you prefer that we do it?

@jcbages
Copy link
Contributor Author

jcbages commented Jan 31, 2022

Hi @dakrone, thanks for your answer. Yeah I think I understand the idea and I'm still very interested in working on so I'll follow up with some commits that should address this soon.

@mark-vieira mark-vieira added v8.2.0 and removed v8.1.0 labels Feb 2, 2022
@jcbages jcbages changed the title Allow option to disable readonly step in Force merge action for ILM (Closes #43426) Avoid making the index read-only in the Force merge action for ILM (Closes #43426) Jun 5, 2022
@jcbages
Copy link
Contributor Author

jcbages commented Jun 6, 2022

Hi @dakrone, I was just able to resume this and I added the changes we discussed regarding the no-op step with the read-only key. Let me know how it looks to you and if you could reproduce the case in which the index got stuck to validate if this is not happening anymore it'd be great!

@dakrone dakrone self-requested a review June 6, 2022 15:17
jcbages and others added 24 commits June 9, 2022 11:07
…ility whenever we change the functionality of a step
@jcbages jcbages requested a review from dakrone June 9, 2022 17:38
@jcbages
Copy link
Contributor Author

jcbages commented Jun 9, 2022

Hi @dakrone thanks for the comments! I already addressed them and tested the ILM is not getting stuck in readonly step. Let me know if you think there's something else to change and thanks for all your help!

Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for iterating on this @jcbages, I left one more comment, but then I will merge this in.

* It literally does nothing so that we can safely proceed to the nextStepKey without getting stuck.
*/
public class NoopStep extends ClusterStateWaitStep {
public static final String NAME = "NOOP";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of our step names are lowercased

Suggested change
public static final String NAME = "NOOP";
public static final String NAME = "noop";

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dakrone thanks! I applied to suggestion so now the NAME is in lowercase

Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
@jcbages
Copy link
Contributor Author

jcbages commented Jun 9, 2022

Hi @dakrone, I made the change for your last comment and we got a clean build so we should be good to merge!

@dakrone dakrone merged commit 4694398 into elastic:master Jun 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Data Management Meta label for data/management team v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.