`WaitForSnapshotStep` verifies if the index belongs to the latest snapshot of that SLM policy #100911
Pinging @elastic/es-data-management (Team:Data Management)
Hi @gmarouli, I've created a changelog YAML for you.
I think the code mostly looks very good. I've added a couple of admittedly trivial comments.
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/WaitForSnapshotStep.java
Thank you for the review @joegallo, you keep me sharp 🤓.
Hi @gmarouli, I've updated the changelog YAML for you.
About the backport labels, I think we should try to backport it, I do realise that backporting to |
Regarding backporting, I think it should be pretty straightforward (hopefully!), a lot of the ILM code has been pretty stable for a while. A tricky bit is that there's no generally available edit: or... looking at how it's used in |
🚀
@elasticmachine update branch
💔 Backport failed
You can use sqren/backport to manually backport by running |
…pshot of that SLM policy (elastic#100911)

The `WaitForSnapshotStep` used to check if the SLM policy has been executed after the index has entered the delete phase, but it did not check if the SLM policy included this index.

The result of this is that if the user used an SLM policy that did not include this index, when the index would enter the `WaitForSnapshotStep`, it would wait for a snapshot to be taken, a snapshot that would not include the index, and then ILM would delete the index.

See the exact reproduction path: elastic#57809

**Solution**

After this PR finds a successful SLM run, it verifies if the snapshot taken by SLM contains this index. If not, it throws an error; otherwise it proceeds.

ILM explain will report:

```
"step_info": {
  "type": "illegal_state_exception",
  "reason": "the last successful snapshot of policy 'hourly-snapshots' does not include index '.ds-my-other-stream-2023.10.16-000001'"
}
```

**Backwards compatibility concerns**

In this PR, the `WaitForSnapshotStep` changed from `ClusterStateWaitStep` to `AsyncWaitStep`. We do not think this is going to cause an issue. This was tested manually by the following steps:

- Run a master node with the old version.
- When ILM is executing `wait-for-snapshot`, we shut down the node.
- We start the node again with the new version of ES.
- ES was able to pick up the step and continue with the new code.

We believe that this covers bwc concerns.

Fixes: elastic#57809
💚 All backports created successfully
Questions? Please refer to the Backport tool documentation
…pshot of that SLM policy (elastic#100911)

(cherry picked from commit 5697fcf)

# Conflicts:
# x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/WaitForSnapshotStepTests.java
…pshot of that SLM policy (#100911) (#101027)
…est snapshot of that SLM policy (#100911) (#101030)

* `WaitForSnapshotStep` verifies if the index belongs to the latest snapshot of that SLM policy (#100911)

(cherry picked from commit 5697fcf)
The `WaitForSnapshotStep` used to check whether the SLM policy had been executed after the index entered the delete phase, but it did not check whether the SLM policy included this index.

As a result, if the user referenced an SLM policy that did not include this index, the index would enter the `WaitForSnapshotStep`, wait for a snapshot to be taken (a snapshot that would not include the index), and then ILM would delete the index.

See the exact reproduction path: #57809

**Solution**

After this PR finds a successful SLM run, it verifies that the snapshot taken by SLM contains this index. If it does not, the step throws an error; otherwise it proceeds.

ILM explain will report a `step_info` of type `illegal_state_exception` with a reason such as "the last successful snapshot of policy 'hourly-snapshots' does not include index '.ds-my-other-stream-2023.10.16-000001'".

**Backwards compatibility concerns**

In this PR, the `WaitForSnapshotStep` changed from `ClusterStateWaitStep` to `AsyncWaitStep`. We do not think this will cause an issue. This was tested manually with the following steps:

- Run a master node with the old version.
- While ILM is executing `wait-for-snapshot`, shut down the node.
- Start the node again with the new version of ES.
- ES was able to pick up the step and continue with the new code.

We believe that this covers bwc concerns.

Fixes: #57809
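For readers who want to see the shape of the new check, here is a minimal sketch of the verification described under Solution. It is not the code from this PR: the class `SnapshotIndexCheck`, the record `SnapshotSummary`, and the method `verifyIndexIncluded` are hypothetical names invented for illustration, and the real step presumably obtains the snapshot contents asynchronously rather than from an in-memory set. Only the exception type and the message format are taken from the description above.

```java
import java.util.Set;

public class SnapshotIndexCheck {

    /** Hypothetical, simplified view of a snapshot: the SLM policy that took it and the indices it contains. */
    public record SnapshotSummary(String policy, Set<String> indices) {}

    /**
     * Returns normally when the last successful snapshot contains the index.
     * Otherwise throws an IllegalStateException whose message follows the
     * format quoted in the PR description.
     */
    public static void verifyIndexIncluded(SnapshotSummary lastSuccessful, String indexName) {
        if (!lastSuccessful.indices().contains(indexName)) {
            throw new IllegalStateException(
                "the last successful snapshot of policy '" + lastSuccessful.policy()
                    + "' does not include index '" + indexName + "'");
        }
    }

    public static void main(String[] args) {
        // Illustrative data only: a policy whose snapshot covers a different index.
        SnapshotSummary snapshot = new SnapshotSummary("hourly-snapshots", Set.of("my-index-000001"));

        // Passes silently: the index is part of the snapshot.
        verifyIndexIncluded(snapshot, "my-index-000001");

        try {
            verifyIndexIncluded(snapshot, ".ds-my-other-stream-2023.10.16-000001");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of failing loudly instead of proceeding is that the index stays in the wait step, so ILM does not delete data that was never captured in a snapshot.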