Update SO migration documentation: release write block on source index in case of corrupt document failure #100631
Comments
Pinging @elastic/kibana-core (Team:Core)
I think we need to be careful to only release the write block in scenarios where we know that other nodes are also going to fail the migration. If we release it on any failure, then we can introduce data loss. For example:
Another example, besides a shard failure, where this could happen is the scenario where some nodes have different SO types registered than others. The nodes that have all the SO types registered will continue the migration; however, the other nodes may give up. In order to do this safely, we really can only release the write block if we know for sure that all other nodes are also going to fail the migration. I'm not sure there's a scenario where we can guarantee this though?
We discussed this sync during sprint planning yesterday and decided that we indeed cannot safely remove the write block due to issues like the one listed above. This issue is now scoped to only update the documentation on how to handle the corrupt document case to include a step for removing the write block. These docs can be found here: https://www.elastic.co/guide/en/kibana/master/upgrade-migrations.html#_corrupt_saved_objects
At the moment, when a failure occurs during or after the client-side reindex from the source to temp, the write block on the source that was enabled during the SET_SOURCE_WRITE_BLOCK step is not released. When the failure was caused by corrupt or invalid SO documents, this adds a necessary step to the manual intervention, as the block needs to be manually released before trying to fix or remove the faulty document(s). We need to update our documentation on how to remedy this situation: https://www.elastic.co/guide/en/kibana/master/upgrade-migrations.html#_corrupt_saved_objects
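For illustration, the manual release step could look roughly like the sketch below. It uses the Elasticsearch update index settings endpoint (PUT <index>/_settings) to set index.blocks.write to false. The cluster URL, the index name .kibana_7.12.0_001, and the helper function names are assumptions made for this example, and authentication is left out; substitute the actual source index reported in the Kibana migration logs.

```python
import json
import urllib.request


def build_release_write_block_request(es_url: str, index: str) -> urllib.request.Request:
    """Build a PUT <index>/_settings request that clears the write block.

    Hypothetical helper: es_url and index are supplied by the operator.
    """
    body = json.dumps({"index": {"blocks.write": False}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def release_write_block(es_url: str, index: str) -> dict:
    """Send the request and return the parsed JSON response from Elasticsearch."""
    request = build_release_write_block_request(es_url, index)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    # Assumed values: a local, unauthenticated cluster and an example source index.
    print(release_write_block("http://localhost:9200", ".kibana_7.12.0_001"))
```

Once the block is released, the faulty document(s) can be fixed or removed and the migration retried. As discussed in the comments, this should only be done as a deliberate manual step, since releasing the block while other nodes are still migrating can lead to data loss.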
Original issue content:
At the moment, when a failure occurs during or after the client-side reindex from the source to temp, the write block on the source that was enabled during the SET_SOURCE_WRITE_BLOCK step is not released. When the failure was caused by corrupt or invalid SO documents, this adds an unnecessary step to the manual intervention, as the block needs to be manually released before trying to fix or remove the faulty document(s).
In case of migration failure, we should ensure that the indices are back to their initial state by removing the write block during the cleanup step.
Note: we are also enabling write block on