Ignore metadata of deleted indices at start #48918

DaveCTurner · 2019-11-09T00:26:31Z

Today in 6.x it is possible to add an index tombstone to the graveyard without
deleting the corresponding index metadata, because the deletion is slightly
deferred. If you shut down the node and upgrade to 7.x when in this state then
the node will fail to apply any cluster states, reporting

java.lang.IllegalStateException: Cannot delete index [...], it is still part of the cluster state.

This commit addresses this situation by skipping over any index metadata with a
corresponding tombstone, allowing this metadata to be cleaned up by the 7.x
node.
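
As a rough illustration of the behaviour described above, here is a minimal, self-contained sketch. It is not the code from this PR: the class `HalfDeletedIndexSketch`, both method names, and the use of plain maps and sets keyed by index UUID are hypothetical stand-ins for the real cluster-state classes. It assumes only what the description states: a tombstone whose index metadata is still present triggers the `IllegalStateException`, and skipping tombstoned metadata while loading avoids it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model; these are NOT Elasticsearch classes.
// Index metadata is modelled as a map from index UUID to index name, and the
// graveyard as a set of tombstoned index UUIDs.
final class HalfDeletedIndexSketch {

    private HalfDeletedIndexSketch() {}

    /**
     * Models the failure: a tombstone whose index is still present in the
     * loaded metadata is rejected as an illegal state.
     */
    static void applyTombstonesStrictly(Map<String, String> indicesByUuid, Set<String> tombstonedUuids) {
        for (String uuid : tombstonedUuids) {
            if (indicesByUuid.containsKey(uuid)) {
                throw new IllegalStateException(
                    "Cannot delete index [" + indicesByUuid.get(uuid) + "], it is still part of the cluster state.");
            }
        }
    }

    /**
     * Models the fix: while loading on-disk metadata at startup, skip every
     * index that already has a tombstone, so the strict check never sees it.
     */
    static Map<String, String> skipTombstonedIndices(Map<String, String> indicesByUuid, Set<String> tombstonedUuids) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : indicesByUuid.entrySet()) {
            if (!tombstonedUuids.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

In this model, calling `applyTombstonesStrictly` on the raw on-disk metadata throws for a half-deleted index, while calling it on the result of `skipTombstonedIndices` succeeds. The skipped metadata is then left for the usual cleanup path to remove.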

elasticmachine · 2019-11-09T00:26:34Z

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)

DaveCTurner · 2019-11-09T00:26:51Z

Relates https://discuss.elastic.co/t/upgrade-from-6-8-3-to-7-x-x-results-in-failed-to-apply-updated-cluster-state/207111/3

DaveCTurner · 2019-11-10T14:49:20Z

This needs more work because it doesn't address the case where you're in this state in a rolling upgrade.

DaveCTurner · 2019-11-11T15:17:01Z

> This needs more work because it doesn't address the case where you're in this state in a rolling upgrade.

Discussed this and decided it's ok for a rolling upgrade to fall back on a full cluster restart if it happens to be in this state, and this PR will allow the full cluster restart to proceed.

andrershov

Looks good. I've left one request to rename a method.

andrershov · 2019-11-11T15:21:10Z

server/src/test/java/org/elasticsearch/gateway/GatewayIndexStateIT.java

+
+        final MetaData metaData = internalCluster().getInstance(ClusterService.class).state().metaData();
+        final Path[] paths = internalCluster().getInstance(NodeEnvironment.class).nodeDataPaths();
+        writeBrokenMeta(metaStateService -> {


I know this is not your change, but the name "writeBrokenMeta" looks wrong for two reasons: in this test we write well-formed metadata, and the method also performs a full cluster restart, which I think should be reflected in the method name.

A more substantial change to this method (including fixing its name) is incoming in https://github.com/elastic/elasticsearch/pull/48733/files#diff-a53ee618ca95b1bde55d7f5508a03d6aR511.

However, the metadata written here is well-formed but still broken: it contains a tombstone for an index that is not properly deleted.

andrershov

LGTM

dpeddi · 2021-01-28T07:04:42Z

I got this error while upgrading from 6.7 to 7.0.1.
After reading this I tried another upgrade, to 7.10.2, but the error is still present.

Just for other users/readers, I was able to restart my cluster by manually removing all the state with the latest lucene/luke after reading the "Cannot delete index [...], it is still part of the cluster state." message on each node start.

DaveCTurner · 2021-01-28T08:30:20Z

> Just for other users/readers, I was able to restart my cluster by manually removing all the state with the latest lucene/luke

For the sake of other users/readers, this advice is extremely dangerous and we do not recommend following it. Editing the contents of the data path using a tool like Luke can result in arbitrary and silent data loss.

@dpeddi the actual solution is not to upgrade to 7.0.1: that version is already long past EOL. The current recommendation is to upgrade to the latest 6.8 and then to the latest 7.x (7.10.2 at time of writing).

DaveCTurner added the >bug, :Distributed Coordination/Cluster Coordination, v8.0.0, v7.5.0 and v7.6.0 labels Nov 9, 2019

DaveCTurner requested review from andrershov and ywelsch November 9, 2019 00:26

Checkstyle

d7b35ba

Merge branch 'master' into 2019-11-08-half-deleted-index-import

cfd66a5

andrershov reviewed Nov 11, 2019

DaveCTurner requested a review from andrershov November 11, 2019 15:39

andrershov approved these changes Nov 11, 2019

jimczi removed the v7.5.0 label Nov 12, 2019

DaveCTurner merged commit 13170a7 into elastic:master Nov 12, 2019

DaveCTurner deleted the 2019-11-08-half-deleted-index-import branch November 12, 2019 11:02

DaveCTurner added the backport pending label Nov 12, 2019

DaveCTurner added v7.5.0 and removed backport pending labels Nov 12, 2019

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

DaveCTurner mentioned this pull request Nov 14, 2021

Index exists both in graveyard and cluster state caused nodes join failed. #80673

Closed
