Prevent import of dangling indices from a later version #34264
Comments
Pinging @elastic/es-distributed
I have reproduced this => fixing now so that restoring a newer index version onto an older data-node version is not allowed.
* Restore should check the minimum version in the cluster, and not the current master node's version, for compatibility
* Closes elastic#34264
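For reference, a minimal sketch of what checking the cluster's minimum node version (rather than the master's own version) could look like; the class name and exception type are illustrative assumptions, not the actual RestoreService code:

```java
// Illustrative sketch only: the real restore path lives elsewhere; this class
// and the exception type are assumptions for demonstration.
import org.elasticsearch.Version;
import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.metadata.IndexMetaData;

class RestoreVersionCheck {

    static void validateSnapshotIndex(ClusterState state, IndexMetaData snapshotIndex) {
        // Compare against the oldest node currently in the cluster,
        // not against Version.CURRENT of the master node.
        Version minNodeVersion = state.nodes().getMinNodeVersion();
        Version indexCreated = snapshotIndex.getCreationVersion();
        if (indexCreated.after(minNodeVersion)) {
            throw new IllegalStateException("cannot restore index [" + snapshotIndex.getIndex().getName()
                + "] created in version [" + indexCreated + "] because the oldest node in the cluster is ["
                + minNodeVersion + "]");
        }
    }
}
```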
I talked about this with @DaveCTurner and the behavior here does not necessarily seem to be a bug. We are not actually restoring any data to the old data node in a mixed cluster; all we're doing is acknowledging the restore, which puts the newer index into the cluster state and prevents old data nodes from joining. With the change I suggested in #34676 we prevent the restore from even starting during a rolling upgrade, which may be less confusing but also reduces functionality during a rolling upgrade (which seems to work fine if you have new-version data nodes present). => maybe the behavior is ok as is?
I'd like more information from @Bukhtawar because I tried the steps given to reproduce this and got the expected error when trying to restore a snapshot taken on a 6.4.2 cluster into a 6.4.1 cluster:
Reproducing this needed a cluster that comprised a mix of 6.4.1 and 6.4.2 nodes. We only expect a cluster of mixed versions to occur during a rolling upgrade, and in that situation we don't expect more 6.4.1 nodes to be joining the cluster - in fact there are other things you can do during a rolling upgrade that will block the older nodes from joining, such as simply creating an index, and the solution in all cases is to upgrade the removed node. I think perhaps the docs could be clearer on this subject, but I can't see that we should change the behaviour here. Did this occur during a rolling upgrade, and if so why were there more 6.4.1 nodes joining the cluster?
@DaveCTurner Thanks for taking a look. We needed to urgently resize our clusters by adding additional capacity, which didn't work because the restore of a higher-version index from another snapshot repository failed the membership checks and subsequently forced us to delete that index.
I cannot currently see how to reproduce this in a single-version cluster. My attempt, described above, failed with an exception. Can you explain how to reproduce this?
Aha, thanks, that helps. This is a very strange sequence of operations - you are essentially merging two distinct clusters together, which only works today because of the lenience that dangling indices provide. Indeed when I try this I see the vital log message as the later-versioned index is brought into the earlier-versioned cluster:
I think we should not import the dangling 6.4.2 index in this case, because as soon as that has happened no more 6.4.1 nodes can join the cluster - there is no need to snapshot and restore anything. Good catch.
Today it is possible that we import a dangling index that was created in a newer version than one or more of the nodes in the cluster. Such an index would prevent the older node(s) from rejoining the cluster if they were to briefly leave it for some reason. This commit prevents the import of such dangling indices. Fixes #34264
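A minimal sketch of the guard this commit describes - skipping any dangling index that the oldest node in the cluster cannot understand. Names here are illustrative assumptions, not the exact implementation:

```java
// Illustrative sketch: filter dangling indices before import so that nothing
// newer than the oldest node's version is brought into the cluster state.
import java.util.List;
import java.util.stream.Collectors;

import org.elasticsearch.Version;
import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.metadata.IndexMetaData;

class DanglingIndexImportFilter {

    static List<IndexMetaData> importable(ClusterState state, List<IndexMetaData> dangling) {
        Version minNodeVersion = state.nodes().getMinNodeVersion();
        return dangling.stream()
            // Keep only indices created on or before the oldest node's version.
            .filter(index -> index.getCreationVersion().onOrBefore(minNodeVersion))
            .collect(Collectors.toList());
    }
}
```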
A higher version of an index (6.3.2) can be restored on a cluster with node version 6.3.1; however, when additional nodes try to join the cluster, their membership is restricted by compatibility checks, which ensure that no index in the given metadata was created with a newer version of Elasticsearch and that all indices are newer than or equal to the minimum index compatibility version, based on Version#minimumIndexCompatibilityVersion.
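That membership check, which produces the failure reported in the logs below, can be sketched roughly as follows; the method body is a simplified assumption modelled on the error message, not the actual join-validation code:

```java
// Simplified sketch of the join-time index compatibility check: a joining node
// is rejected if any index in the cluster metadata is newer than the node, or
// older than the node's minimum index compatibility version.
import org.elasticsearch.Version;
import org.elasticsearch.cluster.metadata.IndexMetaData;
import org.elasticsearch.cluster.metadata.MetaData;

class JoinCompatibilitySketch {

    static void ensureIndexCompatibility(Version joiningNodeVersion, MetaData metaData) {
        Version minIndexCompat = joiningNodeVersion.minimumIndexCompatibilityVersion();
        for (IndexMetaData index : metaData) {
            Version created = index.getCreationVersion();
            if (created.after(joiningNodeVersion)) {
                throw new IllegalStateException("index [" + index.getIndex() + "] version not supported: "
                    + created + " the node version is: " + joiningNodeVersion);
            }
            if (created.before(minIndexCompat)) {
                throw new IllegalStateException("index [" + index.getIndex() + "] version too old: " + created
                    + ", minimum compatible index version is: " + minIndexCompat);
            }
        }
    }
}
```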
Elasticsearch version (bin/elasticsearch --version): 6.3.1
Plugins installed: []
JVM version (java -version): JDK 10
OS version (uname -a if on a Unix-like system): Linux ip-10-212-18-25 4.9.70-25.242.amzn1.x86_64
Description of the problem including expected versus actual behavior:
Steps to reproduce:
Provide logs (if relevant):
[2018-10-03T00:00:10,030][INFO ][o.e.d.z.ZenDiscovery ] [wowvdMi] failed to send join request to master [{5GZT6AM}{5GZT6AMsSSy2x5vCdDLFXA}{1aYFoazmRXWt2OKoo3Slfg}{10.xx.xxx.xxx}{10.xx.xxx.xxx:9300}], reason [RemoteTransportException[[5GZT6AM][10.xx.xx.xxx:9300][internal:discovery/zen/join]]; nested: IllegalStateException[index [index-docs_2018092113/fKi49vf3TzKDshgg9ydzaQ] version not supported: 6.3.2 the node version is: 6.3.1]; ]