Adapt the Recovery API for closed indices #38421

tlrx · 2019-02-05T13:29:54Z

Note: this pull request is aimed to be merged in the replicated-closed-indices feature branch

This commit adapts the Recovery API to make it work with shards of replicated closed indices.

Relates #33888

elasticmachine · 2019-02-05T13:29:56Z

Pinging @elastic/es-distributed

rest-api-spec/src/main/resources/rest-api-spec/test/cat.recovery/10_basic.yml

tlrx · 2019-02-19T14:16:39Z

server/src/main/java/org/elasticsearch/action/admin/indices/recovery/RecoveryRequest.java

@@ -47,7 +48,7 @@ public RecoveryRequest() {
     * @param indices   Comma-separated list of indices about which to gather recovery information
     */
    public RecoveryRequest(String... indices) {
-        super(indices);
+        super(indices, IndicesOptions.STRICT_EXPAND_OPEN_CLOSED);


Note: the Recovery API reports information for shards that effectively recovered from something. In a cluster with mixed versions, index are closed but not replicated: they don't have index routing table and shards don't have a recovery state, so they are just not reported by the Recovery API.

I think this change will mean that in a mixed cluster, the result of the recovery API will be different if you pass in a closed index depending on which node you hit in the cluster (a v7 or v8). A v7 node will respond failure and a v8 node will respond nothing. Is that correct?

This seems like a very small issue though?

This is a good remark and I also thought about it (hence my previoud comment). Your understanding is right, in a mixed cluster if a 8.0 node processes the recovery request it will resolves both closed and opened indices, while a 7.x node will resolve opened indices only.

But then the list of resolved indices is compared to the current cluster state routing table (see TransportRecoveryAction#shards() method) and only indices that have a routing table will be included in the scope of the computation. In mixed cluster, closed indices are not replicated and their routing table is removed from the cluster state at closing time (see #38955) so the situation where a 8.0 node is asking a 7.0 node for information about a shard of a closed index it is assigned to should never happen (also because old node cannot join back an upgraded cluster).

On 7.x, we use STRICT_EXPAND_OPEN_FORBID_CLOSED. I believe this means that if you specify anything with wildcard, closed will be ignored. This will then be identical in a mixed cluster (all good).

But if you specify a specific index with no wildcards, the semantics of STRICT_EXPAND_OPEN_FORBID_CLOSED is to throw an exception if the index is closed.

On 8.0, this should obviously be changed like you did, ie. allow both open and closed.

But in a mixed cluster, a request specifying a specific closed index can either hit a 7.x node or an 8.x node (using REST or transport). If it hits a 7.0 node, the request will fail, if it hits an 8.0 node, it will succeed with an empty response. This could seem a bit random to someone behind a load balancer.

DaveCTurner

I agree with your comment about -1 for the number of translog ops, and left a suggestion about the percentage. Otherwise this looks good.

rest-api-spec/src/main/resources/rest-api-spec/test/cat.recovery/10_basic.yml

tlrx · 2019-02-21T19:10:59Z

Thanks @DaveCTurner. I updated the code, can you please have another look?

henningandersen

LGTM.

Left a few comments. Thanks @tlrx

rest-api-spec/src/main/resources/rest-api-spec/test/cat.recovery/10_basic.yml

henningandersen · 2019-02-22T10:03:00Z

rest-api-spec/src/main/resources/rest-api-spec/test/indices.recovery/10_basic.yml

+        index: [test_2]
+        human: true
+
+  - match: { test_2.shards.0.type:                                 "EXISTING_STORE"        }


As discussed on another channel, a user could find it odd that closing an index makes it go through recovery. Also, if cluster health will be changed to by default also look at closed indices, cluster health could go red for a short period when closing an index. This is more of a discussion subject: is this external behaviour acceptable or not. Maybe @ywelsch has a view on this?

As discussed on another channel, a user could find it odd that closing an index makes it go through recovery.

I can understand this point of view, but I also think that we could mitigate this with a better documentation.

Also, if cluster health will be changed to by default also look at closed indices, cluster health could go red for a short period when closing an index. This is more of a discussion subject: is this external behaviour acceptable or not.

I forgot to mention during our discussion that we talked about this in our team meeting few months ago and we agreed on the fact that this situation is acceptable. This behavior is similar to what happens today when an index is reopened, turning the cluster state to RED until primary shards are assigned.

Note that if the red cluster health is too big of a concern, we can change that as well by making it only yellow during this transition (can be done by adapting ClusterShardHealth#getInactivePrimaryHealth).

henningandersen · 2019-02-22T10:16:37Z

server/src/main/java/org/elasticsearch/action/admin/indices/recovery/RecoveryRequest.java

@@ -47,7 +48,7 @@ public RecoveryRequest() {
     * @param indices   Comma-separated list of indices about which to gather recovery information
     */
    public RecoveryRequest(String... indices) {
-        super(indices);
+        super(indices, IndicesOptions.STRICT_EXPAND_OPEN_CLOSED);


I think this change will mean that in a mixed cluster, the result of the recovery API will be different if you pass in a closed index depending on which node you hit in the cluster (a v7 or v8). A v7 node will respond failure and a v8 node will respond nothing. Is that correct?

This seems like a very small issue though?

server/src/test/java/org/elasticsearch/indices/recovery/IndexRecoveryIT.java

tlrx · 2019-02-25T12:04:21Z

Thaks @henningandersen !

Before this change, closed indexes were simply not replicated. It was therefore possible to close an index and then decommission a data node without knowing that this data node contained shards of the closed index, potentially leading to data loss. Shards of closed indices were not completely taken into account when balancing the shards within the cluster, or automatically replicated through shard copies, and they were not easily movable from node A to node B using APIs like Cluster Reroute without being fully reopened and closed again. This commit changes the logic executed when closing an index, so that its shards are not just removed and forgotten but are instead reinitialized and reallocated on data nodes using an engine implementation which does not allow searching or indexing, which has a low memory overhead (compared with searchable/indexable opened shards) and which allows shards to be recovered from peer or promoted as primaries when needed. This new closing logic is built on top of the new Close Index API introduced in 6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before closing them, and closing an index on a 8.0 cluster will reinitialize the index shards and therefore impact the cluster health. Some APIs have been adapted to make them work with closed indices: - Cluster Health API - Cluster Reroute API - Cluster Allocation Explain API - Recovery API - Cat Indices - Cat Shards - Cat Health - Cat Recovery This commit contains all the following changes (most recent first): * c6c42a1 Adapt NoOpEngineTests after #39006 * 3f9993d Wait for shards to be active after closing indices (#38854) * 5e7a428 Adapt the Cluster Health API to closed indices (#39364) * 3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (#38767) * 71f5c34 Recover closed indices after a full cluster restart (#39249) * 4db7fd9 Adapt the Recovery API for closed indices (#38421) * 4fd1bb2 Adapt more tests suites to closed indices (#39186) * 0519016 Add replica to primary promotion test for closed indices (#39110) * b756f6c Test the Cluster Shard Allocation Explain API with closed indices (#38631) * c484c66 Remove index routing table of closed indices in mixed versions clusters (#38955) * 00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex() * e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (#38329) * cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#38327) * b9becdd Adapt testPendingTasks() for replicated closed indices (#38326) * 02cc730 Allow shards of closed indices to be replicated as regular shards (#38024) * e53a9be Fix compilation error in IndexShardIT after merge with master * cae4155 Relax NoOpEngine constraints (#37413) * 54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes * c63fd69 [RCI] Add NoOpEngine for closed indices (#33903) Relates to #33888

This commit adapts the Recovery API to make it work with shards of replicated closed indices. Relates elastic#33888

Backport support for replicating closed indices (#39499) Before this change, closed indexes were simply not replicated. It was therefore possible to close an index and then decommission a data node without knowing that this data node contained shards of the closed index, potentially leading to data loss. Shards of closed indices were not completely taken into account when balancing the shards within the cluster, or automatically replicated through shard copies, and they were not easily movable from node A to node B using APIs like Cluster Reroute without being fully reopened and closed again. This commit changes the logic executed when closing an index, so that its shards are not just removed and forgotten but are instead reinitialized and reallocated on data nodes using an engine implementation which does not allow searching or indexing, which has a low memory overhead (compared with searchable/indexable opened shards) and which allows shards to be recovered from peer or promoted as primaries when needed. This new closing logic is built on top of the new Close Index API introduced in 6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before closing them, and closing an index on a 8.0 cluster will reinitialize the index shards and therefore impact the cluster health. Some APIs have been adapted to make them work with closed indices: - Cluster Health API - Cluster Reroute API - Cluster Allocation Explain API - Recovery API - Cat Indices - Cat Shards - Cat Health - Cat Recovery This commit contains all the following changes (most recent first): * c6c42a1 Adapt NoOpEngineTests after #39006 * 3f9993d Wait for shards to be active after closing indices (#38854) * 5e7a428 Adapt the Cluster Health API to closed indices (#39364) * 3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (#38767) * 71f5c34 Recover closed indices after a full cluster restart (#39249) * 4db7fd9 Adapt the Recovery API for closed indices (#38421) * 4fd1bb2 Adapt more tests suites to closed indices (#39186) * 0519016 Add replica to primary promotion test for closed indices (#39110) * b756f6c Test the Cluster Shard Allocation Explain API with closed indices (#38631) * c484c66 Remove index routing table of closed indices in mixed versions clusters (#38955) * 00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex() * e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (#38329) * cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#38327) * b9becdd Adapt testPendingTasks() for replicated closed indices (#38326) * 02cc730 Allow shards of closed indices to be replicated as regular shards (#38024) * e53a9be Fix compilation error in IndexShardIT after merge with master * cae4155 Relax NoOpEngine constraints (#37413) * 54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes * c63fd69 [RCI] Add NoOpEngine for closed indices (#33903) Relates to #33888

tlrx added >enhancement :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. labels Feb 5, 2019

tlrx mentioned this pull request Feb 5, 2019

Replicate closed indices #33888

Closed

50 tasks

tlrx force-pushed the rci-adapt-recovery-api branch from 60c7452 to f1a8e24 Compare February 11, 2019 15:30

Expose Recovery API for closed indices

6a7757a

tlrx force-pushed the rci-adapt-recovery-api branch from f1a8e24 to 6a7757a Compare February 18, 2019 20:21

tlrx requested a review from henningandersen February 19, 2019 08:33

tlrx commented Feb 19, 2019

View reviewed changes

rest-api-spec/src/main/resources/rest-api-spec/test/cat.recovery/10_basic.yml Outdated Show resolved Hide resolved

tlrx requested a review from DaveCTurner February 19, 2019 09:14

tlrx commented Feb 19, 2019

View reviewed changes

DaveCTurner reviewed Feb 20, 2019

View reviewed changes

rest-api-spec/src/main/resources/rest-api-spec/test/cat.recovery/10_basic.yml Outdated Show resolved Hide resolved

tlrx added 2 commits February 21, 2019 17:01

Merge branch 'replicated-closed-indices' into rci-adapt-recovery-api

7230ed0

Apply feedback

70eaee5

henningandersen approved these changes Feb 22, 2019

View reviewed changes

tlrx added 2 commits February 22, 2019 12:53

Fix cat/recovery test

a534a1d

closedIndex

e4082db

tlrx merged commit 4db7fd9 into elastic:replicated-closed-indices Feb 25, 2019

tlrx deleted the rci-adapt-recovery-api branch February 25, 2019 12:04

tlrx mentioned this pull request Feb 28, 2019

Add support for replicating closed indices #39499

Merged

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Mar 1, 2019

Adapt the Recovery API for closed indices (elastic#38421)

95d9e9a

This commit adapts the Recovery API to make it work with shards of replicated closed indices. Relates elastic#33888

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adapt the Recovery API for closed indices #38421

Adapt the Recovery API for closed indices #38421

tlrx commented Feb 5, 2019

elasticmachine commented Feb 5, 2019

tlrx Feb 19, 2019

henningandersen Feb 22, 2019

tlrx Feb 22, 2019

henningandersen Feb 22, 2019

DaveCTurner left a comment

tlrx commented Feb 21, 2019

henningandersen left a comment

henningandersen Feb 22, 2019

tlrx Feb 22, 2019

ywelsch Feb 25, 2019

henningandersen Feb 22, 2019

tlrx commented Feb 25, 2019

Adapt the Recovery API for closed indices #38421

Adapt the Recovery API for closed indices #38421

Conversation

tlrx commented Feb 5, 2019

elasticmachine commented Feb 5, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DaveCTurner left a comment

Choose a reason for hiding this comment

tlrx commented Feb 21, 2019

henningandersen left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tlrx commented Feb 25, 2019