Relax NoOpEngine constraints on translog #37413

Merged
tlrx merged 1 commit into elastic:replicated-closed-indices from relax-noopengine
Jan 23, 2019

tlrx
Member

@tlrx tlrx commented Jan 14, 2019

Note: this pull request is against the replicated-closed-indices feature branch.

When a NoOpEngine is instantiated, the current implementation verifies that the translog contains no operations and that it has the same UUID as the one recorded in the last Lucene commit data. I think that we can relax those two constraints because the Close Index API now ensures that all translog operations are flushed before a shard is closed. The consistency check between the translog UUID and the Lucene commit data is not specific to NoOpEngine, and is already done by IndexShard.innerOpenEngineAndTranslog().

This pull request also adds a test for recovery from a NoOpEngine.
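
To make the two relaxed constraints concrete, here is a minimal, self-contained sketch of the idea. It is not the actual Elasticsearch code: `NoOpEngineChecksSketch`, `TranslogState`, `openStrict`, `openRelaxed` and the `translog_uuid` key name are hypothetical stand-ins chosen for illustration, and the real `NoOpEngine` and `IndexShard` classes have different signatures.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical, simplified model of the checks discussed in this pull request. */
public class NoOpEngineChecksSketch {

    // Assumed name of the commit user data entry that stores the translog UUID.
    static final String TRANSLOG_UUID_KEY = "translog_uuid";

    /** Stand-in for what is known about a translog when a shard is opened. */
    static final class TranslogState {
        final String uuid;
        final int pendingOperations;

        TranslogState(String uuid, int pendingOperations) {
            this.uuid = uuid;
            this.pendingOperations = pendingOperations;
        }
    }

    /** Generic UUID check; per the description it already happens when the shard opens. */
    static void checkTranslogUuid(Map<String, String> commitUserData, TranslogState translog) {
        String expected = commitUserData.get(TRANSLOG_UUID_KEY);
        if (!Objects.equals(expected, translog.uuid)) {
            throw new IllegalStateException(
                "translog UUID mismatch: expected [" + expected + "] but got [" + translog.uuid + "]");
        }
    }

    /** Before the change: the no-op engine repeats the UUID check and requires an empty translog. */
    static void openStrict(Map<String, String> commitUserData, TranslogState translog) {
        checkTranslogUuid(commitUserData, translog);
        if (translog.pendingOperations != 0) {
            throw new IllegalArgumentException(
                "expected an empty translog but found [" + translog.pendingOperations + "] operations");
        }
    }

    /** After the change: the no-op engine performs no translog checks of its own. */
    static void openRelaxed(Map<String, String> commitUserData, TranslogState translog) {
        // Intentionally empty: closing is assumed to have flushed all operations,
        // and the UUID check is assumed to be done once by the caller.
    }

    /** Returns true if the given open attempt completes without throwing. */
    static boolean opens(Runnable attempt) {
        try {
            attempt.run();
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> commit = Map.of(TRANSLOG_UUID_KEY, "uuid-1");
        TranslogState flushed = new TranslogState("uuid-1", 0);
        TranslogState unflushed = new TranslogState("uuid-1", 3);

        System.out.println("strict, flushed:    " + opens(() -> openStrict(commit, flushed)));
        System.out.println("strict, unflushed:  " + opens(() -> openStrict(commit, unflushed)));
        System.out.println("relaxed, unflushed: " + opens(() -> openRelaxed(commit, unflushed)));
    }
}
```

In this sketch the UUID check still exists, but only in the shared `checkTranslogUuid` helper that a caller would run once, which mirrors the argument that the check is not specific to the no-op engine.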

@tlrx tlrx added the >non-issue, v7.0.0 and :Distributed Indexing/Engine labels Jan 14, 2019
@tlrx tlrx requested a review from ywelsch January 14, 2019 10:58
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@tlrx tlrx force-pushed the replicated-closed-indices branch 2 times, most recently from 49ce196 to fbbeff4 Compare January 15, 2019 11:14
@ywelsch ywelsch removed their request for review January 17, 2019 13:47
@tlrx tlrx force-pushed the replicated-closed-indices branch from fbbeff4 to b00b323 Compare January 22, 2019 08:45
@tlrx tlrx force-pushed the relax-noopengine branch from dd9ca8d to eeaaf50 Compare January 22, 2019 09:09
@tlrx
Member Author

tlrx commented Jan 22, 2019

I rebased this pull request after #37426 was merged into master.

@tlrx tlrx requested a review from ywelsch January 22, 2019 09:10
@tlrx tlrx merged commit f6b34bc into elastic:replicated-closed-indices Jan 23, 2019
@tlrx tlrx deleted the relax-noopengine branch January 23, 2019 08:29
@tlrx
Member Author

tlrx commented Jan 23, 2019

Thanks @ywelsch

@tlrx tlrx mentioned this pull request Jan 23, 2019
50 tasks
tlrx added a commit that referenced this pull request Jan 29, 2019
When a NoOpEngine is instantiated, the current implementation verifies 
that the translog contains no operations and that it contains the same 
UUID as the last Lucene commit data. We can relax those two constraints 
because the Close Index API now ensures that all translog operations are 
flushed before closing a shard. The detection of coherence between translog 
UUID / Lucene commit data is not specific to NoOpEngine, and is already 
done by IndexShard.innerOpenEngineAndTranslog().

Related to #33888
tlrx added a commit that referenced this pull request Jan 30, 2019
tlrx added a commit that referenced this pull request Feb 28, 2019
Before this change, closed indices were simply not replicated. It was therefore 
possible to close an index and then decommission a data node without knowing 
that this data node contained shards of the closed index, potentially leading to 
data loss. Shards of closed indices were not completely taken into account when 
balancing the shards within the cluster, or automatically replicated through shard 
copies, and they were not easily movable from node A to node B using APIs like 
Cluster Reroute without being fully reopened and closed again.

This commit changes the logic executed when closing an index, so that its shards 
are not just removed and forgotten but are instead reinitialized and reallocated on 
data nodes using an engine implementation which does not allow searching or 
indexing, which has a low memory overhead (compared with searchable/indexable 
opened shards) and which allows shards to be recovered from a peer or promoted 
as primaries when needed.

This new closing logic is built on top of the new Close Index API introduced in 
6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before 
closing them, and closing an index on an 8.0 cluster will reinitialize the index shards 
and therefore impact the cluster health.

Some APIs have been adapted to make them work with closed indices:
- Cluster Health API
- Cluster Reroute API
- Cluster Allocation Explain API
- Recovery API
- Cat Indices
- Cat Shards
- Cat Health
- Cat Recovery

This commit contains all the following changes (most recent first):
* c6c42a1 Adapt NoOpEngineTests after #39006
* 3f9993d Wait for shards to be active after closing indices (#38854)
* 5e7a428 Adapt the Cluster Health API to closed indices (#39364)
* 3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (#38767)
* 71f5c34 Recover closed indices after a full cluster restart (#39249)
* 4db7fd9 Adapt the Recovery API for closed indices (#38421)
* 4fd1bb2 Adapt more tests suites to closed indices (#39186)
* 0519016 Add replica to primary promotion test for closed indices (#39110)
* b756f6c Test the Cluster Shard Allocation Explain API with closed indices (#38631)
* c484c66 Remove index routing table of closed indices in mixed versions clusters (#38955)
* 00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex()
* e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (#38329)
* cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#38327)
* b9becdd Adapt testPendingTasks() for replicated closed indices (#38326)
* 02cc730 Allow shards of closed indices to be replicated as regular shards (#38024)
* e53a9be Fix compilation error in IndexShardIT after merge with master
* cae4155 Relax NoOpEngine constraints (#37413)
* 54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes
* c63fd69 [RCI] Add NoOpEngine for closed indices (#33903)

Relates to #33888
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Mar 1, 2019
tlrx added a commit that referenced this pull request Mar 1, 2019
Backport support for replicating closed indices (#39499)
Labels
:Distributed Indexing/Engine, >non-issue, v7.0.0-beta1
4 participants