Protect replicated data streams against local rollovers #65999

martijnvg · 2020-12-08T08:03:11Z

Backporting #64710 to the 7.x branch.

When a data stream is being auto-followed, a rollover in the local cluster can break auto-following.
If the local cluster performs a rollover, it creates a new write index. If the remote cluster later
rolls over as well, its new write index can't be replicated, because it has the same name as the
write index that was created earlier in the local cluster.
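
The name collision can be illustrated with a small toy model. This is only a sketch, not Elasticsearch code: the `Cluster` class, the `replicate` helper, the `logs-app` data stream and the `.ds-<name>-<generation>` naming scheme are all assumptions made for illustration.

```python
def backing_index_name(data_stream: str, generation: int) -> str:
    # Assumed naming scheme for illustration: ".ds-<name>-<zero-padded generation>".
    return f".ds-{data_stream}-{generation:06d}"


class Cluster:
    """Hypothetical toy model of a cluster holding one data stream's backing indices."""

    def __init__(self, data_stream: str):
        self.data_stream = data_stream
        self.indices = [backing_index_name(data_stream, 1)]

    def rollover(self) -> str:
        # A rollover creates the next-generation write index.
        name = backing_index_name(self.data_stream, len(self.indices) + 1)
        self.indices.append(name)
        return name


def replicate(remote_index: str, local: Cluster) -> None:
    # Auto-following has to create a follower index named after the remote index.
    if remote_index in local.indices:
        raise ValueError(f"cannot follow [{remote_index}]: index already exists locally")
    local.indices.append(remote_index)


remote = Cluster("logs-app")
local = Cluster("logs-app")  # follower side, generation 1 is already replicated

local_write_index = local.rollover()    # the problematic local rollover
remote_write_index = remote.rollover()  # later, the remote cluster rolls over too

try:
    replicate(remote_write_index, local)
    collision = False
except ValueError:
    collision = True  # both clusters produced the same index name
```

Both rollovers yield the same generation and therefore the same index name, so the follower side has no free name left for the remote write index.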

If a data stream is managed by CCR, the local cluster should not roll over that data stream itself.
The data stream should be rolled over in the remote cluster, and that change should then replicate to the local
cluster. Performing a rollover in the local cluster is an operation that only the data stream support in CCR should
perform.

To protect against rolling over a replicated data stream, this PR adds a replicated field to the DataStream class.
The rollover API fails with an error if the targeted data stream is a replicated data stream. When the put follow
API creates a data stream in the local cluster, the replicated flag is set to true. There should be a way to turn
a replicated data stream into a regular data stream, for example during disaster recovery. The API newly added in
this PR (the promote data stream API) does that. After a replicated data stream is promoted to a regular data
stream, the local data stream can be rolled over, so that the new write index is no longer a follower index. Also,
if the put follow API then attempts to update this data stream (for example to resume auto-following), that will
fail, because the data stream is no longer a replicated data stream.
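
The intended life cycle of the flag can be sketched as follows. This is a simplified model under stated assumptions, not the actual implementation: the Python `DataStream` dataclass and the `rollover`, `promote` and `put_follow_update` functions are hypothetical stand-ins for the DataStream class, the rollover API, the promote data stream API and the put follow API, and the error messages are invented.

```python
from dataclasses import dataclass


@dataclass
class DataStream:
    name: str
    generation: int = 1
    replicated: bool = False  # True when the data stream was created by put follow


def rollover(ds: DataStream) -> int:
    # Validation described in this PR: replicated data streams must not roll over locally.
    if ds.replicated:
        raise ValueError(f"data stream [{ds.name}] is replicated and cannot be rolled over")
    ds.generation += 1
    return ds.generation


def promote(ds: DataStream) -> None:
    # Promotion turns a replicated data stream into a regular one.
    ds.replicated = False


def put_follow_update(ds: DataStream) -> None:
    # Following can only update data streams that are still replicated.
    if not ds.replicated:
        raise ValueError(f"data stream [{ds.name}] is not replicated and cannot be followed")


followed = DataStream("logs-app", replicated=True)  # as created by put follow

try:
    rollover(followed)
    rollover_blocked = False
except ValueError:
    rollover_blocked = True  # protected while replicated

promote(followed)                      # for example during disaster recovery
new_generation = rollover(followed)    # allowed after promotion

try:
    put_follow_update(followed)
    follow_blocked = False
except ValueError:
    follow_blocked = True  # resuming auto-following no longer works
```

The sketch keeps all three checks on a single boolean, which mirrors the idea that one field on the data stream decides whether local rollovers or follow updates are allowed.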

Today, with time-based indices behind an alias, the is_write_index property isn't replicated from the remote
cluster to the local cluster, so an attempt to roll over the alias in the local cluster fails because the
alias doesn't have a write index. The replicated field added to the DataStream class and the added validation
achieve the same kind of protection, but in a more robust way.
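
For comparison, the implicit alias-based protection can be modelled like this. It is only a sketch: the `rollover_alias` function, the index names and the dictionary layout are assumptions for illustration and do not reflect the real rollover code.

```python
def rollover_alias(alias_name: str, indices: dict) -> str:
    """Return the current write index of the alias, or fail if there is none.

    `indices` maps an index name to its is_write_index flag (hypothetical layout).
    """
    write_indices = [name for name, is_write in indices.items() if is_write]
    if not write_indices:
        raise ValueError(f"alias [{alias_name}] does not have a write index")
    return write_indices[0]


# Remote cluster: the newest index is flagged as the write index.
remote_alias = {"logs-000001": False, "logs-000002": True}

# Local cluster: the indices are followed, but is_write_index is not replicated.
local_alias = {name: False for name in remote_alias}

remote_target = rollover_alias("logs", remote_alias)

try:
    rollover_alias("logs", local_alias)
    local_rollover_failed = False
except ValueError:
    local_rollover_failed = True  # protection by accident of the missing flag
```

Here the local rollover only fails because a property happens to be missing, whereas the replicated field states the intent explicitly.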

A followup from #61993.


elasticmachine · 2020-12-08T08:03:15Z

Pinging @elastic/es-core-features (Team:Core/Features)

martijnvg added the backport and :Data Management/Data streams labels Dec 8, 2020

elasticmachine added the Team:Data Management label Dec 8, 2020

fixed compile errors

c62e583

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Dec 8, 2020

disable bwc tests for elastic#65999

3b9fb9b

martijnvg added a commit that referenced this pull request Dec 8, 2020

disable bwc tests for #65999 (#66002)

884d7e3

martijnvg merged commit 1596b93 into elastic:7.x Dec 8, 2020
