[Docs] Step-by-step tutorial for uni-directional CCR failover #84854
Comments
Pinging @elastic/es-docs (Team:Docs)
How should .kibana be handled?
We removed the auto-follow pattern for system indices in 8.0.0, but we can still specify a follower index to replicate a specific leader_index. For example, a system index can still be followed explicitly, as sketched below.
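A minimal sketch of such an explicit follow request (the remote cluster alias `leader` and the index name `.kibana_8.0.0` are illustrative placeholders):

```console
# Run on the follower cluster: explicitly follow a single system index
PUT /.kibana_8.0.0/_ccr/follow?wait_for_active_shards=1
{
  "remote_cluster": "leader",
  "leader_index": ".kibana_8.0.0"
}
```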
The DR flow for system indices is the same as in my initial post, except that it should avoid using …
Thanks Leaf. Can we add this to the documentation as well, as it pertains to the DR use case and covers failover/failback scenarios? Customers in this situation would certainly want to understand what can be synced to their DR cluster and what cannot.
As a follow-up question, what ARE the effects of setting up .kibana as a follower index? How does this interact with all kinds of configuration elements, as well as with the restriction on direct access to system indices? For example, does this also work with the Task Manager part, to handle the dimension of duplicate tasks/alerts? Covering these impact dimensions would go a long way toward helping customers design the right DR setup in combination with CCR.
@Leaf-Lin: Another follow-up question to add to the list: will users be able to continue to explicitly follow system indices and complete the steps to convert them to regular indices? Over the last few months there have been a number of changes that make it more difficult to work with system indices, for example #72815, #63513, and #74212. When our users plan their DR strategies, they want to know how forward-compatible those plans are, as they need to continually patch their deployments. So knowing whether explicit CCR of system indices is planned to be taken away is important.
Pinging @elastic/es-distributed (Team:Distributed)
Although the comment above will allow us to replicate this particular system index, it is not ideal. One still needs to manually delete …
These questions are spot-on. For the reasons you have mentioned (plus upgrade handling), the follower cluster must have …
We agree that disaster recovery for system indices today is not well implemented. I have raised an enhancement request, which would require a cross-team effort to address. I am not aware of any planned changes in the short term.
@Leaf-Lin the doc team is very busy; do you think you can provide the documentation change you proposed?
resolved by #91491
Description
As of writing, CCR does not offer automatic failover. Can we please add the following tutorial for the failover scenario?
The initial setup can be skipped as it's similar to Tutorial: Set up cross-cluster replication. Adding it here for completeness.
Initial setup (uni-directional CCR with DR cluster following Production cluster)

Step 1: Create remote clusters on DR and point to Production.
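A minimal sketch, assuming the remote cluster alias `production` and a placeholder seed address:

```console
# Run on the DR cluster: register the Production cluster as a remote
PUT /_cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "production": {
          "seeds": ["production-node-1:9300"]
        }
      }
    }
  }
}
```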
Step 2: Create an index on Production.
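For example (the settings are illustrative; only the index name my_index is reused in the later steps):

```console
# Run on the Production cluster
PUT /my_index
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}
```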
Step 3: Create a follower index on DR.
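A sketch using the follow API and the `production` alias from Step 1:

```console
# Run on the DR cluster: create my_index as a follower of Production's my_index
PUT /my_index/_ccr/follow?wait_for_active_shards=1
{
  "remote_cluster": "production",
  "leader_index": "my_index"
}
```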
Step 4: Test the follower index on DR.
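For example, by checking the follower stats and running a search against the follower:

```console
# Run on the DR cluster
GET /my_index/_ccr/stats
GET /my_index/_search
```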
While writes go to the Production cluster, all search queries can be directed to either the Production or the DR cluster.

When Production is down:
Step 1: On the Client's side, pause ingestion of my_index into Production.

Step 2: On the Elasticsearch side, turn the follower indices in the DR cluster into regular indices (see the sketch below):
- Ensure no writes are occurring on the leader index (if the data centre is down or the cluster is unavailable, no action is needed).
- On DR: convert the follower index to a normal index in Elasticsearch (capable of accepting writes).
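A minimal sketch of the pause/close/unfollow/open sequence for converting a follower into a regular index:

```console
# Run on the DR cluster: turn the follower into a regular, writeable index
POST /my_index/_ccr/pause_follow
POST /my_index/_close
POST /my_index/_ccr/unfollow
POST /my_index/_open
```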
Step 3: On the Client side, manually re-enable ingestion of my_index to the DR cluster. (You can test that the index should be writable; see the sketch below.) All ingest and search traffic goes to the DR cluster during this time.
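A quick write test (the document body is illustrative):

```console
# Run on the DR cluster: the former follower should now accept writes
POST /my_index/_doc
{
  "test": "write after failover"
}
```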
Once Production comes back:

Step 1: On the Client's side, stop writes to my_index on the DR cluster.

Step 2: Create remote clusters on Production and point to DR.
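This mirrors the initial setup; a sketch assuming the alias `dr` and a placeholder seed address:

```console
# Run on the Production cluster: register the DR cluster as a remote
PUT /_cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "dr": {
          "seeds": ["dr-node-1:9300"]
        }
      }
    }
  }
}
```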
Step 3: Create follower indices in Production, connecting them to the leaders in DR. The former leader indices in Production have outdated data and will need to be discarded/deleted first. Wait for the Production follower indices to catch up; once they are caught up, you can turn the follower indices in Production into regular indices again (see the sketch below).
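A sketch of this failback sequence on Production, reusing the placeholder alias `dr`:

```console
# Run on the Production cluster
# Discard the former leader index, which now holds outdated data
DELETE /my_index

# Re-create my_index as a follower of the up-to-date DR copy
PUT /my_index/_ccr/follow?wait_for_active_shards=1
{
  "remote_cluster": "dr",
  "leader_index": "my_index"
}

# Watch replication catch up
GET /my_index/_ccr/stats

# Once caught up, convert the follower back into a regular index
POST /my_index/_ccr/pause_follow
POST /my_index/_close
POST /my_index/_ccr/unfollow
POST /my_index/_open
```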
Step 4: Delete the former DR writeable indices, which now contain outdated data. Create follower indices in the DR cluster again to ensure that all changes from Production are streamed to DR. (This is the same as the initial setup.)

Step 5: On the Client side, manually re-enable ingestion to the Production cluster.

While writes go to Production, all search queries can be directed to either the Production or the DR cluster.