[Design Proposal] Using Segment Replication for cross-cluster-replication #4090
Comments
POC: |
Thanks @ankitkala, few initial thoughts
We may need to discuss this further and consider a pub-sub based mechanism that lets the follower cluster know about a checkpoint, after which the follower can start fetching data from the leader. The same pub-sub model could be reused for remote store integration to pull data directly from the remote store.
|
A bi-directional cross-cluster connection is harder to maintain. Currently we can't support replication if the leader is on a higher OpenSearch version than the follower, as the segments might not be readable on an older Lucene version. With a bi-directional connection, users would be able to create replication in both directions, which creates a cyclic dependency for version upgrades (technically the same issue applies to Segment Replication from primary to replica, and there is still no clear way forward).
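To make the version constraint above concrete, here is a minimal sketch. The function names and the plain `major.minor.patch` version strings are hypothetical, chosen only for illustration; this is not how CCR or OpenSearch actually performs the check.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    # Assumes a plain dotted numeric version such as "2.4.0" (illustrative only).
    return tuple(int(part) for part in version.split("."))


def can_replicate(leader_version: str, follower_version: str) -> bool:
    # Segments written by a newer leader may be unreadable on an older follower,
    # so replication is only allowed when the leader is not ahead of the follower.
    return parse_version(leader_version) <= parse_version(follower_version)


def can_replicate_both_ways(cluster_a: str, cluster_b: str) -> bool:
    # Bi-directional replication needs the constraint to hold in both directions,
    # which under this rule is only true when both clusters run the same version.
    return can_replicate(cluster_a, cluster_b) and can_replicate(cluster_b, cluster_a)
```

Under this rule, upgrading either cluster first breaks one of the two directions, which is the cyclic dependency described above.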
The way I look at it, we'll refactor and re-use most of the logic that exists for local segment replication. The only difference would be the entry point, where the follower would poll periodically and invoke
If moving CCR to core is the end state we want, it makes sense to add the new logic directly in core rather than in CCR. Otherwise, we might end up rewriting these transport actions again during migration.
The current implementation of CCR uses logical replication, where we replay all of the leader shard's operations on the follower's primary shard. With the ongoing effort for Segment Replication, a local replica will simply sync the segments stored on disk from the primary shard, offering significantly better throughput (documented here (#2229)). This document proposes the design for Cross Cluster Replication using Segment Replication.
Why Segment Replication for CCR
Pros:
Cons:
Design Tenets
How Segment Replication (local) works
Segment Replication is triggered on primary shard refresh. Upon refresh, all replica shards are notified with a replication checkpoint (seqno of the latest doc, latest commit generation, and primary term).
For each notification, the replica shard performs the following operations:
Taken from here
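As a rough illustration of the checkpoint described above, the sketch below models it with the three fields named in the text. The class name, field names, and the ordering rule (primary term first, then commit generation, then seqno) are assumptions made for this example and may differ from the actual implementation in core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationCheckpoint:
    # Hypothetical field names mirroring the three values listed in the text.
    primary_term: int
    commit_gen: int
    seq_no: int

    def is_ahead_of(self, other: "ReplicationCheckpoint") -> bool:
        # Assumed ordering: compare primary term, then commit generation, then seqno.
        mine = (self.primary_term, self.commit_gen, self.seq_no)
        theirs = (other.primary_term, other.commit_gen, other.seq_no)
        return mine > theirs


def should_sync(local: ReplicationCheckpoint, received: ReplicationCheckpoint) -> bool:
    # A replica only needs to act when the notified checkpoint is newer than its own.
    return received.is_ahead_of(local)
```

With a comparison like this, a replica (or a polling follower) can cheaply ignore stale or duplicate notifications before doing any segment copying.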
Proposal for CCR Segment Replication:
CCR Replication type selection logic:
We aren't planning to offer this as a choice to the customer. CCR will simply mimic the replication model used between the leader's primary and replicas. So if the leader cluster relies on segment replication for replicas, CCR will also use segment replication.
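The selection rule above could look roughly like the following sketch. It assumes the leader index settings are available as a flat dictionary and that the replication type is exposed under an `index.replication.type` key with values `DOCUMENT` and `SEGMENT`; the function name and the default are illustrative assumptions, not the proposed API.

```python
from enum import Enum


class ReplicationType(Enum):
    DOCUMENT = "DOCUMENT"
    SEGMENT = "SEGMENT"


def ccr_replication_type(leader_index_settings: dict) -> ReplicationType:
    # No user-facing choice: the follower simply mirrors whatever the leader index uses.
    # Assumed setting key and default; treat both as placeholders.
    value = leader_index_settings.get("index.replication.type", "DOCUMENT")
    return ReplicationType(value)
```

For example, `ccr_replication_type({"index.replication.type": "SEGMENT"})` selects segment replication for the follower, while a leader index without the setting falls back to document (logical) replication.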
Deviation from existing Segment Replication:
Compatibility with Segment Replication and Remote Storage integration:
After this integration, replica shards will sync segments directly from the remote store instead of from the primary shard. We'll need to build additional support for CCR so that follower cluster shards can sync data from the leader's remote store.