[Segment Replication + Remote Store] Remove divergent logic between node-node and remote store implementations #9016
Labels
enhancement
The problem:
NRTReplicationEngine currently contains divergent logic for remote store and node-node segment replication, and that divergence should be removed. The main difference is that with node-node replication we commit when new segments arrive and when the engine is closed. With remote store we never commit; instead we rely on commit points being sent from the store in order to sync the xlog and to have a valid commit point during engine resets on failover.
I propose we remove this divergent logic for a few reasons. First, fetching segments_n (commit point files) from the remote store is problematic for both the primary and replica recovery paths and during segrep, as described below. Second, it has led to recent flakiness in our tests and to hard-to-reproduce bugs around pit/scroll (due to inconsistent incref/decref of segments_n on reader open).
Recovery
During recovery, the latest refresh point in the store will reference a segments_n that may in turn reference segments that are not part of that refresh point. For example, the commit point could be T1:
[_0.cfs, _0.si, _0.cfe, segments_4]
while the latest uploaded refresh point from the primary is T2:
[_1.cfs, _1.si, _1.cfe, segments_4]
The last commit made is segments_4. Our logic today fetches all files on the latest refresh point (including segments_n) and commits the in-memory bytes. We need to extend this logic to replicas, or we risk corruption at engine startup.
Segrep
During segment syncs we also do not require the segments_n file. The case for sending it is that we would not need to perform the segmentInfos.commit that node-node replication currently performs. However, in the T2 situation above we risk closing our engine without a valid commit point. This is problematic during an engine reset, where we temporarily create a read-only (RO) engine from the latest on-disk commit point before syncing with the remote store in the background.
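To make the T1/T2 situation concrete, here is a minimal sketch in plain Java. It is a toy model built on sets of file names, not OpenSearch or Lucene code. The class `RefreshPointSketch` and its methods are hypothetical names introduced only for illustration. The only outside fact it relies on is that Lucene names commit point files `segments_N`.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the T1/T2 scenario; all names here are hypothetical. */
final class RefreshPointSketch {

    /** Lucene commit point files are named segments_N. */
    static boolean isCommitPointFile(String name) {
        return name.startsWith("segments_");
    }

    /** Files a commit point references that are absent from the given refresh point. */
    static Set<String> missingFromRefreshPoint(Set<String> referencedByCommit,
                                               Set<String> refreshPointFiles) {
        Set<String> missing = new TreeSet<>(referencedByCommit);
        missing.removeAll(refreshPointFiles);
        return missing;
    }

    /** Files a replica would copy during a segment sync if segments_N is never fetched. */
    static Set<String> filesToSync(Set<String> refreshPointFiles) {
        Set<String> result = new TreeSet<>();
        for (String file : refreshPointFiles) {
            if (!isCommitPointFile(file)) {
                result.add(file);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Segment files that segments_4 referenced when it was written at T1.
        Set<String> referencedAtT1 = Set.of("_0.cfs", "_0.si", "_0.cfe");
        // Latest uploaded refresh point T2, which still carries segments_4.
        Set<String> t2 = Set.of("_1.cfs", "_1.si", "_1.cfe", "segments_4");

        // A non-empty result means segments_4 is stale relative to T2.
        System.out.println(missingFromRefreshPoint(referencedAtT1, t2));
        // What would be copied if commit point files were excluded from the sync.
        System.out.println(filesToSync(t2));
    }
}
```

In this model the first call reports every `_0.*` file as missing, which is why opening an engine from the fetched segments_4 alone would be unsafe. The second call shows the sync set once commit point files are excluded. The replica would then need to produce its own commit locally, as node-node replication does today.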
Pros:
Some cons:
Alternatives:
Task outline: