Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Segment Replication + Remote Store] Remove divergent logic between node-node and remote store implementations #9016

Closed
mch2 opened this issue Aug 1, 2023 · 1 comment
Labels
enhancement Enhancement or improvement to existing feature or request

Comments

@mch2
Copy link
Member

mch2 commented Aug 1, 2023

The problem:
There is divergent logic with remote store + node-node segment replication within NRTReplicationEngine that should be removed. The main difference is with node-node replication we are currently committing when new segments arrive & when an engine is closed. For remote store we are never performing commits and relying on commit points from the store to be sent in order to sync the xlog and have a valid commit point during engine resets on failover.

I propose we remove this divergent logic for a few reasons. First fetching segments_n (commit point files) from the remote store is problematic for both primary and replica recovery paths & during segrep. Described below. Second it has lead to recent flakiness in our tests and hard to reproduce bugs around pit/scroll (due inconsistent incref/decref of segments_n on reader open).

Recovery
During recovery the latest refresh point in the store will reference a segments_n that may reference segments not part of the latest refresh point. For example the commit point could be T1: [0.cfs, _0.si, _0.cfe, segments4] while the latest uploaded refresh point from the primary is T2: [1.cfs, _1.si, 1.cfe segments4]. The last commit made is segment4. Our logic today fetches all files on the latest refresh point (including segments_n) and commits the in-memory bytes. We need to extend this logic to replicas or it risks corruption at engine startup.

Segrep
During segment syncs we also do not require the segments_n file. The case for sending them is that we don't need to perform a segmentInfos.commit that is currently performed by node-node replication. However, with the situation T2 above we risk closing our engine without valid commit point. This is problematic during engine reset where we temporarily create a RO engine from the latest on-disk commit point before syncing with the remote store in the background.

Pros:

  • No divergent logic from n2n replication.
  • Avoid sending unnecessary segments & extra bytes already sent with infos bytes.
  • Always restore from store to the furthest ahead point.

Some cons:

  • local commit file created on replicas that would not come from the store.
  • During recovery & engine reset could have segments_n that is not in the remote store. Will require cleanup before shard is opened & latest point is fetched.

Alternatives:

  • Remove commits on n2n to remove divergent logic. This is problematic because primary shards do not incref the segments_n files, they only hold previous commits through deletion policy. We would need to also acquire and hold the latest commit while fetching the latest segmentInfos to download. This is also problematic for reasons described above, we would need to ensure we always have every file the commit point references. I think the only reasonable approach that would work for removing commits is to eliminate refresh points altogether and perform fsyncless "commits" on the primary, so that every replication event sent a unique segments_n file over to replicas.

Task outline:

  1. Separate recovery flow from segrep flow. - https://github.com/opensearch-project/OpenSearch/pull/8767/files
  2. Add commit during recovery for replicas.
  3. During sync performed in recovery wipe any additional commit pt or segment unreferenced by the latest remote point.
  4. Remove segment_n from getting sent in both cases.
  5. Remove divergent commit logic in the NRT engine.
@mch2
Copy link
Member Author

mch2 commented Aug 21, 2023

closing with #9111

@mch2 mch2 closed this as completed Aug 21, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in Segment Replication Aug 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request
Projects
Status: Done
Development

No branches or pull requests

1 participant