[Remote Store] Handle translog upload during primary to primary relocation for remote-backed indexes #5795

Closed
ashking94 opened this issue Jan 10, 2023 · 1 comment
Labels: enhancement, Storage:Durability, v2.6.0

ashking94 commented Jan 10, 2023

Is your feature request related to a problem? Please describe.
Currently, during peer recovery for a primary-to-primary relocation, both the old and the new primary bootstrap their engine and translog, so translog uploads happen from both shards.

The following needs to be ensured:

  • When index.translog.durability is set to async, the translog is synced periodically, every index.translog.sync_interval. With this change, whether the translog is uploaded to the remote store depends on the primary mode. Because the sync is asynchronous, the upload is currently not guaranteed to happen before primaryMode is marked false. We need to ensure that the upload is attempted, and either succeeds or fails, deterministically before primaryMode turns false on the old primary and before the new primary resets its engine (resetToWriteableEngine).
  • Reset the new primary's engine only after all translogs have been uploaded from the old primary to the remote store. This ensures that the new primary gets all the translogs from the remote store before it starts accepting new write requests. This is already handled by ForceSyncTransportRequestHandler.
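
The ordering described in the two points above can be sketched with a toy model. This is only an illustration, not OpenSearch code: `RemoteStore`, `PrimaryShard`, `relinquish_primary_mode`, and `relocate` are hypothetical names, and the periodic async sync is reduced to an explicit `sync_translog()` call.

```python
class RemoteStore:
    """Hypothetical stand-in for the remote translog store."""

    def __init__(self):
        self.uploaded = []

    def upload(self, ops):
        self.uploaded.extend(ops)


class PrimaryShard:
    """Toy model of a shard copy with index.translog.durability = async."""

    def __init__(self, remote, primary_mode):
        self.remote = remote
        self.primary_mode = primary_mode
        self.pending = []  # operations written locally but not yet uploaded

    def index(self, op):
        if not self.primary_mode:
            raise RuntimeError("shard is not in primary mode")
        self.pending.append(op)

    def sync_translog(self):
        # Upload only while in primary mode, mirroring the rule in the issue.
        if not self.primary_mode or not self.pending:
            return False
        self.remote.upload(self.pending)
        self.pending = []
        return True

    def relinquish_primary_mode(self):
        # Force one last upload attempt *before* flipping the flag, so the
        # outcome (success or an exception) is known at this point.
        self.sync_translog()
        self.primary_mode = False


def relocate(old, new):
    """Assumed handoff order: final upload, step down, then promote."""
    old.relinquish_primary_mode()          # if this raises, the handoff stops here
    recovered = list(new.remote.uploaded)  # new primary reads from the remote store
    new.primary_mode = True                # loosely analogous to resetToWriteableEngine
    return recovered
```

In this sketch, operations indexed on the old primary between two periodic syncs are still uploaded before the flag flips, so the new primary sees them when it reads the remote store. A failed upload surfaces as an exception before the new primary is promoted.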

There is a possible data loss issue in the relocation flow for segment replication based indexes. It is tracked in #5848.

Describe the solution you'd like
The complete handoff process already ensures that the primary mode switch happens in order: any in-flight requests on the older primary are drained first, and only then is the newer primary promoted, with primary mode set to true as part of the handoff.
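
A minimal sketch of that drain-then-promote ordering follows, assuming a simple permit counter. `PrimaryPermits` and `hand_off` are hypothetical names and not OpenSearch classes. Rejecting new requests during the handoff is a simplification made here; a real implementation may delay them instead.

```python
import threading


class PrimaryPermits:
    """Hypothetical tracker for in-flight requests on the old primary."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._blocked = False
        self.primary_mode = True

    def acquire(self):
        with self._cond:
            if self._blocked or not self.primary_mode:
                raise RuntimeError("primary is handing off")
            self._in_flight += 1

    def release(self):
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def hand_off(self, promote_new_primary):
        with self._cond:
            self._blocked = True        # stop admitting new requests
            while self._in_flight > 0:  # drain requests that are already running
                self._cond.wait()
            self.primary_mode = False   # the old primary steps down first
        promote_new_primary()           # only then is the target promoted
```

A request holding a permit keeps `hand_off` waiting, so the promotion callback cannot run until every in-flight request has called `release()`.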

ashking94 added the enhancement, Storage:Durability, and v2.6.0 labels on Jan 10, 2023
ashking94 self-assigned this on Jan 13, 2023

sachinpkale commented

Fixed in #5804
