Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle translog upload during primary relocation for remote-backed indexes #5804

Merged
merged 12 commits into from
Jan 18, 2023

Conversation

ashking94
Copy link
Member

@ashking94 ashking94 commented Jan 10, 2023

Signed-off-by: Ashish Singh ssashish@amazon.com

Description

Solves #5795 & #5844.

Issues Resolved

Solves #5795 & #5844

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ashking94 ashking94 added skip-changelog distributed framework Storage:Durability Issues and PRs related to the durability framework v2.6.0 'Issues and PRs related to version v2.6.0' labels Jan 10, 2023
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Ashish Singh <ssashish@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Copy link
Collaborator

@gbbafna gbbafna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @ashking94 .

@gbbafna gbbafna merged commit 44aabe9 into opensearch-project:main Jan 18, 2023
@gbbafna gbbafna added the backport 2.x Backport to 2.x branch label Jan 18, 2023
@ashking94
Copy link
Member Author

Backport failed due to #5618 not being backported. Backporting this first and then seeing if it can be backported with tag again.

@ashking94 ashking94 added backport 2.x Backport to 2.x branch and removed backport 2.x Backport to 2.x branch labels Jan 18, 2023
@gbbafna gbbafna added v2.6.0 'Issues and PRs related to version v2.6.0' and removed v2.6.0 'Issues and PRs related to version v2.6.0' labels Jan 18, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-5804-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 44aabe9bb57e33716fbafcc81d29c5385a55f34f
# Push it to GitHub
git push --set-upstream origin backport/backport-5804-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-5804-to-2.x.

@ashking94
Copy link
Member Author

Failing now due to #5344 not being in 2.x.

Comment on lines 204 to +211
private boolean upload(Long primaryTerm, Long generation) throws IOException {
logger.trace("uploading translog for {} {} ", primaryTerm, generation);
// During primary relocation (primary-primary peer recovery), both the old and the new primary have engine
// created with the RemoteFsTranslog. Both primaries are equipped to upload the translogs. The primary mode check
// below ensures that the real primary only is uploading. Before the primary mode is set as true for the new
// primary, the engine is reset to InternalEngine which also initialises the RemoteFsTranslog which in turns
// downloads all the translogs from remote store and does a flush before the relocation finishes.
if (primaryModeSupplier.getAsBoolean() == false) {
logger.trace("skipped uploading translog for {} {}", primaryTerm, generation);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we think about a No-op version of RemoteFsTranslog that switches atomically on tracker updates, rather than passing around primary mode here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Evaluated the approach you suggested. This is a bit tricky as currently the translog is supplied through translog factory and then the Translog implementation gets initialised only when the engine is created or reset. For the new primary during relocation, the engine is reset before the primary mode is changed from false into true and the primary mode change on older primary (during relocation) does not reset or recreates the engine. Although this is possible, this seems to be make this simple check to a bit complicated. Currently, changes in primaryMode has not listeners or observers. Will try this again after clearing of some high ROI items for remote-backed store items. Thanks for bringing this up.

nknize pushed a commit to nknize/OpenSearch that referenced this pull request Jan 19, 2023
…r remote-backed indexes (opensearch-project#5804)

* Upload translog only if primaryMode is true

Signed-off-by: Ashish Singh <ssashish@amazon.com>
@ashking94 ashking94 added backport 2.x Backport to 2.x branch and removed backport 2.x Backport to 2.x branch labels Jan 20, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-5804-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 44aabe9bb57e33716fbafcc81d29c5385a55f34f
# Push it to GitHub
git push --set-upstream origin backport/backport-5804-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-5804-to-2.x.

@ashking94 ashking94 added backport 2.x Backport to 2.x branch and removed backport 2.x Backport to 2.x branch labels Jan 30, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 30, 2023
…r remote-backed indexes (#5804)

* Upload translog only if primaryMode is true

Signed-off-by: Ashish Singh <ssashish@amazon.com>
(cherry picked from commit 44aabe9)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@ashking94
Copy link
Member Author

Backport is failing with java compilation failure. I will try to raise backport manually.

ashking94 added a commit to ashking94/OpenSearch that referenced this pull request Jan 30, 2023
…r remote-backed indexes (opensearch-project#5804)

* Upload translog only if primaryMode is true

Signed-off-by: Ashish Singh <ssashish@amazon.com>
gbbafna pushed a commit that referenced this pull request Jan 30, 2023
…r remote-backed indexes (#5804) (#6064)

* Upload translog only if primaryMode is true

Signed-off-by: Ashish Singh <ssashish@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch distributed framework skip-changelog Storage:Durability Issues and PRs related to the durability framework v2.6.0 'Issues and PRs related to version v2.6.0'
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants