Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize stale blobs deletion during snapshot delete #3796

Merged
merged 5 commits into from
Jul 22, 2022

Conversation

piyushdaftary
Copy link
Contributor

Signed-off-by: Piyush Daftary pdaftary@amazon.com

Description

Currently during snapshot delete, deletion of unlinked shard level blob is single threaded using SNAPSHOT threadpool. Hence if there is huge number of unlinked shard level blob flies, it will take considerable amount of time to clean them.

Hence I propose to make unlinked shard level blob deletion multi threaded delete the same way we do for cleaning up of stale indices, to speedup the overall snapshot deletion process.

Issues Resolved

#2156

Check List

  • [Y] New functionality includes testing.
    • [Y] All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • [Y] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@piyushdaftary piyushdaftary requested review from a team and reta as code owners July 7, 2022 08:36
@piyushdaftary
Copy link
Contributor Author

@sohami @AmiStrn @dreamer-89 : Thanks for reviewing the earlier PR : #2159 .
Somehow the PR is closed and I am unable to open it. Thus creating this new PR addressing the comments given by you in the previous PR . Please review.

@github-actions
Copy link
Contributor

github-actions bot commented Jul 7, 2022

Gradle Check (Jenkins) Run Completed with:

@dreamer-89
Copy link
Member

dreamer-89 commented Jul 7, 2022

@sohami @AmiStrn @dreamer-89 : Thanks for reviewing the earlier PR : #2159 . Somehow the PR is closed and I am unable to open it. Thus creating this new PR addressing the comments given by you in the previous PR . Please review.

@piyushdaftary : I re-opened the PR #2159 which already has review comments and useful discussion. Please feel free to close this one and continue on PR 2159.

@piyushdaftary
Copy link
Contributor Author

@dreamer-89 : Unable to update the old PR. Thus Created this new PR.
To ensure earlier discussion is not lost, I have mentioned the old PR #2159 here.

@dblock dblock requested review from andrross and kartg July 7, 2022 20:55
nknize
nknize previously requested changes Jul 7, 2022
Copy link
Collaborator

@nknize nknize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 nice! This is a great performance boost. Just have some suggestions around testing and a question around the newly introduced parameter (javadocs and user guidance); I think we can do a little better there?

int numberOfFiles = numberOfFiles(repositoryPath);

logger.info("--> adding some more documents to test index");
for (int j = 0; j < 10; ++j) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we randomize this using randomIntBetween() to inject some entropy in the number of threads? Do you have a general feel for the number of documents per number of threads so we can use a reasonable range?

Copy link
Contributor Author

@piyushdaftary piyushdaftary Jul 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nknize : The number of documents per thread varies a lot depending on number of documents deleted (stale) or updated between the snapshot being deleted and other remaining snapshots in the repository. I am planning to make batch size to be randomIntBetween(1, 1000) and number of document to be in range between 1 to 10000000. WDYT ?

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Copy link
Collaborator

@reta reta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on my side

Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter
Copy link

codecov-commenter commented Jul 16, 2022

Codecov Report

Merging #3796 (2d07bc5) into main (e724b2e) will increase coverage by 0.11%.
The diff coverage is 57.64%.

@@             Coverage Diff              @@
##               main    #3796      +/-   ##
============================================
+ Coverage     70.46%   70.58%   +0.11%     
- Complexity    56600    56672      +72     
============================================
  Files          4557     4563       +6     
  Lines        272737   272781      +44     
  Branches      40040    40043       +3     
============================================
+ Hits         192188   192540     +352     
+ Misses        64324    63925     -399     
- Partials      16225    16316      +91     
Impacted Files Coverage Δ
...n/cluster/health/TransportClusterHealthAction.java 47.75% <ø> (ø)
...g/opensearch/cluster/ClusterStateTaskListener.java 100.00% <ø> (ø)
.../opensearch/cluster/MasterNodeChangePredicate.java 0.00% <0.00%> (ø)
...ava/org/opensearch/cluster/NotMasterException.java 0.00% <0.00%> (ø)
...rch/cluster/coordination/NoMasterBlockService.java 0.00% <0.00%> (ø)
...ter/coordination/UnsafeBootstrapMasterCommand.java 0.00% <0.00%> (ø)
...rg/opensearch/common/settings/ClusterSettings.java 91.89% <ø> (ø)
...rch/common/settings/ConsistentSettingsService.java 67.85% <ø> (ø)
.../opensearch/common/util/concurrent/BaseFuture.java 62.71% <0.00%> (ø)
...java/org/opensearch/discovery/DiscoveryModule.java 90.27% <ø> (ø)
... and 532 more

Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

…delete_batch_size

Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
@piyushdaftary piyushdaftary requested a review from kartg July 19, 2022 19:45
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@andrross
Copy link
Member

@reta @nknize You still have outstanding changes requested on this PR. Can you confirm this looks good?

@andrross andrross added the backport 2.x Backport to 2.x branch label Jul 21, 2022
@andrross andrross dismissed stale reviews from reta and nknize July 22, 2022 20:04

Changes have been addressed

@andrross andrross merged commit 1c787e8 into opensearch-project:main Jul 22, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jul 22, 2022
* Parallelize stale blobs deletion during snapshot delete

Signed-off-by: Piyush Daftary <pdaftary@amazon.com>

* Adding test which throws exception

Signed-off-by: Piyush Daftary <pdaftary@amazon.com>

* Adusting identation for spotlessJavaCheck

Signed-off-by: Piyush Daftary <pdaftary@amazon.com>

* Adding more description to MAX_SHARD_BLOB_DELETE_BATCH_SIZE

Signed-off-by: Piyush Daftary <pdaftary@amazon.com>

* Renaming max_shard_blob_delete_batch_size to max_snapshot_shard_blob_delete_batch_size

Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
(cherry picked from commit 1c787e8)
Copy link
Collaborator

@nknize nknize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks all for pushing this forward. Sorry for the delay @piyushdaftary and thanks for addressing the questions. Adding my LGTM post merge for completeness!

VachaShah pushed a commit that referenced this pull request Jul 30, 2022
* Parallelize stale blobs deletion during snapshot delete

Signed-off-by: Piyush Daftary <pdaftary@amazon.com>

* Adding test which throws exception

Signed-off-by: Piyush Daftary <pdaftary@amazon.com>

* Adusting identation for spotlessJavaCheck

Signed-off-by: Piyush Daftary <pdaftary@amazon.com>

* Adding more description to MAX_SHARD_BLOB_DELETE_BATCH_SIZE

Signed-off-by: Piyush Daftary <pdaftary@amazon.com>

* Renaming max_shard_blob_delete_batch_size to max_snapshot_shard_blob_delete_batch_size

Signed-off-by: Piyush Daftary <pdaftary@amazon.com>
(cherry picked from commit 1c787e8)

Co-authored-by: piyush <pdaftary@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants