Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Segment Replication] Draft - Create Segrep orchestration class & Target implementation #3409

Closed
wants to merge 11 commits into from

Conversation

mch2
Copy link
Member

@mch2 mch2 commented May 19, 2022

Description

This is a draft to show how the refactors introduce in #3234 are intended to be used. This is intended for illustration purposes right now, it needs tests & some package reorganization.

This PR introduces components that orchestrate the flow for Segment Replication events.

SegmentReplicationTargetService - orchestration component for replication events. Registers a request handler to handle file chunk requests for segrep.
SegmentReplicationSource - Interface that will provide checkpoint data & copy segment files.
PeerReplicationSource - implementation of SegmentReplicationSource where the source is another shard.

Along with some stubbed components that will be implemented in upcoming PRs:
SegmentReplicationTarget - ReplicationTarget implementation for segrep, will handle invoking methods on a provided source.
SegmentReplicationSourceService - Service that handles requests from PeerReplicationSource.
SegmentReplicationState State object to track replication events.

Additional Refactoring:

  • Renames RecoveryFileChunkRequest to FileChunkRequest and move throttling logic to ReplicationTarget to be reused.
  • pulls logic required to create & complete a listener between file chunk requests to ReplicationTarget.
  • pulls logic from RemoteRecoveryHandler to execute retrya-ble requests to a reusable component.

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 8c9eaae84079a989504c2f597fe91325368dd00c
Log 5493

Reports 5493

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 5c5340263f4fb9148cf53a50b4ce41f997f487fb
Log 5500

Reports 5500

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure a006336c4472ce212f63b32ad423f631ee60a9ba
Log 5501

Reports 5501

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure d562fa0274b930338afd27fbb1a6c4c7d6d7e7b5
Log 5502

Reports 5502

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 18eced14aa7894505f19510f0bd14c9284e502a3
Log 5506

Reports 5506

// How many bytes we've copied since we last called RateLimiter.pause
private final AtomicLong bytesSinceLastPause = new AtomicLong();
private final RecoverySettings recoverySettings;
private final ReplicationCollection<T> onGoingTransfers;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why operate on the collection? Why not operate on a single ReplicationTarget ?

Comment on lines 58 to 68
RateLimiter rateLimiter = recoverySettings.rateLimiter();
if (rateLimiter != null) {
long bytes = bytesSinceLastPause.addAndGet(request.content().length());
if (bytes > rateLimiter.getMinPauseCheckBytes()) {
// Time to pause
bytesSinceLastPause.addAndGet(-bytes);
long throttleTimeInNanos = rateLimiter.pause(bytes);
indexState.addTargetThrottling(throttleTimeInNanos);
replicationTarget.indexShard().recoveryStats().addThrottleTime(throttleTimeInNanos);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be the only stateful code in this class. How about moving it to a decorator and creating new FileChunkRequestHandler instances for each new request?

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success c508939
Log 5555

Reports 5555

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 3689f5d
Log 5567

Reports 5567

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 3177f87d00a76f54f198f016b3f09d475614dacf
Log 5568

Reports 5568

@mch2 mch2 force-pushed the segrep-reuse branch 2 times, most recently from 983fcde to 82209ce Compare May 24, 2022 21:35
mch2 added 5 commits May 24, 2022 14:40
This change introduces target and source services for Segment Replication.
It also refactors PeerRecoveryTargetService and RemoteRecoveryTargetHandler to reuse
a FileChunkRequestHandler and transport client that issues retryable requests.

Signed-off-by: Marc Handalian <handalm@amazon.com>
Removed any coupling to Recovery settings.

Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
…the handler class.

Signed-off-by: Marc Handalian <handalm@amazon.com>
@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 983fcde4ee705bd5e5c85cebcd3d4fc5c24d4734
Log 5570

Reports 5570

Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 82209ce47a63a8e9b133784c900d6f2425b5d419
Log 5571

Reports 5571

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 1ac59c1
Log 5572

Reports 5572

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 40e92c3
Log 5573

Reports 5573

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure a09d24d4940fd817b845fc76a732a77215696602
Log 5574

Reports 5574

…arget.

Signed-off-by: Marc Handalian <handalm@amazon.com>
@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 1b5023b
Log 5575

Reports 5575

Signed-off-by: Marc Handalian <handalm@amazon.com>
@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 78c450c
Log 5580

Reports 5580

mch2 added 2 commits May 24, 2022 20:17
Signed-off-by: Marc Handalian <handalm@amazon.com>
…eplicationSource.

Signed-off-by: Marc Handalian <handalm@amazon.com>
@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 07edfb5
Log 5584

Reports 5584

@mch2
Copy link
Member Author

mch2 commented May 25, 2022

Cutting this up into multiple PRs, starting with #3439

@mch2 mch2 closed this May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants