Compute Segment Replication stats for SR backpressure. #6520
This PR introduces new mechanisms to keep track of the current replicas within a replication group intended to be used to apply backpressure. The new stats are also added to NodeStats. Signed-off-by: Marc Handalian <handalm@amazon.com>
@mch2 Posting comments from older review (PR got updated in between).
Resolved review threads:
...er/src/internalClusterTest/java/org/opensearch/indices/replication/SegmentReplicationIT.java (outdated)
server/src/main/java/org/opensearch/action/admin/cluster/node/stats/NodeStats.java (outdated)
server/src/main/java/org/opensearch/action/admin/cluster/node/stats/NodeStats.java (outdated)
server/src/main/java/org/opensearch/action/admin/cluster/node/stats/NodesStatsRequest.java
server/src/main/java/org/opensearch/index/SegmentReplicationShardStats.java (outdated)
server/src/main/java/org/opensearch/index/SegmentReplicationShardStats.java (outdated)
server/src/main/java/org/opensearch/index/SegmentReplicationStats.java (outdated)
server/src/main/java/org/opensearch/index/SegmentReplicationShardStats.java
Codecov Report
@@             Coverage Diff              @@
##               main    #6520      +/-   ##
============================================
- Coverage     70.80%   70.78%   -0.02%
- Complexity    59143    59157      +14
============================================
  Files          4802     4807       +5
  Lines        282965   283159     +194
  Branches      40792    40819      +27
============================================
+ Hits         200356   200440      +84
- Misses        66187    66271      +84
- Partials      16422    16448      +26
Also update to return an array of replicas vs a map of objects. Signed-off-by: Marc Handalian <handalm@amazon.com>
… segments. This can happen when there is a new checkpoint published because of a SegmentInfos bump, but the replica does not need to fetch any files. Signed-off-by: Marc Handalian <handalm@amazon.com>
Resolved review threads:
server/src/main/java/org/opensearch/action/admin/cluster/node/stats/NodeStats.java (outdated)
server/src/main/java/org/opensearch/index/SegmentReplicationShardStats.java (outdated)
server/src/main/java/org/opensearch/index/SegmentReplicationShardStats.java (outdated)
server/src/main/java/org/opensearch/action/admin/cluster/node/stats/NodeStats.java
server/src/main/java/org/opensearch/indices/replication/SegmentReplicationSourceHandler.java
 *
 * @opensearch.internal
 */
public class SegmentReplicationStatsTracker {
nit: Can this class be removed, with getStats() moved to SegmentReplicationPressureService? I don't see this class maintaining or tracking segrep stats. Please ignore if there is pending future work that requires this class.
Implemented with this commit - 473cc02. I've opened that up in draft and will open it for review once this lands.
Would like a second reviewer on this and 473cc02. @Bukhtawar, @sachinpkale, @ashking94 available for review? Note - 473cc02 implements actual enforcement of pressure from these computed metrics. I'll also be putting up a separate PR to actually check for and fail stale replicas from both the primary & replica sides.
@dreamer-89 I think we should consolidate these stats into the segment_replication API rather than in NodeStats. Will discard this and open a separate PR.
Description
This PR introduces new mechanisms to keep track of the current replicas within a replication group. It is the first step in changes to add backpressure for lagging replicas when SR is enabled.
The PR adds getSegmentReplicationStats to fetch these stats per in-sync replica, in addition to the number of bytes the replica is behind. It also returns the average replication lag for the whole group.
Example Response:
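The example response itself is not reproduced here. As a rough illustration of the per-replica and group-level figures described above, the following is a minimal sketch and not the PR's implementation. The `ReplicationLagSketch` class, the `ReplicaLag` type, its fields, and both helper methods are hypothetical names chosen for illustration. The sketch also assumes that "bytes behind" means the total size of primary segment files the replica has not yet copied, and that the group average is a plain arithmetic mean.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types are OpenSearch classes.
final class ReplicationLagSketch {

    /** Snapshot of one in-sync replica's lag (illustrative fields only). */
    static final class ReplicaLag {
        final String allocationId;
        final long bytesBehind;
        final long lagMillis;

        ReplicaLag(String allocationId, long bytesBehind, long lagMillis) {
            this.allocationId = allocationId;
            this.bytesBehind = bytesBehind;
            this.lagMillis = lagMillis;
        }
    }

    /** Assumption: sums the sizes of primary files that the replica does not yet have. */
    static long bytesBehind(Map<String, Long> primaryFileSizes, Set<String> replicaFiles) {
        long total = 0L;
        for (Map.Entry<String, Long> entry : primaryFileSizes.entrySet()) {
            if (!replicaFiles.contains(entry.getKey())) {
                total += entry.getValue();
            }
        }
        return total;
    }

    /** Arithmetic mean of the lag across the group; 0 for an empty group. */
    static double averageLagMillis(List<ReplicaLag> replicas) {
        if (replicas.isEmpty()) {
            return 0.0;
        }
        long sum = 0L;
        for (ReplicaLag replica : replicas) {
            sum += replica.lagMillis;
        }
        return (double) sum / replicas.size();
    }

    public static void main(String[] args) {
        // Made-up file names and sizes, purely for demonstration.
        Map<String, Long> primary = Map.of("_0.cfs", 100L, "_1.cfs", 250L);
        ReplicaLag caughtUp = new ReplicaLag(
            "replica-a", bytesBehind(primary, Set.of("_0.cfs", "_1.cfs")), 0L);
        ReplicaLag lagging = new ReplicaLag(
            "replica-b", bytesBehind(primary, Set.of("_0.cfs")), 1200L);

        List<ReplicaLag> group = List.of(caughtUp, lagging);
        for (ReplicaLag replica : group) {
            System.out.println(replica.allocationId + " bytesBehind=" + replica.bytesBehind
                + " lagMillis=" + replica.lagMillis);
        }
        System.out.println("averageLagMillis=" + averageLagMillis(group));
    }
}
```

A replica that already holds every primary file reports zero bytes behind, which matches the case mentioned in the commit history where a checkpoint is published but the replica has nothing to fetch.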
Issues Resolved
closes #6388
related #4478
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.