Decouple replication lag from logic to fail stale replicas #9507
…ale replicas Signed-off-by: Ankit Kala <ankikala@amazon.com>
Gradle Check (Jenkins) Run Completed with:
Compatibility status: checks if related components are compatible with change 5d3633c.
Incompatible components: [https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/security-analytics.git]
Skipped components: (none listed)
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]
Codecov Report
@@             Coverage Diff              @@
##               main    #9507      +/-   ##
============================================
- Coverage     71.14%   71.12%   -0.02%
- Complexity    57552    57584      +32
============================================
  Files          4782     4783       +1
  Lines        271408   271444      +36
  Branches      39633    39639       +6
============================================
- Hits         193084   193063      -21
- Misses        62123    62176      +53
- Partials      16201    16205       +4
server/src/main/java/org/opensearch/index/SegmentReplicationPressureService.java
server/src/main/java/org/opensearch/index/seqno/ReplicationTracker.java
server/src/main/java/org/opensearch/indices/replication/common/SegmentReplicationLagTimer.java
server/src/main/java/org/opensearch/index/SegmentReplicationShardStats.java
Compatibility status: checks if related components are compatible with change 75ad56c.
Incompatible components: [https://github.com/opensearch-project/cross-cluster-replication.git]
Skipped components: (none listed)
Compatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/reporting.git]
Compatibility status: checks if related components are compatible with change b921169.
Incompatible components: [https://github.com/opensearch-project/cross-cluster-replication.git]
Skipped components: (none listed)
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]
Compatibility status: checks if related components are compatible with change 4fe6846.
Incompatible components: [https://github.com/opensearch-project/cross-cluster-replication.git]
Skipped components: (none listed)
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]
The backport to
To backport manually, run these commands in your terminal:
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-9507-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 d66df10b248457d3d9778131d6939dd1a2185e39
# Push it to GitHub
git push --set-upstream origin backport/backport-9507-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x
Then, create a pull request where the
…h-project#9507) * Decouple replication lag from replication timer logic used to fail stale replicas Signed-off-by: Ankit Kala <ankikala@amazon.com> * Added changelog entry Signed-off-by: Ankit Kala <ankikala@amazon.com> * Addressed comments Signed-off-by: Ankit Kala <ankikala@amazon.com> * Addressed comments 2 Signed-off-by: Ankit Kala <ankikala@amazon.com> * Addressed comments Signed-off-by: Ankit Kala <ankikala@amazon.com> * Retry gradle Signed-off-by: Ankit Kala <ankikala@amazon.com> * fix UT Signed-off-by: Ankit Kala <ankikala@amazon.com> * Addressed comments Signed-off-by: Ankit Kala <ankikala@amazon.com> * Retry Gradle Signed-off-by: Ankit Kala <ankikala@amazon.com> --------- Signed-off-by: Ankit Kala <ankikala@amazon.com> (cherry picked from commit d66df10)
…9705) * Decouple replication lag from replication timer logic used to fail stale replicas * Added changelog entry * Addressed comments * Addressed comments 2 * Addressed comments * Retry gradle * fix UT * Addressed comments * Retry Gradle --------- Signed-off-by: Ankit Kala <ankikala@amazon.com> Co-authored-by: Ankit Kala <ankikala@amazon.com>
…h-project#9507) * Decouple replication lag from replication timer logic used to fail stale replicas Signed-off-by: Ankit Kala <ankikala@amazon.com> * Added changelog entry Signed-off-by: Ankit Kala <ankikala@amazon.com> * Addressed comments Signed-off-by: Ankit Kala <ankikala@amazon.com> * Addressed comments 2 Signed-off-by: Ankit Kala <ankikala@amazon.com> * Addressed comments Signed-off-by: Ankit Kala <ankikala@amazon.com> * Retry gradle Signed-off-by: Ankit Kala <ankikala@amazon.com> * fix UT Signed-off-by: Ankit Kala <ankikala@amazon.com> * Addressed comments Signed-off-by: Ankit Kala <ankikala@amazon.com> * Retry Gradle Signed-off-by: Ankit Kala <ankikala@amazon.com> --------- Signed-off-by: Ankit Kala <ankikala@amazon.com> Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
…h-project#9507) * Decouple replication lag from replication timer logic used to fail stale replicas Signed-off-by: Ankit Kala <ankikala@amazon.com> * Added changelog entry Signed-off-by: Ankit Kala <ankikala@amazon.com> * Addressed comments Signed-off-by: Ankit Kala <ankikala@amazon.com> * Addressed comments 2 Signed-off-by: Ankit Kala <ankikala@amazon.com> * Addressed comments Signed-off-by: Ankit Kala <ankikala@amazon.com> * Retry gradle Signed-off-by: Ankit Kala <ankikala@amazon.com> * fix UT Signed-off-by: Ankit Kala <ankikala@amazon.com> * Addressed comments Signed-off-by: Ankit Kala <ankikala@amazon.com> * Retry Gradle Signed-off-by: Ankit Kala <ankikala@amazon.com> --------- Signed-off-by: Ankit Kala <ankikala@amazon.com> Signed-off-by: Ivan Brusic <ivan.brusic@flocksafety.com>
…h-project#9507) * Decouple replication lag from replication timer logic used to fail stale replicas Signed-off-by: Ankit Kala <ankikala@amazon.com> * Added changelog entry Signed-off-by: Ankit Kala <ankikala@amazon.com> * Addressed comments Signed-off-by: Ankit Kala <ankikala@amazon.com> * Addressed comments 2 Signed-off-by: Ankit Kala <ankikala@amazon.com> * Addressed comments Signed-off-by: Ankit Kala <ankikala@amazon.com> * Retry gradle Signed-off-by: Ankit Kala <ankikala@amazon.com> * fix UT Signed-off-by: Ankit Kala <ankikala@amazon.com> * Addressed comments Signed-off-by: Ankit Kala <ankikala@amazon.com> * Retry Gradle Signed-off-by: Ankit Kala <ankikala@amazon.com> --------- Signed-off-by: Ankit Kala <ankikala@amazon.com> Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Description
The current implementation relies on the replication timer tracked by the primary's checkpoint tracker to evaluate a replica's staleness. While this correctly provides the replication_lag, relying on replication lag is not ideal because it also includes the time taken by the primary shard to upload data to the remote store. We shouldn't penalize and fail replicas when the primary is slow in uploading segments. To get around this, we need to track two times separately.
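As a rough illustration of tracking the two times separately, here is a minimal sketch of a timer that records when it was created and, independently, when it was started. The class name `LagTimerSketch`, its methods, and the injected clock are hypothetical and are not taken from this pull request; the sketch assumes `start()` is called only once the primary has finished uploading.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/**
 * Hypothetical illustration only; not the class changed in this pull request.
 * Keeps two clocks: one from creation (overall lag) and one from start()
 * (the time used to decide whether a replica is stale).
 */
public class LagTimerSketch {
    private final LongSupplier nanoClock;   // injected so the sketch is testable
    private final long createdAtNanos;
    private long startedAtNanos;
    private boolean started;

    public LagTimerSketch(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.createdAtNanos = nanoClock.getAsLong();
    }

    /** Assumed to be called once the primary's upload has completed. */
    public void start() {
        if (!started) {
            startedAtNanos = nanoClock.getAsLong();
            started = true;
        }
    }

    /** Time since creation: includes any time the primary spent uploading. */
    public long lagMillis() {
        long elapsed = Math.max(nanoClock.getAsLong() - createdAtNanos, 0);
        return TimeUnit.NANOSECONDS.toMillis(elapsed);
    }

    /** Time since start(): zero until the timer has been started. */
    public long staleCheckMillis() {
        if (!started) {
            return 0;
        }
        long elapsed = Math.max(nanoClock.getAsLong() - startedAtNanos, 0);
        return TimeUnit.NANOSECONDS.toMillis(elapsed);
    }
}
```

In this sketch, time that passes between construction and `start()` counts toward `lagMillis()` but not toward `staleCheckMillis()`, so a slow upload on the primary would show up as lag without pushing a replica toward the staleness limit.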
Changes done:
Changed indexShard.updateReplicationCheckpoint to only create the timers (and not start them)
Related Issues
#8453
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.