Segment Replication - Fix NoSuchFileException errors caused when computing metadata snapshot on primary shards. #4422
…uting metadata snapshot on primary shards. (opensearch-project#4366)

* Segment Replication - Fix NoSuchFileException errors caused when computing metadata snapshot on primary shards.

This change fixes the errors that occur when computing metadata snapshots on primary shards from the latest in-memory SegmentInfos. The error occurs when a segments_N file that is referenced by the in-memory infos is deleted as part of a concurrent commit. The segments themselves are incref'd by IndexWriter.incRefDeleter but the commit file (Segments_N) is not. This change resolves this by ignoring the segments_N file when computing metadata for CopyState and only sending incref'd segment files to replicas.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix spotless.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Update StoreTests.testCleanupAndPreserveLatestCommitPoint to assert additional segments are deleted.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Rename snapshot to metadataMap in CheckpointInfoResponse.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Refactor segmentReplicationDiff method to compute off two maps instead of MetadataSnapshots.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix spotless.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Revert catchall in SegmentReplicationSourceService.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Revert log lvl change.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Fix SegmentReplicationTargetTests

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Cleanup unused logger.

Signed-off-by: Marc Handalian <handalm@amazon.com>

Signed-off-by: Marc Handalian <handalm@amazon.com>
Co-authored-by: Suraj Singh <surajrider@gmail.com>
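To illustrate the idea in the description, here is a minimal sketch of the two steps it names: dropping the commit file from the set of files to send, and diffing two file-metadata maps. It is not the code from this PR. The class `CopyStateSketch`, both method names, and the use of plain `String` checksums in place of the real store metadata type are assumptions made only for illustration.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of OpenSearch.
final class CopyStateSketch {

    // Lucene commit files are named "segments_<generation>". Matching on this
    // prefix is an assumption standing in for the real check.
    static final String COMMIT_FILE_PREFIX = "segments_";

    /** Returns the given file names minus any commit file (segments_N). */
    static Set<String> withoutCommitFiles(Collection<String> fileNames) {
        Set<String> kept = new TreeSet<>();
        for (String name : fileNames) {
            if (!name.startsWith(COMMIT_FILE_PREFIX)) {
                kept.add(name);
            }
        }
        return kept;
    }

    /**
     * Returns the names of files that the primary has and the replica either
     * lacks or holds with a different checksum. Both maps go from file name
     * to checksum.
     */
    static Set<String> filesToFetch(Map<String, String> primary, Map<String, String> replica) {
        Set<String> missingOrDifferent = new TreeSet<>();
        for (Map.Entry<String, String> entry : primary.entrySet()) {
            if (!entry.getValue().equals(replica.get(entry.getKey()))) {
                missingOrDifferent.add(entry.getKey());
            }
        }
        return missingOrDifferent;
    }

    public static void main(String[] args) {
        // File names and checksums below are made up for the example.
        Set<String> toSend = withoutCommitFiles(List.of("_0.cfs", "_0.si", "segments_2"));
        System.out.println("Files considered for CopyState: " + toSend);

        Map<String, String> primary = new HashMap<>();
        primary.put("_0.cfs", "abc");
        primary.put("_1.cfs", "def");
        Map<String, String> replica = new HashMap<>();
        replica.put("_0.cfs", "abc");
        System.out.println("Files the replica must fetch: " + filesToFetch(primary, replica));
    }
}
```

Because the commit file never enters the file set, a concurrent commit that deletes it cannot cause a lookup of a file that no longer exists, and the remaining segment files are the ones the description says are incref'd.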
Gradle Check (Jenkins) Run Completed with:
Last run failed with below flaky test failure. Refiring!
Gradle Check (Jenkins) Run Completed with:
Codecov Report
@@ Coverage Diff @@
## 2.x #4422 +/- ##
==========================================
Coverage 70.54% 70.55%
- Complexity 56942 57101 +159
==========================================
Files 4572 4584 +12
Lines 273816 274453 +637
Branches 40152 40220 +68
==========================================
+ Hits 193170 193629 +459
- Misses 64455 64595 +140
- Partials 16191 16229 +38
@@ -483,10 +525,12 @@ private void waitForReplicaUpdate() throws Exception {
final List<ShardSegments> replicaShardSegments = segmentListMap.get(false);
// if we don't have any segments yet, proceed.
final ShardSegments primaryShardSegments = primaryShardSegmentsList.stream().findFirst().get();
logger.debug("Primary Segments: {}", primaryShardSegments.getSegments());
Did you mean to leave this in?
Yeah, I think this can remain.
if (primaryShardSegments.getSegments().isEmpty() == false) {
final Map<String, Segment> latestPrimarySegments = getLatestSegments(primaryShardSegments);
final Long latestPrimaryGen = latestPrimarySegments.values().stream().findFirst().map(Segment::getGeneration).get();
for (ShardSegments shardSegments : replicaShardSegments) {
logger.debug("Replica {} Segments: {}", shardSegments.getShardRouting(), shardSegments.getSegments());
Same as above comment.
Same as above
@@ -334,6 +336,51 @@ public MetadataSnapshot getMetadata(SegmentInfos segmentInfos) throws IOExceptio
return new MetadataSnapshot(segmentInfos, directory, logger);
}

/**
Nit - could add in a line explaining why we're leaving out the segments_n files
Thanks @Poojita-Raj for the comment. This change is needed to fix the file not found exception.
PR against main #4366 contains more details around the issue and fix.
Manual backport of #4366 to 2.x