Refactor translog download flow and Add support to run SegRep integ tests using remote store settings #6405
Following test is failing, looking into it.
Codecov Report
@@             Coverage Diff              @@
##               main    #6405      +/-   ##
============================================
+ Coverage     70.73%   70.85%   +0.11%
- Complexity    59281    59313      +32
============================================
  Files          4812     4812
  Lines        283614   283641      +27
  Branches      40896    40900       +4
============================================
+ Hits         200606   200962     +356
+ Misses        66552    66201     -351
- Partials      16456    16478      +22
... and 459 files with indirect coverage changes
public void syncTranslogFilesFromRemoteTranslog() throws IOException {
    TranslogFactory translogFactory = translogFactorySupplier.apply(indexSettings, shardRouting);
    assert translogFactory instanceof RemoteBlobStoreInternalTranslogFactory;
    Repository repository = ((RemoteBlobStoreInternalTranslogFactory) translogFactory).getRepository();
    assert repository instanceof BlobStoreRepository : "repository should be instance of BlobStoreRepository";
    BlobStoreRepository blobStoreRepository = (BlobStoreRepository) repository;
    FileTransferTracker fileTransferTracker = new FileTransferTracker(shardId);
    TranslogTransferManager translogTransferManager = RemoteFsTranslog.buildTranslogTransferManager(
        blobStoreRepository,
        getThreadPool(),
        shardId,
        fileTransferTracker
    );
    RemoteFsTranslog.download(translogTransferManager, shardPath().resolveTranslog());
}
This entire logic should sit in RemoteFsTranslog, and IndexShard should only ever interact with a common Translog interface. Otherwise every consumer will need to be aware of the underlying translog implementation, which couples the interfaces too tightly.
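One way to picture the suggestion above is the following minimal sketch. Every type and method name in it (`TranslogSketch`, `syncBeforeRecovery`, `ShardSketch`, and so on) is hypothetical and simplified; none of them is the real OpenSearch API, and checked exceptions are left out to keep the example short.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical common interface: the shard only ever sees this type.
interface TranslogSketch {
    /** Brings the local translog directory up to date before recovery; may be a no-op. */
    void syncBeforeRecovery(Path location);
}

// Hypothetical local implementation: nothing to download.
final class LocalTranslogSketch implements TranslogSketch {
    @Override
    public void syncBeforeRecovery(Path location) {
        // A purely local translog has no remote copy to fetch.
    }
}

// Hypothetical remote implementation: owns all remote-specific wiring.
final class RemoteTranslogSketch implements TranslogSketch {
    final List<Path> downloadedTo = new ArrayList<>();

    @Override
    public void syncBeforeRecovery(Path location) {
        // A real implementation would build its transfer manager and download here.
        // This sketch only records the target directory.
        downloadedTo.add(location);
    }
}

// Hypothetical stand-in for the shard: no instanceof checks, no casts.
final class ShardSketch {
    private final TranslogSketch translog;
    private final Path translogLocation;

    ShardSketch(TranslogSketch translog, Path translogLocation) {
        this.translog = translog;
        this.translogLocation = translogLocation;
    }

    void prepareForRecovery() {
        translog.syncBeforeRecovery(translogLocation);
    }
}
```

With this shape, the repository lookup and transfer-manager construction would live behind the remote implementation, and the shard code stays the same whichever translog is configured.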
Makes sense, will make the changes accordingly.
Tracking it here: #5679
Signed-off-by: Sachin Kale <kalsac@amazon.com>
import static org.opensearch.test.hamcrest.OpenSearchAssertions.assertAcked;

@OpenSearchIntegTestCase.ClusterScope(scope = OpenSearchIntegTestCase.Scope.TEST, numDataNodes = 0)
Just a question: what does numDataNodes = 0 mean here? I checked the code base and saw multiple such occurrences.
We want to control the number of nodes in the test cluster. That is why we set it to 0 and start nodes as each test requires.
More on numDataNodes: https://github.com/opensearch-project/OpenSearch/blob/main/test/framework/src/main/java/org/opensearch/test/OpenSearchIntegTestCase.java#L1722
@@ -137,7 +137,9 @@ public ReadOnlyEngine(
    }
    if (seqNoStats == null) {
        seqNoStats = buildSeqNoStats(config, lastCommittedSegmentInfos);
        ensureMaxSeqNoEqualsToGlobalCheckpoint(seqNoStats);
Nit: would this code be more readable if we moved this check inside the ensureMaxSeqNoEqualsToGlobalCheckpoint method?
    if (requireCompleteHistory == false || engineConfig.getIndexSettings().isRemoteTranslogStoreEnabled()) {
        return;
    }
OpenSearch/server/src/main/java/org/opensearch/index/engine/ReadOnlyEngine.java
Lines 179 to 181 in 5f89081
    protected void ensureMaxSeqNoEqualsToGlobalCheckpoint(final SeqNoStats seqNoStats) {
        if (requireCompleteHistory == false) {
            return;
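The nit above could look roughly like the sketch below. `SeqNoCheckSketch` is a hypothetical stand-in for the engine; the two boolean fields replace `requireCompleteHistory` and the `isRemoteTranslogStoreEnabled()` lookup, and the exception type and message are assumptions made only so the example can run on its own.

```java
// Hypothetical, self-contained stand-in for the engine class; illustrative only.
final class SeqNoCheckSketch {
    private final boolean requireCompleteHistory;
    private final boolean remoteTranslogStoreEnabled;

    SeqNoCheckSketch(boolean requireCompleteHistory, boolean remoteTranslogStoreEnabled) {
        this.requireCompleteHistory = requireCompleteHistory;
        this.remoteTranslogStoreEnabled = remoteTranslogStoreEnabled;
    }

    void ensureMaxSeqNoEqualsToGlobalCheckpoint(long maxSeqNo, long globalCheckpoint) {
        // Both skip conditions live in one guard, so callers do not repeat them.
        if (requireCompleteHistory == false || remoteTranslogStoreEnabled) {
            return;
        }
        if (maxSeqNo != globalCheckpoint) {
            // Assumed failure mode for this sketch.
            throw new IllegalStateException(
                "max seq no [" + maxSeqNo + "] does not match global checkpoint [" + globalCheckpoint + "]"
            );
        }
    }
}
```

Callers can then invoke the method unconditionally, which keeps the constructor free of the remote-store special case.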
@@ -186,7 +188,7 @@ protected void ensureMaxSeqNoEqualsToGlobalCheckpoint(final SeqNoStats seqNoStat
  // In addition to that we only execute the check if the index the engine belongs to has been
  // created after the refactoring of the Close Index API and its TransportVerifyShardBeforeCloseAction
  // that guarantee that all operations have been flushed to Lucene.
- assert assertMaxSeqNoEqualsToGlobalCheckpoint(seqNoStats.getMaxSeqNo(), seqNoStats.getGlobalCheckpoint());
+ assertMaxSeqNoEqualsToGlobalCheckpoint(seqNoStats.getMaxSeqNo(), seqNoStats.getGlobalCheckpoint());
Can we also change the contract of the method to return void, since the return value is no longer used?
@@ -3079,7 +3085,8 @@ public void updateGlobalCheckpointOnReplica(final long globalCheckpoint, final S
   * while the global checkpoint update may have emanated from the primary when we were in that state, we could subsequently move
   * to recovery finalization, or even finished recovery before the update arrives here.
   */
- assert state() != IndexShardState.POST_RECOVERY && state() != IndexShardState.STARTED
+ assert (state() != IndexShardState.POST_RECOVERY && state() != IndexShardState.STARTED)
+     || (indexSettings.isRemoteTranslogStoreEnabled() == true && state() != IndexShardState.RECOVERING)
Should we add a short comment, or extend the method's Javadoc, to explain why this condition has been added?
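To help answer that question, the sketch below evaluates the new condition for each shard state. The enum is a local, hypothetical copy of the state names (assumed here to be CREATED, RECOVERING, POST_RECOVERY, STARTED, CLOSED), and the boolean parameter stands in for `indexSettings.isRemoteTranslogStoreEnabled()`.

```java
import java.util.EnumSet;
import java.util.Set;

// Local copy of the state names, assumed for illustration; not the real IndexShardState.
enum ShardStateSketch { CREATED, RECOVERING, POST_RECOVERY, STARTED, CLOSED }

final class CheckpointAssertionSketch {
    /** Mirrors the boolean expression from the new assert in the diff above. */
    static boolean conditionHolds(ShardStateSketch state, boolean remoteTranslogStoreEnabled) {
        return (state != ShardStateSketch.POST_RECOVERY && state != ShardStateSketch.STARTED)
            || (remoteTranslogStoreEnabled && state != ShardStateSketch.RECOVERING);
    }

    /** Collects every state for which the assertion would trip. */
    static Set<ShardStateSketch> statesThatFail(boolean remoteTranslogStoreEnabled) {
        Set<ShardStateSketch> failing = EnumSet.noneOf(ShardStateSketch.class);
        for (ShardStateSketch state : ShardStateSketch.values()) {
            if (!conditionHolds(state, remoteTranslogStoreEnabled)) {
                failing.add(state);
            }
        }
        return failing;
    }
}
```

Under these assumptions, the condition fails only for POST_RECOVERY and STARTED when the remote translog store is disabled, and it holds for every listed state when the store is enabled. A comment on the assertion could state that intent directly.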
@@ -141,6 +154,25 @@ public static void download(TranslogTransferManager translogTransferManager, Pat
        }
    }

    private static void deleteTranslogFilesNotUploaded(Path location, long uploadedGeneration) throws IOException {
If I recall correctly, the earlier step of deleting files before downloading translog files was added because the download would otherwise fail, right? Now we delete only the files whose generation is greater than the max generation referenced by the remote translog metadata file. While this is a good optimisation, I think we should also compare each local file's checksum with the expected checksum. If there is already a story for the checksum part, do let me know. If not, let's follow it up in the next PR.
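The deletion rule described in this comment can be sketched as follows. This is not the PR's implementation of `deleteTranslogFilesNotUploaded`: the class and method names are hypothetical, and the `translog-<generation>.tlog` / `translog-<generation>.ckp` naming pattern is an assumption of the sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TranslogCleanupSketch {
    // Assumed naming scheme: translog-<generation>.tlog and translog-<generation>.ckp
    private static final Pattern GENERATION_FILE = Pattern.compile("^translog-(\\d+)\\.(tlog|ckp)$");

    /**
     * Deletes local files whose generation is greater than the last uploaded generation.
     * Files that do not match the assumed pattern are left untouched.
     * Returns the number of files deleted.
     */
    static int deleteFilesAboveGeneration(Path location, long uploadedGeneration) {
        int deleted = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(location)) {
            for (Path file : stream) {
                Matcher matcher = GENERATION_FILE.matcher(file.getFileName().toString());
                if (matcher.matches() && Long.parseLong(matcher.group(1)) > uploadedGeneration) {
                    Files.delete(file);
                    deleted++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return deleted;
    }
}
```

Files at or below the uploaded generation stay in place, so a later download step would only need to fetch what is missing locally.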
At a high level, comparing checksums makes sense. This is what we do for segments, and it can be added here as well.
The only reason I have not added this check is the invariant we rely on: only one node uploads a translog file at any given time. So, if the translog file name matches, the checksum must be the same. Let me create a tracking issue to discuss this in more detail.
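For the follow-up discussion, a checksum comparison might look like the sketch below. The thread does not say which checksum the remote metadata would carry, so CRC32 is purely an assumption here, and `ChecksumCompareSketch` and `canSkipDownload` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

final class ChecksumCompareSketch {
    /** Streams the file through a CRC32 accumulator and returns the resulting value. */
    static long crc32Of(Path file) {
        try (CheckedInputStream in = new CheckedInputStream(Files.newInputStream(file), new CRC32())) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Reading drives the checksum; the bytes themselves are not needed.
            }
            return in.getChecksum().getValue();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A local file is reusable only if it exists and its checksum matches the expected one. */
    static boolean canSkipDownload(Path localFile, long expectedChecksum) {
        return Files.isRegularFile(localFile) && crc32Of(localFile) == expectedChecksum;
    }
}
```

A file that fails this comparison would be downloaded again instead of being trusted on name alone, which guards against the case where the single-uploader invariant is ever violated.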
Created tracking issue: #6896
Reviewed at a high level. Please take a look at the comments and address them.
This PR is trying to solve multiple things. Breaking it down into multiple PRs.
Description
Issues Resolved
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.