[Segment Replication] Refactor file cleanup logic and fix PIT/Scroll with remote store. #9111

Merged (8 commits) on Aug 10, 2023

mch2
Member

@mch2 mch2 commented Aug 4, 2023

Description

This change fixes multiple issues with scroll/PIT tests under Segment Replication and remote store. These issues stem from NRTReplicationEngine using different logic, depending on the replication source, to preserve segments and commit points between refresh cycles. With node-to-node replication we only perform local commits and preserve the latest on-disk commit, but with remote store it was possible for a new incoming commit point to leave a still-required commit point eligible for deletion because it was no longer the "latest commit" on disk.

With this change, all replicas with segrep enabled perform local commits when necessary from the incoming SegmentInfos byte[] only and ignore any incoming segments_n from their replication source. This PR also changes the recovery sync with remote store to exclude the segments_n file so that only the fetched infos bytes are committed before an engine is opened.
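
As a rough illustration of "ignore any incoming segments_n", the sketch below filters commit-point files out of a list of incoming file names. It is not the code from this PR: the class `IncomingFileFilter` and its methods are hypothetical names, and the only assumption taken from Lucene is that commit points are named `segments_<generation>`.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of OpenSearch: drops commit-point files
// (segments_N) from a list of incoming file names so that only segment
// data files would be copied from the replication source.
final class IncomingFileFilter {
    // Lucene names commit points "segments_<generation>".
    private static final String COMMIT_PREFIX = "segments_";

    private IncomingFileFilter() {}

    // True if the file name looks like a Lucene commit point.
    static boolean isCommitPoint(String fileName) {
        return fileName.startsWith(COMMIT_PREFIX);
    }

    // Returns the incoming names without commit points, sorted for stable output.
    static List<String> excludeCommitPoints(Collection<String> incoming) {
        return incoming.stream()
            .filter(name -> !isCommitPoint(name))
            .sorted()
            .collect(Collectors.toList());
    }
}
```

Under this sketch, a replica would copy only the names returned by `excludeCommitPoints` and then write its own commit from the received SegmentInfos bytes, so the commit point on disk is always one the replica created locally.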

This change also simplifies deletion logic with segment replication: a file is deleted automatically once it is decref'd to 0, which makes it easier to reason about when file cleanup is performed.
Files are incref'd when they are loaded onto the reader, when they are committed, or when a segmentInfosSnapshot is acquired.
Files are decref'd after a new commit is made, when a reader is closed, or when a segmentInfosSnapshot is closed.
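
The incref/decref rules above can be sketched as a small reference-counting tracker. This is a minimal illustration under stated assumptions, not the implementation in this PR: `RefCountedFiles` and its methods are hypothetical, and "deleting" a file is modelled by recording its name rather than touching disk.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the OpenSearch implementation: tracks a reference
// count per file and "deletes" a file as soon as its count drops to zero.
final class RefCountedFiles {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final List<String> deleted = new ArrayList<>();

    // Called when files are loaded by a reader, committed, or captured in a snapshot.
    synchronized void incRef(Collection<String> files) {
        for (String file : files) {
            refCounts.merge(file, 1, Integer::sum);
        }
    }

    // Called after a new commit is made, or when a reader or snapshot is closed.
    synchronized void decRef(Collection<String> files) {
        for (String file : files) {
            Integer count = refCounts.get(file);
            if (count == null) {
                throw new IllegalStateException("file is not tracked: " + file);
            }
            if (count == 1) {
                refCounts.remove(file);
                deleted.add(file); // a real store would delete the file from disk here
            } else {
                refCounts.put(file, count - 1);
            }
        }
    }

    // Current reference count, or 0 if the file is not tracked.
    synchronized int refCount(String file) {
        return refCounts.getOrDefault(file, 0);
    }

    // Copy of the names deleted so far, in deletion order.
    synchronized List<String> deletedFiles() {
        return new ArrayList<>(deleted);
    }
}
```

In this model, a PIT or scroll context that holds a snapshot keeps its files alive even after the reader moves on to newer segments: the files are only deleted once the last holder calls `decRef`.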

Related Issues

Resolves #8850
Resolves #7556
Resolves #8777

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@opensearch-trigger-bot
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer.git]
Compatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/reporting.git]

BUILD SUCCESSFUL in 26m 43s

@github-actions
Contributor

github-actions bot commented Aug 4, 2023

Gradle Check (Jenkins) Run Completed with:

@tlfeng tlfeng added :test Adding or fixing a test skip-changelog labels Aug 4, 2023
@github-actions
Contributor

github-actions bot commented Aug 4, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Contributor

github-actions bot commented Aug 7, 2023

Gradle Check (Jenkins) Run Completed with:

@tlfeng
Collaborator

tlfeng commented Aug 7, 2023

The above 2 builds have the same test failure:

> Task :server:test

REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.index.engine.NRTReplicationEngineTests.testUpdateSegments_replicaReceivesSISWithLowerGen" -Dtests.seed=6F54B4D34B49FD4 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=en-MT -Dtests.timezone=America/St_Vincent -Druntime.java=20

org.opensearch.index.engine.NRTReplicationEngineTests > testUpdateSegments_replicaReceivesSISWithLowerGen FAILED
    java.lang.AssertionError: expected:<5> but was:<7>
        at __randomizedtesting.SeedInfo.seed([6F54B4D34B49FD4:FD75BA2C32425ECF]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.junit.Assert.assertEquals(Assert.java:633)
        at org.opensearch.index.engine.NRTReplicationEngineTests.testUpdateSegments_replicaReceivesSISWithLowerGen(NRTReplicationEngineTests.java:161)

@mch2
Member Author

mch2 commented Aug 7, 2023

Thanks @tlfeng, will get this cleaned up.

@github-actions
Contributor

github-actions bot commented Aug 8, 2023

Gradle Check (Jenkins) Run Completed with:

@opensearch-trigger-bot
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/security-analytics.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

BUILD SUCCESSFUL in 35m 54s

@opensearch-trigger-bot
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

BUILD SUCCESSFUL in 26m 54s

@github-actions
Contributor

github-actions bot commented Aug 9, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Contributor

github-actions bot commented Aug 9, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Contributor

github-actions bot commented Aug 9, 2023

Gradle Check (Jenkins) Run Completed with:

@opensearch-trigger-bot
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

BUILD SUCCESSFUL in 29m 45s

@opensearch-trigger-bot
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

BUILD SUCCESSFUL in 39m 43s

@mch2 mch2 marked this pull request as ready for review August 9, 2023 17:14
mch2 added 8 commits August 10, 2023 14:07
This change removes divergent commit paths for segrep node-node and remote store.
All replicas with segrep enabled will perform local commits and ignore any incoming segments_n file.
This changes the recovery sync with remote store to also exclude the segments_n so that only the fetched infos bytes are committed before
an engine is opened.
This change also updates deletion logic with segment replication to automatically delete when a file is decref'd to 0.

Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
…it a new segmentInfos.

Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
Signed-off-by: Marc Handalian <handalm@amazon.com>
@opensearch-trigger-bot
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/reporting.git]
Compatible components: [https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

BUILD SUCCESSFUL in 24m 39s

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@dreamer-89
Member

Gradle Check (Jenkins) Run Completed with:

Assertion tripped here:

com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=2679, name=opensearch[node_s1][flush][T#1], state=RUNNABLE, group=TGRP-RemoteStoreForceMergeIT]
	at __randomizedtesting.SeedInfo.seed([90BCF02533C9A76B:61FEE78AB3B48CC0]:0)
Caused by: java.lang.AssertionError: [remote-store-test-idx-1][0] Expected non-empty readers
	at __randomizedtesting.SeedInfo.seed([90BCF02533C9A76B]:0)
	at org.opensearch.index.translog.RemoteFsTranslog.deleteStaleRemotePrimaryTerms(RemoteFsTranslog.java:430)
	at org.opensearch.index.translog.RemoteFsTranslog.trimUnreferencedReaders(RemoteFsTranslog.java:400)
	at org.opensearch.index.translog.InternalTranslogManager.trimUnreferencedReaders(InternalTranslogManager.java:394)
	at org.opensearch.index.engine.InternalEngine.revisitIndexDeletionPolicyOnTranslogSynced(InternalEngine.java:537)
	at org.opensearch.index.engine.InternalEngine$1.onAfterTranslogSync(InternalEngine.java:265)
	at org.opensearch.index.translog.listener.CompositeTranslogEventListener.onAfterTranslogSync(CompositeTranslogEventListener.java:49)
	at org.opensearch.index.translog.InternalTranslogManager.syncTranslog(InternalTranslogManager.java:193)
	at org.opensearch.index.shard.IndexShard.sync(IndexShard.java:4154)
	at org.opensearch.index.IndexService.maybeFSyncTranslogs(IndexService.java:980)
	at org.opensearch.index.IndexService$AsyncTranslogFSync.runInternal(IndexService.java:1104)
	at org.opensearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:159)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1623)

@mch2
Member Author

mch2 commented Aug 10, 2023

./gradlew ':server:internalClusterTest' --tests "org.opensearch.remotestore.RemoteStoreForceMergeIT.testRestoreForceMergeSingleIteration" -Dtests.seed=90BCF02533C9A76B -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=hi-IN -Dtests.timezone=Europe/Amsterdam

Not able to repro this locally. @ankitkala @gbbafna, do you have any context here?

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@mch2 mch2 merged commit c301544 into opensearch-project:main Aug 10, 2023
@mch2 mch2 added the backport 2.x Backport to 2.x branch label Aug 10, 2023
@mch2 mch2 deleted the cleanupdeletes branch August 10, 2023 23:11
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-9111-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 c30154458a44e91a2f245b2357e69ecc839265a9
# Push it to GitHub
git push --set-upstream origin backport/backport-9111-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-9111-to-2.x.

neetikasinghal pushed a commit to neetikasinghal/OpenSearch that referenced this pull request Aug 10, 2023
…with remote store. (opensearch-project#9111)

* Remove divergent commit logic with segment replication.

This change removes divergent commit paths for segrep node-node and remote store.
All replicas with segrep enabled will perform local commits and ignore any incoming segments_n file.
This changes the recovery sync with remote store to also exclude the segments_n so that only the fetched infos bytes are committed before
an engine is opened.
This change also updates deletion logic with segment replication to automatically delete when a file is decref'd to 0.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Add more NRTReplicationEngineTests.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Ensure old commit files are wiped on remote store sync before we commit a new segmentInfos.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Add more shard level tests.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Add test ensuring commits are cleaned up on replicas.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Self review.

Signed-off-by: Marc Handalian <handalm@amazon.com>

* Use refresh level sync before recovery

Signed-off-by: Marc Handalian <handalm@amazon.com>

* PR feedback.

Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>
mch2 added a commit to mch2/OpenSearch that referenced this pull request Aug 11, 2023
…with remote store. (opensearch-project#9111)

(cherry picked from commit c301544)
mch2 added a commit that referenced this pull request Aug 11, 2023
…fix PIT/Scroll with remote store. (#9272)

* [Segment Replication] Refactor file cleanup logic and fix PIT/Scroll with remote store. (#9111)

(cherry picked from commit c301544)

* Fix test SegmentReplicationIndexShardTests.testPrimaryRestart.

This test is specific to remote store and should not be run for node-node replication.

Signed-off-by: Marc Handalian <handalm@amazon.com>
(cherry picked from commit a33f67e)
Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>
@gbbafna
Collaborator

gbbafna commented Aug 14, 2023

@mch2: I saw this failure in the relocation tests as well. Will create an issue and take a look.

linuxpi pushed a commit to linuxpi/OpenSearch that referenced this pull request Aug 14, 2023
…with remote store. (opensearch-project#9111)

linuxpi pushed a commit to linuxpi/OpenSearch that referenced this pull request Aug 16, 2023
…with remote store. (opensearch-project#9111)

kaushalmahi12 pushed a commit to kaushalmahi12/OpenSearch that referenced this pull request Sep 12, 2023
…with remote store. (opensearch-project#9111)

Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
brusic pushed a commit to brusic/OpenSearch that referenced this pull request Sep 25, 2023
…with remote store. (opensearch-project#9111)

Signed-off-by: Ivan Brusic <ivan.brusic@flocksafety.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…with remote store. (opensearch-project#9111)

Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Labels
backport 2.x Backport to 2.x branch skip-changelog :test Adding or fixing a test
Projects
None yet
5 participants