
[WIP] Search only replicas (scale to zero) with Reader/Writer Separation #17299

Draft
wants to merge 8 commits into main

Conversation

prudhvigodithi
Member

@prudhvigodithi prudhvigodithi commented Feb 7, 2025

Description

  • The primary goal is to allow users to designate an index as search-only, so that only its search-only replicas keep running once the mode is enabled via an API call _searchonly/enable (it can be disabled via _searchonly/disable); see the usage sketch after this list.

  • With _searchonly/enable for an index, the process performs a two-phase scale-down: a temporary block is applied for the duration of the scale-down operation and is then explicitly replaced with a permanent block once all prerequisites (e.g., shard sync, flush, metadata updates) have been met.

  • Eliminates the need for users to manually invoke the _remotestore/_restore API to recover search-only replicas when _searchonly/enable is set: search-only replicas are recovered automatically from the remote store during cluster recovery. The default behavior is still honored in normal conditions: https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/remote-store/index/#restoring-from-a-backup.

  • To do: work on cluster health handling, adding an implementation similar to the one discussed in [META] Reader/Writer Separation #15306 (comment).
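
A minimal usage sketch of the API described above (illustrative only; response bodies omitted):

curl -X POST "localhost:9200/my-index/_searchonly/enable"   # scale the index down to search-only replicas
curl -X POST "localhost:9200/my-index/_searchonly/disable"  # restore the index to its original configuration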

Related Issues

#16720 and part of #15306

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com>
Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com>
Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com>
Contributor

github-actions bot commented Feb 7, 2025

❌ Gradle check result for e89b812: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@prudhvigodithi
Member Author

While I refactor the code and add additional tests, I’m creating this PR to gather early feedback; please take a look and add your thoughts. I will share the testing results in the comments. Thanks!
@mch2 @shwetathareja @msfroh @getsaurabh02

Contributor

github-actions bot commented Feb 7, 2025

❌ Gradle check result for 1bd7c6a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@prudhvigodithi
Member Author

I went through and tested the following scenarios:

Scenario 1: Search-Only Replica Recovery with a Persistent Data Directory and cluster.remote_store.state.enabled Set to false

OpenSearch was started with the following command and settings:

./gradlew clean run -PnumNodes=6 --data-dir=/tmp/foo
OpenSearch settings

    
setting 'path.repo', '["/tmp/my-repo"]'
setting 'opensearch.experimental.feature.read.write.split.enabled', 'true'
setting 'node.attr.remote_store.segment.repository', 'my-repository'
setting 'node.attr.remote_store.translog.repository', 'my-repository'
setting 'node.attr.remote_store.repository.my-repository.type', 'fs'
setting 'node.attr.remote_store.state.repository', 'my-repository'
setting 'node.attr.remote_store.repository.my-repository.settings.location', '/tmp/my-repo'
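
For reference, the test index was created with search replicas beforehand; a minimal sketch of such a request (assuming the experimental index.number_of_search_only_replicas index setting that backs search replicas):

curl -X PUT "localhost:9200/my-index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 1,
    "index.number_of_search_only_replicas": 1,
    "index.replication.type": "SEGMENT"
  }
}'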
    
  

Shard Allocation Before Recovery

curl -X GET "localhost:9200/_cat/shards/my-index?v&h=index,shard,prirep,state,unassigned.reason,node,searchOnly"

index    shard prirep state   unassigned.reason node
my-index 0     p      STARTED                   runTask-0
my-index 0     s      STARTED                   runTask-4
my-index 0     r      STARTED                   runTask-2
my-index 1     p      STARTED                   runTask-3
my-index 1     r      STARTED                   runTask-1
my-index 1     s      STARTED                   runTask-5

On restart (terminating and relaunching the process), everything comes back up as STARTED. With search-only mode enabled (/_searchonly/enable), only the search replicas come back up after restart, and search works as expected.

curl -X GET "localhost:9200/_cat/shards/my-index?v&h=index,shard,prirep,state,unassigned.reason,node,searchOnly"
index    shard prirep state   unassigned.reason node
my-index 0     s      STARTED                   runTask-2
my-index 1     s      STARTED                   runTask-1

Scenario 2: No Data Directory Preservation and cluster.remote_store.state.enabled Set to false – Index Lost After Process Restart (Recovery)

In this scenario, OpenSearch is started without preserving the data directory, meaning that all local shard data is lost upon recovery.

./gradlew clean run -PnumNodes=6
OpenSearch settings

    
setting 'path.repo', '["/tmp/my-repo"]'
setting 'opensearch.experimental.feature.read.write.split.enabled', 'true'
setting 'node.attr.remote_store.segment.repository', 'my-repository'
setting 'node.attr.remote_store.translog.repository', 'my-repository'
setting 'node.attr.remote_store.repository.my-repository.type', 'fs'
setting 'node.attr.remote_store.state.repository', 'my-repository'
setting 'node.attr.remote_store.repository.my-repository.settings.location', '/tmp/my-repo'
    
  

Behavior After Recovery:

  • Upon terminating the process and restarting OpenSearch, the index is completely lost.
  • Any attempt to retrieve the shard state results in an index not found exception.
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [my-index]","index":"my-index","resource.id":"my-index","resource.type":"index_or_alias","index_uuid":"_na_"}],"type":"index_not_found_exception","reason":"no such index [my-index]","index":"my-index","resource.id":"my-index","resource.type":"index_or_alias","index_uuid":"_na_"},"status":404}%
  • Even with a remote restore (_remotestore/_restore?restore_all_shards=true), the index remains unavailable.
  • Even after recreating the index manually and attempting to restore, documents do not get picked up.
  • Since --data-dir was not used during testing, local data (including cluster metadata) is wiped on restart.
  • Because the cluster state is lost, OpenSearch no longer has any reference to the index.

Scenario 3: Cluster Remote Store State Enabled (cluster.remote_store.state.enabled set to true, no persistent data directory) – Primary Shards Remain Unassigned After Recovery

./gradlew clean run -PnumNodes=6
OpenSearch settings

    
setting 'path.repo', '["/tmp/my-repo"]'
setting 'opensearch.experimental.feature.read.write.split.enabled', 'true'
setting 'node.attr.remote_store.segment.repository', 'my-repository'
setting 'node.attr.remote_store.translog.repository', 'my-repository'
setting 'node.attr.remote_store.repository.my-repository.type', 'fs'
setting 'node.attr.remote_store.state.repository', 'my-repository'
setting 'node.attr.remote_store.repository.my-repository.settings.location', '/tmp/my-repo'
setting 'cluster.remote_store.state.enabled', 'true'
    
  

Shard Allocation After Recovery

curl -X GET "localhost:9200/_cat/shards/my-index?v&h=index,shard,prirep,state,unassigned.reason,node,searchOnly"
index    shard prirep state      unassigned.reason node
my-index 0     p      UNASSIGNED CLUSTER_RECOVERED 
my-index 0     s      UNASSIGNED CLUSTER_RECOVERED 
my-index 0     r      UNASSIGNED CLUSTER_RECOVERED 
my-index 1     p      UNASSIGNED CLUSTER_RECOVERED 
my-index 1     r      UNASSIGNED CLUSTER_RECOVERED 
my-index 1     s      UNASSIGNED CLUSTER_RECOVERED 

Issue: Primary Shards Remain Unassigned
Despite cluster.remote_store.state.enabled being true, the primary shards are not automatically assigned after restart (replicating the recovery). The error message states:

"allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster"
  • Remote store only contains segments and translogs, NOT active shard copies.
  • Since --data-dir was not used, local copies of the primary shards are lost.
  • OpenSearch does not automatically restore primaries from the remote store without explicit intervention.
curl -X POST "http://localhost:9200/_remotestore/_restore" -H 'Content-Type: application/json' -d'  
{                
  "indices": ["my-index"]
}
'
  • However, with this PR, when _searchonly is enabled, search-only replicas recover without a primary. Since cluster.remote_store.state.enabled is true, OpenSearch remembers that the index exists after restart, and the allocation logic skips the active-primary check for search-only replicas. This allows search-only replicas to be assigned to a node even without an existing primary. Without _searchonly the behavior is unchanged for all replicas; the intent is to give an advantage to users with _searchonly-enabled indices, who should not have to care about _remotestore/_restore since no primaries are involved.

    • Search-only replicas can recover automatically from the remote store.
    • Search queries remain functional.
    • Cluster state correctly remembers the index, but does not bring up primaries as _searchonly is enabled.
  • The default behavior is that OpenSearch does not assume lost primaries should be re-created from remote storage; it waits for explicit user intervention to restore primary shards (_remotestore/_restore). Is this by design?

Scenario 4: Persistent Data Directory with Remote Store State – Seamless Recovery of Primaries, Replicas, and Search-Only Replicas

./gradlew clean run -PnumNodes=6 --data-dir=/tmp/foo
OpenSearch settings

    
setting 'path.repo', '["/tmp/my-repo"]'
setting 'opensearch.experimental.feature.read.write.split.enabled', 'true'
setting 'node.attr.remote_store.segment.repository', 'my-repository'
setting 'node.attr.remote_store.translog.repository', 'my-repository'
setting 'node.attr.remote_store.repository.my-repository.type', 'fs'
setting 'node.attr.remote_store.state.repository', 'my-repository'
setting 'node.attr.remote_store.repository.my-repository.settings.location', '/tmp/my-repo'
setting 'cluster.remote_store.state.enabled', 'true'
    
  

Upon recovery (no intervention is required):

curl -X GET "localhost:9200/_cat/shards/my-index?v&h=index,shard,prirep,state,unassigned.reason,node,searchOnly"
index    shard prirep state   unassigned.reason node
my-index 0     p      STARTED                   runTask-0
my-index 0     r      STARTED                   runTask-2
my-index 0     s      STARTED                   runTask-4
my-index 1     p      STARTED                   runTask-5
my-index 1     r      STARTED                   runTask-3
my-index 1     s      STARTED                   runTask-1
  • All primary and replica shards successfully recover since the cluster metadata is retained in the persistent data directory.

If search-only mode is enabled for the index, OpenSearch correctly brings up only the search replicas while removing the primary and regular replicas.

curl -X GET "localhost:9200/_cat/shards/my-index?v&h=index,shard,prirep,state,unassigned.reason,node,searchOnly"
index    shard prirep state   unassigned.reason node
my-index 0     s      STARTED                   runTask-3
my-index 1     s      STARTED                   runTask-3
  • Only search replicas (SORs) are restored, as expected.

@prudhvigodithi
Member Author

Coming from #17299 (comment): @shwetathareja can you please go over scenarios 2 and 3 and check whether they make sense? I wanted to understand why _remotestore/_restore is required in these scenarios, and I wanted to give users the advantage of removing this intervention for search-only indices.
Thanks
@mch2

@prudhvigodithi prudhvigodithi force-pushed the searchonly-2 branch 3 times, most recently from 8f1d4ea to 7fa5133 on February 7, 2025 23:32
Contributor

github-actions bot commented Feb 7, 2025

❌ Gradle check result for 7fa5133: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@prudhvigodithi
Member Author

I have updated the PR to adjust the cluster health configuration to use only search replicas and to incorporate the changes applied when _searchonly is enabled; the change is not too big, hence going with the same PR.

Contributor

github-actions bot commented Feb 8, 2025

❌ Gradle check result for 64bb954: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@prudhvigodithi prudhvigodithi self-assigned this Feb 10, 2025
@github-actions github-actions bot added labels enhancement (Enhancement or improvement to existing feature or request), Roadmap:Search (Project-wide roadmap label), Search:Performance, and v3.0.0 (Issues and PRs related to version 3.0.0) on Feb 12, 2025
Contributor

❌ Gradle check result for 470c0ea: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@prudhvigodithi
Member Author

Adding @sachinpkale: can you please take a look at this comment #17299 (comment) and share your thoughts on why _remotestore/_restore is required (Scenario 3 from #17299 (comment)) and why the cluster cannot be auto-recovered? Is there any strong reason for this manual intervention of running the API?

curl -X GET "localhost:9200/_cat/shards/my-index?v&h=index,shard,prirep,state,unassigned.reason,node,searchOnly"
index    shard prirep state      unassigned.reason node
my-index 0     p      UNASSIGNED CLUSTER_RECOVERED 
my-index 0     s      UNASSIGNED CLUSTER_RECOVERED 
my-index 0     r      UNASSIGNED CLUSTER_RECOVERED 
my-index 1     p      UNASSIGNED CLUSTER_RECOVERED 
my-index 1     s      UNASSIGNED CLUSTER_RECOVERED 
my-index 1     r      UNASSIGNED CLUSTER_RECOVERED 

I didn't get much info from the docs: https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/remote-store/index/#restoring-from-a-backup.


@Override
public List<Route> routes() {
    return asList(new Route(POST, "/{index}/_searchonly/enable"), new Route(POST, "/{index}/_searchonly/disable"));
}
Collaborator

I would rename _searchonly to use a verb denoting an action on the index, such as _scale, and pass search-only as a query parameter/request body so that the API finds wider applicability.

Member Author

@prudhvigodithi prudhvigodithi Feb 13, 2025

Thanks, I will take a look at this and go with a generic name that has wider applicability.

Member Author

Initially I started with _scale #16720 (comment). Maybe we can have:

POST /{index}/_scale

{
  "search-only": true
}

Adding @msfroh @mch2 @getsaurabh02

Member

As per the original discussion @prudhvigodithi, _scale is more intuitive.

Member Author

Thanks, I have updated to use _scale. Example:

curl -X POST "http://localhost:9200/my-index/_scale" \
-H "Content-Type: application/json" \
-d '{
  "search_only": true
}' 
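
Disabling would presumably be the symmetric call with the flag flipped (a sketch; the final request shape may differ):

curl -X POST "http://localhost:9200/my-index/_scale" \
-H "Content-Type: application/json" \
-d '{
  "search_only": false
}'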

Contributor

❌ Gradle check result for b73bb5d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for f8abab4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for fe2d658: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com>
Contributor

❌ Gradle check result for 97b4d0e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 97b4d0e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 97b4d0e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 0e775c4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@Bukhtawar
Collaborator

Thanks @shwetathareja, this is more of a primary durability concern and is meant to avoid split-brain scenarios: if the original primary is merely partitioned and still accepting writes, automatically creating a new primary from the remote store risks silent data loss. Hence, OpenSearch requires explicit user intervention before promoting a new primary. However, for search replicas we can move forward with the idea of auto-recovery, since they don't accept any writes.

I will update the _remotestore/_restore API code to throw an error like the following, so that the API execution is blocked and users are aware that the index is in search-only mode; this way the API won't interfere with search-only mode.

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Skipping _remotestore/_restore for all selected indices as search-only mode is enabled."}],"type":"illegal_argument_exception","reason":"Skipping _remotestore/_restore for all selected indices as search-only mode is enabled."},"status":400}

Thanks @mch2

Curious what the expectation should be when the search-only replica goes down; shouldn't that be re-hydrated from remote by default?

@prudhvigodithi
Member Author

Curious what the expectation should be when the search-only replica goes down; shouldn't that be re-hydrated from remote by default?

Hey @Bukhtawar

The search replicas start with EmptyStoreRecoverySource.INSTANCE during recovery, and I have added logic in ReplicaShardAllocator to find an available candidate node for the allocation decision. By default, with EmptyStoreRecoverySource.INSTANCE, the shard loads the complete set of segments that make it up from the remote store; this repeats for every recovery scenario and does not need _remotestore/_restore.

For the default case (when search_only is not enabled) the behavior is unchanged, and the index can be restored via _remotestore/_restore (with no persistent data directory) for the primary and all replica types. See Scenario 3 from #17299 (comment).

Also, we can always disable search_only (which brings back the initial state of the index) and re-enable it, which will reinitialize the replicas again.

Adding @mch2 to provide any details I'm missing.

Thanks

Contributor

❌ Gradle check result for b9aafc1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@prudhvigodithi prudhvigodithi force-pushed the searchonly-2 branch 3 times, most recently from ed67958 to 0005d0e on February 21, 2025 21:06
Contributor

❌ Gradle check result for 0005d0e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com>
Contributor

❕ Gradle check result for 62d23ab: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.


codecov bot commented Feb 22, 2025

Codecov Report

Attention: Patch coverage is 58.66426% with 229 lines in your changes missing coverage. Please review.

Project coverage is 72.29%. Comparing base (abe2333) to head (62d23ab).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...es/scale/searchonly/TransportSearchOnlyAction.java 53.84% 58 Missing and 2 partials ⚠️
...n/indices/scale/searchonly/SearchOnlyResponse.java 16.66% 28 Missing and 2 partials ⚠️
...est/action/admin/indices/RestSearchOnlyAction.java 13.04% 20 Missing ⚠️
.../opensearch/cluster/routing/IndexRoutingTable.java 25.00% 15 Missing and 3 partials ⚠️
.../org/opensearch/gateway/ReplicaShardAllocator.java 30.43% 15 Missing and 1 partial ⚠️
...scale/searchonly/SearchOnlyOperationValidator.java 38.88% 9 Missing and 2 partials ⚠️
...arch/index/recovery/RemoteStoreRestoreService.java 0.00% 11 Missing ⚠️
...ndices/scale/searchonly/NodeSearchOnlyRequest.java 28.57% 10 Missing ⚠️
...in/indices/scale/searchonly/SearchOnlyRequest.java 70.96% 6 Missing and 3 partials ⚠️
...ices/scale/searchonly/ShardSearchOnlyResponse.java 47.05% 9 Missing ⚠️
... and 12 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17299      +/-   ##
============================================
- Coverage     72.41%   72.29%   -0.13%     
+ Complexity    65667    65641      -26     
============================================
  Files          5303     5315      +12     
  Lines        304781   305298     +517     
  Branches      44201    44268      +67     
============================================
+ Hits         220709   220712       +3     
- Misses        65959    66421     +462     
- Partials      18113    18165      +52     

☔ View full report in Codecov by Sentry.

Labels
enhancement (Enhancement or improvement to existing feature or request), Roadmap:Search (Project-wide roadmap label), Search:Performance, v3.0.0 (Issues and PRs related to version 3.0.0)
Development

Successfully merging this pull request may close these issues.

[Feature Request] Scale to Zero with Reader/Writer Separation.
3 participants