[WIP] Search only replicas (scale to zero) with Reader/Writer Separation #17299
Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com>
❌ Gradle check result for e89b812: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
While I refactor the code and add additional tests, I’m creating this PR to gather early feedback. Please take a look and add your thoughts. I will share the testing results in the comments. Thanks!
❌ Gradle check result for 1bd7c6a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
I went through and tested the following scenarios. Scenario 1: Search-Only Replicas Recovery with Persistent Data Directory and when [...]
Coming from #17299 (comment): @shwetathareja, can you please go over scenarios 2 and 3 and check whether they make sense? I wanted to understand why [...]
8f1d4ea to 7fa5133
❌ Gradle check result for 7fa5133: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
7fa5133 to 64bb954
I have updated the PR to adjust the cluster health configuration using only search replicas and to incorporate the changes made when [...]
❌ Gradle check result for 64bb954: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
64bb954 to 470c0ea
❌ Gradle check result for 470c0ea: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Adding @sachinpkale: can you please take a look at this comment #17299 (comment) and provide your thoughts to help understand why [...]
I didn't get much info from the docs: https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/remote-store/index/#restoring-from-a-backup.
@Override
public List<Route> routes() {
    return asList(new Route(POST, "/{index}/_searchonly/enable"), new Route(POST, "/{index}/_searchonly/disable"));
I would rename _searchonly. It is better to have a verb that denotes an action on an index, like _scale, and to use search-only as a query parameter or in the request body, so that the API finds wider applicability.
Thanks, I will take a look at this and go with something generic that has wider applicability.
Initially I started with _scale (see #16720 (comment)). Maybe we can have the following?
POST /{index}/_scale
{
"search-only": true
}
Adding @msfroh @mch2 @getsaurabh02
As per the original discussion, @prudhvigodithi, _scale is more intuitive.
Thanks, I have updated the PR to use _scale, for example:
curl -X POST "http://localhost:9200/my-index/_scale" \
-H "Content-Type: application/json" \
-d '{
"search_only": true
}'
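For callers that prefer Java over curl, the same request could be assembled roughly as follows. This is a minimal sketch using only the JDK's java.net.http API; the class name ScaleRequestSketch and its helper methods are hypothetical and are not part of this PR, and the request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of the PR: mirrors the curl example above.
class ScaleRequestSketch {

    // JSON body from the curl example: {"search_only": true} or {"search_only": false}.
    static String body(boolean searchOnly) {
        return "{\"search_only\": " + searchOnly + "}";
    }

    // Builds (but does not send) a POST to {baseUrl}/{index}/_scale.
    static HttpRequest scaleRequest(String baseUrl, String index, boolean searchOnly) {
        return HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + "/" + index + "/_scale"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body(searchOnly)))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = scaleRequest("http://localhost:9200", "my-index", true);
        // Print the method, target URI, and body that would be sent.
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body(true));
    }
}
```

Sending the request would additionally need a java.net.http.HttpClient and a running cluster, which this sketch leaves out.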
6b8e897 to b73bb5d
❌ Gradle check result for b73bb5d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
b73bb5d to f8abab4
❌ Gradle check result for f8abab4: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
f8abab4 to fe2d658
❌ Gradle check result for fe2d658: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com>
fe2d658 to 97b4d0e
❌ Gradle check result for 97b4d0e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 0e775c4: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Curious what the expectation should be when the search-only replica goes down. Shouldn't it be re-hydrated from remote by default?
Hey @Bukhtawar, the search replicas start by [...]. For the default case (when search_only is not enabled) the behavior is unchanged and can be restored from [...]. Also, we can always disable search_only (which brings back the initial state of the index) and then enable it again, which reinitializes the replicas. Adding @mch2 to provide any details I'm missing. Thanks
0e775c4 to b9aafc1
❌ Gradle check result for b9aafc1: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
ed67958 to 0005d0e
❌ Gradle check result for 0005d0e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com>
0005d0e to 62d23ab
❕ Gradle check result for 62d23ab: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Description
The primary goal is to let users designate an index as search-only, so that only its search-only replicas keep running. This is enabled via the API call _searchonly/enable and can be disabled via _searchonly/disable.
With _searchonly/enable, the index goes through a two-phase scale-down: a temporary block is applied for the duration of the scale-down operation and is then explicitly replaced with a permanent block once all prerequisites (e.g., shard sync, flush, metadata updates) have been met.
With _searchonly/enable, users no longer need to manually invoke the _remotestore/_restore API to recover search-only replicas: they are recovered automatically from the remote store during cluster recovery. The default behavior is still honored in normal conditions: https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/remote-store/index/#restoring-from-a-backup.
To Do: work on the cluster health, coming from [META] Reader/Writer Separation #15306 (comment); add a similar implementation.
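The two-phase scale-down described above can be pictured with a small, self-contained model. This is only a sketch under stated assumptions: the class TwoPhaseScaleDownSketch, the Block enum, and the Prerequisite interface are hypothetical names and do not reflect the PR's actual classes, and the rollback on a failed prerequisite is an assumption rather than something the description states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two-phase scale-down; not the PR's implementation.
class TwoPhaseScaleDownSketch {

    enum Block { NONE, TEMPORARY, SEARCH_ONLY }

    // Stand-in for a prerequisite such as shard sync, flush, or a metadata update.
    interface Prerequisite {
        String name();
        boolean run();
    }

    private Block block = Block.NONE;
    private final List<String> log = new ArrayList<>();

    Block block() { return block; }
    List<String> log() { return log; }

    boolean scaleDown(List<Prerequisite> prerequisites) {
        if (block != Block.NONE) {
            return false; // a scale-down is already in progress or already completed
        }
        // Phase 1: a temporary block covers the duration of the scale-down.
        block = Block.TEMPORARY;
        log.add("temporary block added");
        for (Prerequisite p : prerequisites) {
            if (!p.run()) {
                // Assumption: on failure the temporary block is simply removed.
                block = Block.NONE;
                log.add(p.name() + " failed, temporary block removed");
                return false;
            }
            log.add(p.name() + " done");
        }
        // Phase 2: all prerequisites met, replace the temporary block with the permanent one.
        block = Block.SEARCH_ONLY;
        log.add("permanent search-only block added");
        return true;
    }

    // Convenience factory for a prerequisite with a fixed outcome.
    static Prerequisite step(String name, boolean succeeds) {
        return new Prerequisite() {
            public String name() { return name; }
            public boolean run() { return succeeds; }
        };
    }

    public static void main(String[] args) {
        TwoPhaseScaleDownSketch index = new TwoPhaseScaleDownSketch();
        boolean ok = index.scaleDown(List.of(
            step("shard sync", true), step("flush", true), step("metadata update", true)));
        System.out.println(ok + " " + index.block());
        index.log().forEach(System.out::println);
    }
}
```

Keeping the temporary and permanent blocks as distinct states makes it explicit that writes are already blocked while the prerequisites run, and that the index only counts as search-only after the final replacement.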
Related Issues
#16720 and part of #15306
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.