-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Optimization] Cluster State Update Optimization #7853
[Optimization] Cluster State Update Optimization #7853
Conversation
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Gradle Check (Jenkins) Run Completed with:
|
… lookups Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Gradle Check (Jenkins) Run Completed with:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sandeshkr419 for working on the POC to gauge the impact of the existing re-computation. I've left few questions, ideally want to understand the actual logic for skipping computation.
server/src/main/java/org/opensearch/cluster/metadata/Metadata.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/metadata/Metadata.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/opensearch/cluster/metadata/Metadata.java
Outdated
Show resolved
Hide resolved
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Gradle Check (Jenkins) Run Completed with:
|
Hey @sandeshkr419 noticing lot gradle check errors, are you able to re-produce this locally ? |
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Gradle Check (Jenkins) Run Completed with:
|
@prudhvigodithi Yes, I was able to reproduce and fix majority of these failures except these in latest CI: org.opensearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT.test - This is failing on my local on main branch (with latest changes pulled) as well using JDK 19 & 20 both. Tried multiple times, to omit chances of flakiness with no luck. Test/system config:
org.opensearch.search.backpressure.SearchBackpressureIT.testSearchTaskCancellationWithHighCpu - This is successful in my local - ran using JDK 19 & JDK 20 - probably a network issue causing it to fail in CI - as per the logs |
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
@shwetathareja Did you happen to get a chance to review the recent changes after I pulled the PR out of draft mode? |
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Thanks @shwetathareja for approving & merging this. Please add the |
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-7853-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 cb0d13b9cab950b39269eb28691e6075cd9cf1aa
# Push it to GitHub
git push --set-upstream origin backport/backport-7853-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x Then, create a pull request where the |
…7853) * Cluster State Update Optimization - Optimize Metadata build() to skip redundant computations of indicesLookup as part of ClusterState build Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com> --------- Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com> (cherry picked from commit cb0d13b)
Manually backported to resolve conflicts in CHANGELOG (no other conflicts): #8644 |
…7853) * Cluster State Update Optimization - Optimize Metadata build() to skip redundant computations of indicesLookup as part of ClusterState build Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com> --------- Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
* Cluster State Update Optimization - Optimize Metadata build() to skip redundant computations of indicesLookup as part of ClusterState build Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com> --------- Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com> (cherry picked from commit cb0d13b)
…7853) * Cluster State Update Optimization - Optimize Metadata build() to skip redundant computations of indicesLookup as part of ClusterState build Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com> --------- Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
…7853) * Cluster State Update Optimization - Optimize Metadata build() to skip redundant computations of indicesLookup as part of ClusterState build Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com> --------- Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
…7853) * Cluster State Update Optimization - Optimize Metadata build() to skip redundant computations of indicesLookup as part of ClusterState build Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com> --------- Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com> Signed-off-by: sahil buddharaju <sahilbud@amazon.com>
…7853) * Cluster State Update Optimization - Optimize Metadata build() to skip redundant computations of indicesLookup as part of ClusterState build Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com> --------- Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
…7853) * Cluster State Update Optimization - Optimize Metadata build() to skip redundant computations of indicesLookup as part of ClusterState build Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com> --------- Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com> Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Description
This
draftPR is to discuss the optimization changes for ClusterState computation by limiting the number of times objects related to indices and its lookups inside Metadata.java.Will add relevant test cases, improve reduce excessive logging, modify Changelog in later commits.The build() in Metadata.java is expensive operation because of computation of indicesLookup.
There are certain places where we know for sure that that computation of all these values is not necessary. For instance, when a template is modified, all the indices related objects are not modified; in MasterService, where after computation of Metadata, version is increment only.
Note when creating a new ClusterState, Metadata object is created twice. So with this changes, we are basically omitting the entire Metadata creation second time, which alone saves ~40% time for ClusterState computation.
This is done by introducing variables in Builder method to hold local copies of metadata related objects.
Additionally, a new variableThe objects which are recomputed inrebuildIndicesLookups
can be set to false (default will be true to retain original workflow) with which you manually skip re-creation of objects related to indicesLookups (open, close, visible, hidden, indicesLookup map).build()
method can skipped and utilized directly from copied objects (from last metadata) ifindices
andcustoms
are unchanged. Checking the change in these 2 objects is less time consuming operation.The idea is that in these scenarios, we can save ~40% time taken to create ClusterState in all API calls, which modify cluster state. For certain API calls, like master re-election, template related APIs, we should be able to gain 95% time required for ClusterState computation.
Is it possible to setrebuildIndicesLookups
to false implicitly (by deciding from indices, etc) instead of explicit sertting?The complication in doing so is that there are multiple objects which inherently depend on IndexMetadata/IndexAbstraction. This implicit logic will have to be driven out be checking and analyzing any values (open, close, hidden, etc) that can be changed. The effort is not worth the gain and can be erroneous to start with. By introducing the new flag variable, we only modify the work flows where we know for sure that Metadata is not changed. The default behavior will still stay unchanged - Metadata will be computed from scratch as before for all the workflows.
Changes noted on:
Before (new alias creation):
After (new alias creation):
For Template Creation, the indicesLookup is not computed even once, so the time reduces to by ~95% since only deep copy is done from previous cluster state for indices related objects.
Note: The time taken for the above operations are usually of higher magnitude (2-3x) when run on smaller instance types, such as m4.large, c4.large.
Related Issues
Resolves #7002
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.