[Feature] Support for weighted zonal search request routing policy #2859
Comments
If I read this correctly, you're trying to solve for availability scenarios with AZ failures by using weighted routing. If that's the case, you might want to clarify above (maybe explain where you're coming from a little better?). Then, does it ever make sense to have heterogeneous instances within the same zone? In that case, weighted routing may also be a good idea for better throughput and not just availability.
Modified the issue to reflect zonal routing policy.
So is the answer to "does it ever make sense to have heterogeneous instances within the same AZ" a no?
It does make sense, but do you think we can build that capability incrementally? While building the zonal policy, we should see how it could be extended for these use cases in the future as well.
Take a look at #2877 (comment). Can we solve both this and that problem the same way or in the same path?
I agree with @dblock that there may be a common mechanism here that would solve many different use cases. Just for my own clarification though, I have a couple of questions :)
re: zonal failures - Are you referring to partial failures here, where hosts in the failed zone are still responsive but have degraded performance? If the failed zone is fully partitioned away from the rest of the cluster and all network connections are broken, then would weighting away be necessary?
re: zonal deployment model - Can this be solved by graceful shutdowns during deployments so that new traffic is not accepted and existing requests are allowed to complete? It seems like it would be preferable to solve this in a way that doesn't require the operator to orchestrate weighting policies during deployments, if that's possible.
Yes, weighing away would guarantee that transient network faults don't cause flip-flopping until the zonal failure completely heals. For predictability, it might be desirable to stop routing any traffic to the impacted zone/rack.
Yes, the intent is to introduce graceful shutdowns. While I am not sure what you meant by "orchestrating weighting policies during deployments", the idea would be to allow controls to incrementally (for example, 5% -> 20% -> 50% -> 100%) weigh away traffic, which might require operator or automated orchestration.
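To make the staged weigh-away concrete, here is a minimal Python sketch. It is not OpenSearch code: the function names, the zone names, and the reading of "weigh away N%" as "reduce the impacted zone's weight by N% while the other zones keep weight 1" are all assumptions made for illustration.

```python
from fractions import Fraction

# Stages taken from the comment above: 5% -> 20% -> 50% -> 100%.
STAGES = (5, 20, 50, 100)


def zone_weights(zones, impacted, pct_away):
    """Hypothetical helper: reduce the impacted zone's weight by pct_away percent.

    All other zones keep a weight of 1 (an assumption of this sketch).
    """
    if not 0 <= pct_away <= 100:
        raise ValueError("pct_away must be between 0 and 100")
    reduced = Fraction(100 - pct_away, 100)
    return {z: (reduced if z == impacted else Fraction(1)) for z in zones}


def traffic_share(weights):
    """Normalize weights into the fraction of traffic each zone would receive."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one zone must keep a positive weight")
    return {z: w / total for z, w in weights.items()}


if __name__ == "__main__":
    zones = ["zone-a", "zone-b", "zone-c"]
    for pct in STAGES:
        shares = traffic_share(zone_weights(zones, "zone-c", pct))
        print(pct, {z: float(s) for z, s in shares.items()})
```

Under this reading, the final 100% stage leaves the impacted zone with no traffic at all, which matches the "stop routing any traffic to the impacted zone/rack" case. An operator or an automated controller would step through the stages, checking cluster health between steps.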
Can we add labels for "roadmap" and the version of OpenSearch this is targeting? I can add it to the overall project roadmap in the right column once that is done.
@Bukhtawar are we good to go for 2.5?
@Bukhtawar Any updates?
Is your feature request related to a problem? Please describe.
The coordinator routes search requests to shard copies in a round-robin fashion; with adaptive replica selection, it might choose to route requests to copies that rank lower in preference based on certain parameters. However, there seem to be some use cases where a weighted routing policy adds value.
Describe the solution you'd like
Support for a weighted routing policy that can help to incrementally weigh away traffic or route traffic based on the routing policy. To start off small, we can have a manual mechanism to configure policies, and provide smarter defaults and guard rails to prevent acting on bad configurations.
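As a rough illustration of the idea, the sketch below picks a shard copy in proportion to a per-zone weight. It is not the OpenSearch implementation. `ShardCopy`, `validate_weights`, and `pick_copy` are hypothetical names, and both the default weight of 1 for unlisted zones and the fail-open fallback are assumptions of this sketch.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardCopy:
    """Hypothetical stand-in for a shard copy and the zone of the node hosting it."""
    node: str
    zone: str


def validate_weights(weights):
    """Guard rail: reject configurations that could never route any traffic."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("zone weights must be non-negative")
    if not any(w > 0 for w in weights.values()):
        raise ValueError("at least one zone must have a positive weight")


def pick_copy(copies, weights, rng=random):
    """Choose one shard copy with probability proportional to its zone's weight."""
    if not copies:
        raise ValueError("no shard copies available")
    validate_weights(weights)
    # Zones missing from the policy default to weight 1 (an assumption).
    per_copy = [weights.get(c.zone, 1.0) for c in copies]
    if sum(per_copy) == 0:
        # Every available copy sits in a weighed-away zone. This sketch fails
        # open and serves the request anyway rather than rejecting it.
        return rng.choice(copies)
    return rng.choices(copies, weights=per_copy, k=1)[0]
```

For example, `pick_copy(copies, {"zone-a": 1, "zone-b": 1, "zone-c": 0})` would never select a copy in `zone-c` while copies in the other zones are available. Whether to fail open or fail closed when only weighed-away copies remain is a design choice the issue leaves open.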