[Feature] Support for weighted zonal search request routing policy #2859

Bukhtawar · 2022-04-11T18:47:00Z

Is your feature request related to a problem? Please describe.
Search requests at the coordinator are routed to shard copies in round-robin fashion; with adaptive replica selection, the coordinator may instead route a request to copies that rank lower in preference based on certain parameters. However, there are use cases where a weighted routing policy adds value:

Heterogeneous (zone-wise) instance types -- Certain instance capacities are sometimes available only in some zones and not others, which means customers may run 4xl instances in one zone and 2xl instances in the others. Weighted routing (e.g. 2:1) can help such heterogeneous deployments.
Zonal deployment model -- Software deployments can be slow, and operators may choose to deploy one zone at a time, which means traffic to the zone under deployment may need to be cut off. Setting the policy to 1:0 should effectively stop all search shard requests from going to copies in the AZ under deployment.
Zonal failures -- Zonal failures are common, and there is no mechanism to weigh shard request traffic away from an unhealthy zone, even though HTTP traffic is weighted away.
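The weight ratios above (2:1 for mixed instance sizes, 1:0 to cut a zone off) can be illustrated with a small sketch. It assumes nginx-style smooth weighted round robin applied at the zone level; the `WeightedZoneRouter` class and the zone names are hypothetical, this is not the implementation later merged in #4241, and a real coordinator would still have to pick a shard copy within the chosen zone.

```python
from __future__ import annotations

from collections import Counter


class WeightedZoneRouter:
    """Hypothetical sketch: choose a zone per request by smooth weighted round robin.

    A zone with weight 0 is never chosen, which models a 1:0 "weigh away" policy.
    """

    def __init__(self, weights: dict[str, float]) -> None:
        # Drop zones with zero (or negative) weight: they receive no traffic at all.
        self.weights = {zone: w for zone, w in weights.items() if w > 0}
        if not self.weights:
            raise ValueError("at least one zone needs a positive weight")
        self.total = sum(self.weights.values())
        # Running score per zone, as in nginx's smooth weighted round robin.
        self.current = {zone: 0 for zone in self.weights}

    def next_zone(self) -> str:
        # Every zone earns its weight; the highest score wins and pays back the total.
        for zone, weight in self.weights.items():
            self.current[zone] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen


# 2:1 split, e.g. 4xl instances in zone-a and 2xl instances in zone-b.
router = WeightedZoneRouter({"zone-a": 2, "zone-b": 1})
print(Counter(router.next_zone() for _ in range(300)))  # zone-a: 200, zone-b: 100

# 1:1:0 split: zone-c (under deployment) gets no search traffic.
drained = WeightedZoneRouter({"zone-a": 1, "zone-b": 1, "zone-c": 0})
print(Counter(drained.next_zone() for _ in range(10)))  # zone-a: 5, zone-b: 5
```

Smooth weighted round robin interleaves picks (zone-a, zone-b, zone-a, ...) rather than sending bursts to the heavier zone; a weighted random choice would be another option, but it only meets the ratios on average.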

Describe the solution you'd like
Support for a weighted routing policy that can incrementally weigh traffic away from a zone, or route traffic according to the configured policy. To start small, we can have a manual mechanism to configure policies, and provide smarter defaults and guard rails to prevent acting on bad configurations.
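As one illustration of such guard rails, a proposed policy could be validated before it is applied. The sketch below is hypothetical: the `validate_weights` function and every rule in it (all known zones must be listed, weights must be finite and non-negative, at least one zone must keep traffic, and at most `max_zero_weight_zones` zones may be weighed away to zero) are assumptions for illustration, not OpenSearch's actual checks.

```python
from __future__ import annotations

import math


def validate_weights(weights: dict[str, float], known_zones: set[str],
                     max_zero_weight_zones: int = 1) -> list[str]:
    """Hypothetical guard rails; returns a list of problems (empty means acceptable)."""
    problems = []
    # Every zone in the cluster must get an explicit weight, and no unknown zones.
    missing = known_zones - set(weights)
    unknown = set(weights) - known_zones
    if missing:
        problems.append(f"missing weights for zones {sorted(missing)}")
    if unknown:
        problems.append(f"unknown zones {sorted(unknown)}")
    # Weights must be finite and non-negative (rejects NaN, inf and negatives).
    for zone, weight in weights.items():
        if not math.isfinite(weight) or weight < 0:
            problems.append(f"invalid weight {weight!r} for zone {zone}")
    # At least one zone must keep serving traffic.
    if not any(math.isfinite(w) and w > 0 for w in weights.values()):
        problems.append("no zone has a positive weight")
    # Limit how many zones can be weighed away completely at the same time.
    zero_zones = [zone for zone, w in weights.items() if w == 0]
    if len(zero_zones) > max_zero_weight_zones:
        problems.append(f"too many zones weighed away: {sorted(zero_zones)}")
    return problems


zones = {"zone-a", "zone-b", "zone-c"}
print(validate_weights({"zone-a": 1, "zone-b": 1, "zone-c": 0}, zones))  # []
print(validate_weights({"zone-a": 0, "zone-b": 0, "zone-c": 1}, zones))  # too many zones weighed away
```

Returning every problem at once, rather than failing on the first one, lets an operator fix a bad policy in a single round trip.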

dblock · 2022-04-11T19:00:34Z

If I read this correctly, you're trying to solve for availability scenarios with AZ failures by using weighted routing. If that's the case, you might want to clarify above (maybe explain where you come from a little better?). Then, does it ever make sense to have heterogeneous instances within the same zone? in which case weighted routing may also be a good idea for better throughput and not just availability?

Bukhtawar · 2022-04-11T20:08:15Z

> If I read this correctly, you're trying to solve for availability scenarios with AZ failures by using weighted routing. If that's the case, you might want to clarify above (maybe explain where you come from a little better?). Then, does it ever make sense to have heterogeneous instances within the same zone? in which case weighted routing may also be a good idea for better throughput and not just availability?

Modified the issue to reflect the zonal routing policy.

dblock · 2022-04-11T23:08:31Z

So is the answer to "does it ever make sense to have heterogeneous instances within the same AZ?" a no?

Bukhtawar · 2022-04-12T08:36:25Z

It does make sense, but do you think we can build that capability incrementally? While building the zonal policy, we should consider how it could be extended for these use cases in the future as well.

dblock · 2022-04-13T18:40:14Z

Take a look at #2877 (comment). Can we solve both this and that problem the same way, or along the same path?

andrross · 2022-04-14T17:01:20Z

I agree with @dblock that there may be a common mechanism here that would solve many different use cases. Just for my own clarification though I have a couple questions :)

re: zonal failures - Are you referring to partial failures here where hosts in the failed zone are still responsive but have degraded performance? If the failed zone is fully partitioned away from the rest of the cluster and all network connections are broken then would weighting away be necessary?

re: zonal deployment model - Can this be solved by graceful shutdowns during deployments so that new traffic is not accepted and existing requests are allowed to complete? It seems like it would be preferable to solve this in a way that doesn't require the operator to orchestrate weighting policies during deployments, if that's possible.

Bukhtawar · 2022-05-13T17:55:28Z

> zonal failures - Are you referring to partial failures here where hosts in the failed zone are still responsive but have degraded performance? If the failed zone is fully partitioned away from the rest of the cluster and all network connections are broken then would weighting away be necessary?

Yes, weighing away would guarantee that transient network faults don't cause traffic to flip-flop until the zonal failure completely heals. For predictability, it might be desirable to stop routing any traffic to the impacted zone/rack.

> zonal deployment model - Can this be solved by graceful shutdowns during deployments so that new traffic is not accepted and existing requests are allowed to complete? It seems like it would be preferable to solve this in a way that doesn't require the operator to orchestrate weighting policies during deployments, if that's possible.

Yes, the intent is to introduce graceful shutdowns. I am not sure what you meant by "orchestrating weighting policies during deployments", but the idea is to allow controls to incrementally (e.g. 5% -> 20% -> 50% -> 100%) weigh away traffic, which might require operator or automated orchestration.
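The incremental steps above (5% -> 20% -> 50% -> 100%) leave open how a percentage maps to a weight. Here is a sketch under one assumed interpretation: a step of `fraction` means the zone should receive `(1 - fraction)` of the traffic share it had at baseline, and the zone's new weight is solved for that share while the other zones keep their weights. The function names are hypothetical. Simply multiplying the zone's weight by `(1 - fraction)` would not remove exactly that fraction of its traffic, because the total weight shrinks too.

```python
from __future__ import annotations


def weigh_away_step(weights: dict[str, float], zone: str,
                    fraction: float) -> dict[str, float]:
    """Hypothetical helper: new weights giving `zone` (1 - fraction) of its baseline share."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    others = sum(weights.values()) - weights[zone]
    if others <= 0:
        raise ValueError("cannot weigh away the only zone that carries traffic")
    baseline_share = weights[zone] / (others + weights[zone])
    target_share = (1.0 - fraction) * baseline_share
    # Solve w / (others + w) == target_share for the zone's new weight w.
    updated = dict(weights)
    updated[zone] = target_share * others / (1.0 - target_share)
    return updated


def weigh_away_schedule(weights, zone, steps=(0.05, 0.20, 0.50, 1.00)):
    # Each step is computed from the baseline weights, not from the previous step.
    return [weigh_away_step(weights, zone, f) for f in steps]


def share(weights, zone):
    # Fraction of all weighted traffic that goes to `zone`.
    return weights[zone] / sum(weights.values())


baseline = {"zone-a": 1, "zone-b": 1, "zone-c": 1}
for step in weigh_away_schedule(baseline, "zone-c"):
    print(round(step["zone-c"], 3), round(share(step, "zone-c"), 3))
```

Because every step is derived from the baseline, an operator or automation can move forward or back to any step directly; the final 100% step sets the zone's weight to 0, which matches the 1:0 cut-off described for zonal deployments.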

elfisher · 2022-07-20T13:47:07Z

Can we add labels for "roadmap" and the version of OpenSearch this is targeting? I can add it to the overall project roadmap in the right column once that is done.

saratvemulapalli · 2023-01-10T06:44:27Z

@Bukhtawar are we good to go for 2.5?
Code freeze is tomorrow.

kotwanikunal · 2023-01-11T02:24:15Z

> @Bukhtawar are we good to go for 2.5? Code freeze is tomorrow.

@Bukhtawar Any updates?

Bukhtawar added the enhancement and untriaged labels Apr 11, 2022

Bukhtawar changed the title from "[Feature] Support for weighted shard search request routing policy" to "[Feature] Support for weighted zonal search request routing policy" Apr 11, 2022

ryanbogan added the distributed framework label and removed the untriaged label Apr 12, 2022

dblock mentioned this issue Apr 13, 2022: Support dynamic node role #2877 (Closed)

Bukhtawar mentioned this issue May 24, 2022: Support for decommissioning and recommissioning a zone #3402 (Open)

imRishN mentioned this issue Jun 21, 2022: [RFC] API for decommissioning/recommissioning zone and weighted zonal search request routing policy #3639 (Closed)

anshu1106 mentioned this issue Jun 29, 2022: [Draft] Weighted Round Robin policy for shard coordination traffic routing #3738 (Closed)

pranikum mentioned this issue Jul 18, 2022: Need Support for multi node cluster setup on local. #3933 (Open)

This was referenced Aug 17, 2022:

Weighted round-robin scheduling policy for shard coordination traffic… #4241 (Merged)
Add PUT api to update shard routing weights #4272 (Merged)
Add GET api to get shard routing weights #4275 (Merged)

anshu1106 mentioned this issue Sep 3, 2022: Delete API for weighted round robin search routing #4400 (Merged)

anshu1106 mentioned this issue Sep 15, 2022: [META] Weighted Routing Issue List #4526 (Open)

anshu1106 mentioned this issue Oct 11, 2022: [Weighted Shard Routing] Fail open weighed away AZ #4735 (Closed)

rramachand21 added the v2.5.0 and roadmap labels Jan 6, 2023

Bukhtawar closed this as completed Jan 11, 2023

anshu1106 mentioned this issue Jan 2, 2024: [BUG] Deserialization bug in weighted round robin metadata #11697 (Closed)
