Handling passive health check on multiple instances of Yarp #2405

jayendranarumugam · 2024-02-16T08:52:26Z

I'm following this code, which is implemented using YARP:

using System;
using Microsoft.AspNetCore.Http;
using Yarp.ReverseProxy.Health;
using Yarp.ReverseProxy.Model;

public class ThrottlingHealthPolicy : IPassiveHealthCheckPolicy
{
    public static readonly string ThrottlingPolicyName = "ThrottlingPolicy";
    private readonly IDestinationHealthUpdater _healthUpdater;

    public ThrottlingHealthPolicy(IDestinationHealthUpdater healthUpdater)
    {
        _healthUpdater = healthUpdater;
    }

    public string Name => ThrottlingPolicyName;

    public void RequestProxied(HttpContext context, ClusterState cluster, DestinationState destination)
    {
        var headers = context.Response.Headers;

        // Treat throttling (429) and server errors (5xx) as a signal to back off.
        if (context.Response.StatusCode is 429 or >= 500)
        {
            // Default back-off, used when no header yields a whole number of seconds.
            var retryAfterSeconds = 10;

            // Only plain integer values are parsed; anything else falls through
            // to the next header or to the default above.
            if (headers.TryGetValue("Retry-After", out var retryAfterHeader) && retryAfterHeader.Count > 0 && int.TryParse(retryAfterHeader[0], out var retryAfter))
            {
                retryAfterSeconds = retryAfter;
            }
            else if (headers.TryGetValue("x-ratelimit-reset-requests", out var ratelimitResetRequests) && ratelimitResetRequests.Count > 0 && int.TryParse(ratelimitResetRequests[0], out var ratelimitResetRequest))
            {
                retryAfterSeconds = ratelimitResetRequest;
            }
            else if (headers.TryGetValue("x-ratelimit-reset-tokens", out var ratelimitResetTokens) && ratelimitResetTokens.Count > 0 && int.TryParse(ratelimitResetTokens[0], out var ratelimitResetToken))
            {
                retryAfterSeconds = ratelimitResetToken;
            }

            // Mark the destination unhealthy for the computed period.
            _healthUpdater.SetPassive(cluster, destination, DestinationHealth.Unhealthy, TimeSpan.FromSeconds(retryAfterSeconds));
        }
    }
}

One limitation is:

This solution uses local memory to store the endpoints' health state. That means each instance has its own view of the throttling state of each OpenAI endpoint. What might happen at runtime is this:

Instance 1 receives a customer request and gets a 429 error from backend 1. It marks that backend as unavailable for X seconds and then reroutes the customer request to the next backend.
Instance 2 receives a customer request and sends it to backend 1 again, because its local list of backends does not have the information from instance 1 marking backend 1 as throttled. Backend 1 responds with a 429 error again, and instance 2 also marks it as unavailable and reroutes the request to the next backend.
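
To make the scenario concrete, here is a toy simulation of it. This is not YARP code: the ProxyInstance class and its members are hypothetical names used only for illustration, and each object simply stands in for one proxy instance holding its own in-memory health state.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one proxy instance with purely local health state.
public sealed class ProxyInstance
{
    // Backend name -> time until which this instance treats it as unavailable.
    private readonly Dictionary<string, DateTimeOffset> _unhealthyUntil = new();

    public ProxyInstance(string name) => Name = name;

    public string Name { get; }

    public void MarkUnhealthy(string backend, DateTimeOffset now, TimeSpan period)
        => _unhealthyUntil[backend] = now + period;

    public bool IsAvailable(string backend, DateTimeOffset now)
        => !_unhealthyUntil.TryGetValue(backend, out var until) || now >= until;
}

public static class Demo
{
    public static void Main()
    {
        var instance1 = new ProxyInstance("instance 1");
        var instance2 = new ProxyInstance("instance 2");
        var now = DateTimeOffset.UtcNow;

        // Instance 1 receives a 429 from backend 1 and backs off for 10 seconds.
        instance1.MarkUnhealthy("backend1", now, TimeSpan.FromSeconds(10));

        // Instance 2 never saw that response, so its local view is unchanged.
        var seenBy1 = instance1.IsAvailable("backend1", now);
        var seenBy2 = instance2.IsAvailable("backend1", now);

        Console.WriteLine($"{instance1.Name}: backend1 available = {seenBy1}"); // False
        Console.WriteLine($"{instance2.Name}: backend1 available = {seenBy2}"); // True
    }
}
```

Instance 2 still reports backend 1 as available, so it will send the next request there and hit the 429 itself before backing off.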

Question:

Is there any option for storing this endpoint health state in a centralized place instead of local memory, which may not work across multiple instances of YARP?

jayendranarumugam added the "Type: Idea" label (This issue is a high-level idea for discussion.) Feb 16, 2024

MihaZupan self-assigned this Mar 21, 2024

MihaZupan added this to the Backlog milestone Apr 9, 2024
