I'm following the code below, which implements a passive health check policy with YARP:
```csharp
using Microsoft.AspNetCore.Http;
using Yarp.ReverseProxy.Health;
using Yarp.ReverseProxy.Model;

public class ThrottlingHealthPolicy : IPassiveHealthCheckPolicy
{
    public static string ThrottlingPolicyName = "ThrottlingPolicy";
    private readonly IDestinationHealthUpdater _healthUpdater;

    public ThrottlingHealthPolicy(IDestinationHealthUpdater healthUpdater)
    {
        _healthUpdater = healthUpdater;
    }

    public string Name => ThrottlingPolicyName;

    public void RequestProxied(HttpContext context, ClusterState cluster, DestinationState destination)
    {
        var headers = context.Response.Headers;

        // Treat throttling (429) and server errors (5xx) as a signal to back off.
        if (context.Response.StatusCode is 429 or >= 500)
        {
            // Default back-off when no usable header is present.
            var retryAfterSeconds = 10;

            // Use the first header that parses as a whole number of seconds.
            if (headers.TryGetValue("Retry-After", out var retryAfterHeader) && retryAfterHeader.Count > 0 && int.TryParse(retryAfterHeader[0], out var retryAfter))
            {
                retryAfterSeconds = retryAfter;
            }
            else if (headers.TryGetValue("x-ratelimit-reset-requests", out var rateLimitResetRequests) && rateLimitResetRequests.Count > 0 && int.TryParse(rateLimitResetRequests[0], out var rateLimitResetRequest))
            {
                retryAfterSeconds = rateLimitResetRequest;
            }
            else if (headers.TryGetValue("x-ratelimit-reset-tokens", out var rateLimitResetTokens) && rateLimitResetTokens.Count > 0 && int.TryParse(rateLimitResetTokens[0], out var rateLimitResetToken))
            {
                retryAfterSeconds = rateLimitResetToken;
            }

            // Mark the destination unhealthy; it is reactivated after the given period.
            _healthUpdater.SetPassive(cluster, destination, DestinationHealth.Unhealthy, TimeSpan.FromSeconds(retryAfterSeconds));
        }
    }
}
```
One limitation is the following:
This solution uses local memory to store the endpoints' health state. That means each instance has its own view of the throttling state of each OpenAI endpoint. What might happen at runtime is this:
Instance 1 receives a customer request and gets a 429 error from backend 1. It marks that backend as unavailable for X seconds and then reroutes the customer request to the next backend.
Instance 2 receives a customer request and sends it to backend 1 again (since its local cached list of backends doesn't have the information from instance 1, which marked that backend as throttled). Backend 1 responds with a 429 error again, and instance 2 also marks it as unavailable and reroutes the request to the next backend.
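To make the scenario concrete, here is a minimal sketch of it that does not use YARP at all. `ProxyInstance`, `Simulation`, the backend names, and the fixed 10-second back-off are all hypothetical stand-ins for illustration; the only point is that each instance keeps its own in-memory dictionary, so each one has to hit the throttled backend once before it stops sending traffic there.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one proxy instance: health state lives only in this object.
public sealed class ProxyInstance
{
    // Backend name -> time until which this instance considers it unhealthy.
    private readonly Dictionary<string, DateTime> _unhealthyUntil = new();

    public bool IsHealthy(string backend, DateTime now) =>
        !_unhealthyUntil.TryGetValue(backend, out var until) || now >= until;

    public void MarkUnhealthy(string backend, DateTime now, TimeSpan period) =>
        _unhealthyUntil[backend] = now + period;

    // Tries backends in order; returns how many 429 responses this request caused.
    public int Handle(IReadOnlyList<string> backends, Func<string, int> send, DateTime now)
    {
        var throttled = 0;
        foreach (var backend in backends)
        {
            if (!IsHealthy(backend, now)) continue;   // skip backends this instance has marked
            if (send(backend) != 429) break;          // request served, stop trying
            throttled++;
            MarkUnhealthy(backend, now, TimeSpan.FromSeconds(10));
        }
        return throttled;
    }
}

public static class Simulation
{
    // Returns the total number of 429 responses seen across both instances.
    public static int Run()
    {
        var backends = new[] { "backend-1", "backend-2" };
        // Assumption: backend-1 is throttled for the whole run; backend-2 always succeeds.
        Func<string, int> send = b => b == "backend-1" ? 429 : 200;
        var now = DateTime.UtcNow;

        var instance1 = new ProxyInstance();
        var instance2 = new ProxyInstance();

        var total = instance1.Handle(backends, send, now); // instance 1 learns backend-1 is throttled
        total += instance2.Handle(backends, send, now);    // instance 2 has to learn it again
        total += instance1.Handle(backends, send, now);    // instance 1 now skips backend-1
        return total;
    }
}
```

Calling `Simulation.Run()` counts one 429 per instance, even though instance 1 already knew about the throttling before instance 2 handled its request. With a shared view of the state, the second 429 would be avoidable.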
Question:
Is there an option for storing this endpoint health state in a centralized place instead of local memory, since local memory may not work well when multiple instances of YARP are running?