A placeholder for figuring out how to correctly handle back pressure w/ Service. I have attached notes I wrote a while ago, but my thoughts have evolved since then; I haven't written the newer ones down yet.
Service back pressure
Basic options
Option 1: Use response futures
In this strategy, no back pressure handling is built into the service implementation itself. Each service implementation always accepts a request and starts processing it, queuing internally when upstream resources are not available.
The assumption is that the caller of the service (tokio-proto) will maintain a maximum number of outstanding response futures per connection. For example, a common limit for an HTTP/2.0 implementation is 100 in-flight requests.
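A rough sketch of that caller-side limit (the field names, error types, and the poll_read shape here are assumptions, not tokio-proto's actual internals):

```rust
// Per-connection cap on outstanding response futures. The protocol layer
// simply stops reading requests off the connection once the limit is hit;
// the service itself never pushes back.
const MAX_IN_FLIGHT: usize = 100;

fn poll_read(&mut self) -> Poll<(), io::Error> {
    while self.in_flight.len() < MAX_IN_FLIGHT {
        match try_ready!(self.transport.poll()) {
            Some(req) => {
                // Always accepted: the service hands back a future immediately.
                self.in_flight.push(self.service.call(req));
            }
            None => return Ok(Async::Ready(())),
        }
    }

    // At capacity for this connection; responses completing elsewhere will
    // cause this to be polled again.
    Ok(Async::NotReady)
}
```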
The problem here is that this strategy only really works if each connection is independent. If all connections require access to a global resource, say a global queue with its buffer set to 1,000, and there are 100k open connections, the number of outstanding requests will start to back up heavily. In this case, 10 open connections are enough to fill the global queue, while the other 99,990 connections can each still buffer up to 100 requests.
The ideal situation here is that, as the global queue becomes full, tokio-proto stops accepting new connections.
Option 2: AsyncService
This strategy modifies Service::call to align more with the Sink API. If a service is not ready to accept a new request, the call would return immediately with AsyncService::NotReady(request). When the service becomes ready, the task is notified and the request can be attempted again. When tokio-proto and other callers of Service receive an AsyncService::NotReady(req), the common response will be to buffer the returned request until the service becomes ready again and to not generate any further requests until that buffer slot is cleared. In rarer cases, it may be possible to fail over to a secondary service.
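The exact shape isn't spelled out here; a minimal sketch of what the modified signature might look like (using the futures 0.1 types of the time, all bounds assumed):

```rust
use futures::Future;

/// Hypothetical return type for `call` under this option: either the request
/// was accepted and a response future is handed out, or the service is not
/// ready and the request is returned to the caller.
pub enum AsyncService<F, Request> {
    Ready(F),
    NotReady(Request),
}

pub trait Service {
    type Request;
    type Response;
    type Error;
    type Future: Future<Item = Self::Response, Error = Self::Error>;

    /// Attempt to process a request. If the service is at capacity, the
    /// request is handed back so the caller can hold onto it; the current
    /// task is notified once the service becomes ready again.
    fn call(&mut self, req: Self::Request) -> AsyncService<Self::Future, Self::Request>;
}
```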
This strategy creates significant complexity for all implementations of Service. First, it exposes AsyncService and requires understanding that concept; second, every middleware needs to be able to hand back the original request if the upstream is not ready.
If the middleware mutates the request in such a way that is not reversible and the upstream returns AsyncService::NotReady, then the middleware has no choice but to error the request, which kind of defeats the purpose.
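A sketch of that problem, using the AsyncService shape sketched above (transform, BoxFuture, and Error::Overloaded are made-up names):

```rust
// Self::Future is assumed to be a boxed future (BoxFuture) here.
fn call(&mut self, req: Request) -> AsyncService<BoxFuture, Request> {
    // An irreversible transformation: the original request is consumed.
    let upstream_req = transform(req);

    match self.upstream.call(upstream_req) {
        AsyncService::Ready(fut) => AsyncService::Ready(Box::new(fut)),
        AsyncService::NotReady(_upstream_req) => {
            // Only the transformed request comes back. It cannot be turned
            // back into the original request, so there is nothing valid to
            // hand to our own caller; the request has to be failed (or
            // buffered inside the middleware).
            AsyncService::Ready(Box::new(future::err(Error::Overloaded)))
        }
    }
}
```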
Also, a service must be able to determine if the request can be processed immediately, which can cause additional complexity in the case where a service implementation does something like:
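The code example from the original notes isn't preserved here; based on the follow-up text, the shape being described is roughly the following (written with a future-returning call for readability; upstream_a, upstream_b, and resp_a come from the text, everything else is assumed):

```rust
fn call(&mut self, req: Request) -> Self::Future {
    let mut upstream_b = self.upstream_b.clone();

    // The first upstream is called while the original `call` is still on the
    // stack...
    Box::new(self.upstream_a.call(req).and_then(move |resp_a| {
        // ...but this second call only runs once `resp_a` has resolved, long
        // after the outer `call` has returned. If it came back as
        // AsyncService::NotReady(resp_a), there is no caller left to hand
        // the value back to.
        upstream_b.call(resp_a)
    }))
}
```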
It is unclear how to handle upstream_b.call returning AsyncService::NotReady since the original call has already returned. The only solution that I can think of is to buffer resp_a and set a flag on the middleware to not accept any more requests.
AsyncService also does not handle the “router” problem, where one route may be ready while another is not. If a request results in AsyncService::NotReady, the caller either has to stop all further requests or buffer the rejected requests while continuing to send new ones. This “pending” buffer could grow large. Also, on every “tick” of the event loop, the caller would need to attempt to flush all buffered requests even if only a single one may be ready (it is also unclear what triggers the flush, since there is no “poke”). And if a buffer is introduced anyway, it is unclear why the router shouldn’t just manage the buffer itself w/ the poll_ready strategy, which would be much simpler.
This seems to imply that the right thing to do when a service is overloaded is to resolve the response future as an error. But then the problem becomes: how does the caller know when the service will accept another request?
Option 3: poll_ready
In this strategy, Service has an additional function: poll_ready(). The function returns Async::Ready when the service is ready to accept another request. The caller then sends the request, which returns a response future as it “knows” that the service can accept the request.
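Sketched out (using the futures 0.1 types in use at the time; the exact bounds are assumptions):

```rust
use futures::{Future, Poll};

pub trait Service {
    type Request;
    type Response;
    type Error;
    type Future: Future<Item = Self::Response, Error = Self::Error>;

    /// Returns `Async::Ready(())` when the service can accept another
    /// request. If `Async::NotReady` is returned, the current task is
    /// notified once capacity frees up.
    fn poll_ready(&mut self) -> Poll<(), Self::Error>;

    /// Process a request. Callers are expected to have seen `poll_ready()`
    /// return ready before calling this.
    fn call(&mut self, req: Self::Request) -> Self::Future;
}
```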
Open questions: is poll_ready() a guarantee, or can it return false positives? If it does return a false positive, how does a service respond to a call when there is no availability? I would suggest that this be left up to the service implementation (maybe it is a configuration setting) and that we provide some best-practice hints. There really would be two options: either the service accepts the request, buffers it, and returns a response future, or the service returns an error future. Both seem like they could be acceptable depending on the situation. Note that if the service returns an “out of capacity” error, the poll_ready function should help the caller determine when to resume sending requests.
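For reference, the calling pattern assumed throughout looks roughly like this (the transport/in_flight structure and error conversions are made up):

```rust
fn poll(&mut self) -> Poll<(), Error> {
    loop {
        // Don't pull another request off the connection until the service
        // reports that it has capacity for it.
        try_ready!(self.service.poll_ready());

        match try_ready!(self.transport.poll()) {
            Some(req) => {
                // Expected to be accepted, since poll_ready returned ready.
                self.in_flight.push(self.service.call(req));
            }
            None => return Ok(Async::Ready(())),
        }
    }
}
```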
Another question is how to handle services that are conditionally ready depending on the details of the request. For example, a router may have one route that is ready and another that isn’t. With poll_ready, the router would be configured with a max buffer size per route. When a route’s buffer is exceeded, the router can either make the entire router unready (poll_ready returns not ready) or it can error further requests on that route while keeping the router service “ready”.
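As a sketch of the second behavior (all of the names here are made up; decrementing the per-route count when a response completes is elided):

```rust
fn call(&mut self, req: Request) -> Self::Future {
    // Look up the per-route state for this request's path.
    let route = match self.routes.get_mut(req.path()) {
        Some(route) => route,
        None => return Box::new(future::err(Error::NotFound)),
    };

    if route.in_flight >= self.max_per_route {
        // Only this route is over capacity; fail it fast instead of making
        // the whole router report not-ready.
        return Box::new(future::err(Error::RouteAtCapacity));
    }

    route.in_flight += 1;
    Box::new(route.service.call(req))
}
```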
I believe that this strategy is simpler and provides the same capabilities as option 2.
Advanced opt-in possibilities
Service specific back pressure strategies could be employed as well. For example, in the router case, when a route is not ready, the router could error the request but provide back pressure information in the error:
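The original example isn’t reproduced here, but based on the description below, the error could carry something like this (Route and the field names are assumptions):

```rust
use futures::Future;

/// Error returned when a specific route is at capacity.
pub struct RouteUnavailable {
    /// The route that rejected the request.
    pub route: Route,
    /// Resolves once this particular route has capacity again, so the caller
    /// can resume sending to it.
    pub ready: Box<Future<Item = (), Error = ()>>,
}
```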
In this case, the error includes the route that was unavailable as well as a future representing that specific route becoming available again. The application can then handle the back pressure signal or just ignore the error and the request is aborted.
Another strategy could be having a “state” on the response future:
```rust
enum ResponseState {
    Healthy,
    Distressed,
}
```
Services could then always accept all requests, but if there is a back pressure situation going on, the response future will be in the “distressed” state. The service caller could then decide if it wants to abort the request (drop the response future) or buffer it…
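On the caller side, this might look something like the following (assuming a hypothetical state() accessor on the response future):

```rust
match response.state() {
    ResponseState::Healthy => {
        // Nothing special: keep driving the response future to completion.
    }
    ResponseState::Distressed => {
        // Back pressure signal: either buffer the future and stop sending
        // new requests for a while, or drop it to abort the request.
        drop(response);
    }
}
```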