Figure out backpressure #3

Closed
carllerche opened this issue Jul 29, 2017 · 1 comment · Fixed by #6

Comments

@carllerche (Member)

A placeholder for figuring out how to correctly handle back pressure w/ Service. I have attached notes I wrote a while ago, but my thoughts have evolved since. I haven't written them down yet.

Service back pressure

Basic options

Option 1: Use response futures

In this strategy, there is no back pressure mechanism built into a service implementation. Each service implementation always accepts a request and starts processing it, queuing if necessary when upstream resources are not available.

The assumption is that the caller of the service (tokio-proto) will maintain a maximum number of outstanding futures for a given connection. For example, a common limit for an HTTP/2.0 implementation is 100 in-flight requests.
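
For illustration, here is a minimal sketch of that caller-side limit; the Connection type, MAX_IN_FLIGHT constant, and methods are made up for this example and are not tokio-proto's actual internals.

// Hypothetical stand-in for a per-connection dispatcher; back pressure comes
// only from capping the number of outstanding response futures.
const MAX_IN_FLIGHT: usize = 100;

struct Connection<F> {
    // Response futures for requests that have been accepted but not completed.
    in_flight: Vec<F>,
}

impl<F> Connection<F> {
    // The connection only reads a new request off the wire while it has a
    // free in-flight slot.
    fn can_dispatch(&self) -> bool {
        self.in_flight.len() < MAX_IN_FLIGHT
    }

    fn dispatch(&mut self, response_future: F) {
        debug_assert!(self.can_dispatch());
        self.in_flight.push(response_future);
    }
}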

The problem here is that this strategy only really works if each connection is independent. If all connections require access to a global resource, say a global queue with its buffer set to 1,000, and there are 100k open connections, the number of outstanding requests will start to back up heavily. In this case, 10 open connections (at 100 in-flight requests each) are enough to fill the global queue, leaving the other 99,990 connections each able to buffer up to 100 requests of their own.

The ideal situation here is that, as the global queue becomes full, tokio-proto stops accepting new connections.

Option 2: AsyncService

This strategy modifies Service::call to align more closely with the Sink API. If a service is not ready to accept a new request, the call returns immediately with AsyncService::NotReady(request). When the service becomes ready, the task is notified and the request can be attempted again. When tokio-proto and other callers of Service receive an AsyncService::NotReady(req), the common response will be to buffer the returned request until the service becomes ready again and to not generate any further requests until that buffer slot is cleared. In rarer cases, it may be possible to fail over to a secondary service.
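
For concreteness, a minimal sketch of what the return type described here could look like; the exact shape is an assumption drawn from this discussion, not a published API.

enum AsyncService<F, R> {
    // The service accepted the request and handed back its response future.
    Ready(F),
    // The service is at capacity; the original request is returned so the
    // caller can buffer it and retry once the task is notified.
    NotReady(R),
}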

This strategy creates significant complexity for all implementations of Service. First, it exposes AsyncService and requires understanding that concept; second, every middleware needs to be able to return the original request if the upstream is not ready.

BEFORE:

impl<T, P> Service for ClientService<T, P> where T: 'static, P: ClientProto<T> {
    type Request = P::Request;
    type Response = P::Response;
    type Error = P::Error;
    type Future = ClientFuture<T, P>;

    fn call(&mut self, req: P::Request) -> Self::Future {
        ClientFuture {
            inner: self.inner.call(Message::WithoutBody(req))
        }
    }
}

AFTER:

impl<T, P> Service for ClientService<T, P> where T: 'static, P: ClientProto<T> {
    type Request = P::Request;
    type Response = P::Response;
    type Error = P::Error;
    type Future = ClientFuture<T, P>;

    fn call(&mut self, req: P::Request) -> AsyncService<Self::Future, P::Request> {
        match self.inner.call(Message::WithoutBody(req)) {
            AsyncService::Ready(f) => {
                AsyncService::Ready(ClientFuture {
                    inner: f,
                })
            },
            AsyncService::NotReady(req) => {
                match req {
                    Message::WithoutBody(req) => AsyncService::NotReady(req),
                    _ => panic!("wat"),
                }
            }
        }
    }
}

If the middleware mutates the request in a way that is not reversible and the upstream returns AsyncService::NotReady, then the middleware has no choice but to error the request, which kind of defeats the purpose.

Also, a service must be able to determine if the request can be processed immediately, which can cause additional complexity in the case where a service implementation does something like:

upstream_a.call(request)
	.and_then(|resp_a| upstream_b.call(resp_a));

It is unclear how to handle upstream_b.call returning AsyncService::NotReady since the original call has already returned. The only solution that I can think of is to buffer resp_a and set a flag on the middleware to not accept any more requests.
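
A rough sketch of that workaround; the ChainedMiddleware name and its fields are made up for illustration.

struct ChainedMiddleware<A, B, RespA> {
    upstream_a: A,
    upstream_b: B,
    // Intermediate response waiting for upstream_b to have capacity; while
    // this is Some, the middleware refuses to accept new requests.
    parked: Option<RespA>,
}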

AsyncService also does not handle the “router” problem, where one route may be ready while another is not. If a request results in AsyncService::NotReady, the caller will either have to stop all further requests or buffer the rejected requests while continuing to send new ones. This “pending” buffer could become large. Also, on every “tick” of the event loop, the service will need to attempt to flush all buffered requests even if only a single one may be ready (it’s also unclear how the service flushes the buffered requests, since there is no “poke”). And if a buffer is introduced anyway, it is unclear why the router doesn’t just manage the buffer itself with the poll_ready strategy, which would be much simpler.

This seems to imply that the right thing to do when a service is overloaded is to resolve the response future as an error. The problem then is: how does the caller know when the service will accept another request?

Option 3: poll_ready

In this strategy, Service has an additional function: poll_ready(). The function returns Async::Ready when the service is ready to accept another request. The caller then sends the request with call, which returns a response future, since the caller “knows” that the service can accept the request.
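
A minimal, self-contained sketch of this shape, using simplified local stand-ins for futures' Async/Poll types; the exact names and signatures are assumptions, not a final API.

enum Async<T> {
    Ready(T),
    NotReady,
}

type Poll<T, E> = Result<Async<T>, E>;

trait Service {
    type Request;
    type Response;
    type Error;
    type Future; // would be a Future resolving to Self::Response

    // Returns Ready(()) when the service can accept another request; if it
    // returns NotReady, the current task is notified once capacity frees up.
    fn poll_ready(&mut self) -> Poll<(), Self::Error>;

    // Only called after poll_ready returned Ready, so it can always return a
    // response future and never has to hand the request back.
    fn call(&mut self, req: Self::Request) -> Self::Future;
}

// The caller-side pattern: check readiness, and only then hand over the request.
fn try_dispatch<S: Service>(svc: &mut S, req: S::Request) -> Result<Option<S::Future>, S::Error> {
    match svc.poll_ready()? {
        Async::Ready(()) => Ok(Some(svc.call(req))),
        // Not ready: keep the request and stop generating new ones until the
        // task is woken again.
        Async::NotReady => Ok(None),
    }
}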

Questions would be: is poll_ready() a guarantee, or can it return false positives? If it does return a false positive, how does a service respond to a call when there is no availability? I would suggest that this can be left up to the service implementation (maybe it is a configuration setting) and that we provide some best practice hints. There really are two options: either the service accepts the request, buffers it, and returns a response future, or the service returns an error future. Both seem like they could be acceptable depending on the situation. Note that if the service returns an “out of capacity” error, the poll_ready function should help the caller determine when to resume sending requests.

Another question is how to handle services that are conditionally ready depending on the details of the request. For example, a router may have one route that is ready and another that isn’t. The way this would work with poll_ready is that the router is configured with a max buffer size per route. When a route’s buffer is exceeded, the router can either disable the entire router (poll_ready returns not ready) or error further requests on that route while keeping the router service “ready”.
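
A hedged sketch of that per-route policy; the Router type, its fields, and the routing function are all illustrative, not real tower code.

use std::collections::{HashMap, VecDeque};

struct Router<R> {
    max_per_route: usize,
    // Requests buffered per route name.
    buffers: HashMap<String, VecDeque<R>>,
    // Policy choice from above: a single full route either makes the whole
    // router report "not ready", or only that route's requests get rejected.
    disable_whole_router: bool,
}

impl<R> Router<R> {
    // Placeholder routing decision; a real router would inspect the request.
    fn route_of(&self, _req: &R) -> String {
        "default".to_string()
    }

    // What poll_ready would report under each policy.
    fn is_ready(&self) -> bool {
        if self.disable_whole_router {
            self.buffers.values().all(|b| b.len() < self.max_per_route)
        } else {
            true
        }
    }

    // Returns Err(req) when the chosen route's buffer is full, mirroring the
    // "error further requests on the route" option.
    fn try_push(&mut self, req: R) -> Result<(), R> {
        let route = self.route_of(&req);
        let buf = self.buffers.entry(route).or_insert_with(VecDeque::new);
        if buf.len() < self.max_per_route {
            buf.push_back(req);
            Ok(())
        } else {
            Err(req)
        }
    }
}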

I believe that this strategy is simpler and provides the same capabilities as option 2.

Advanced opt-in possibilities

Service-specific back pressure strategies could be employed as well. For example, in the router case, when a route is not ready, the router could error the request but provide back pressure information in the error:

struct RouteUnavailable<R> {
    request: R,
    route: String,
    // Resolves when this specific route has capacity again (error type
    // simplified to () here).
    ready: Box<dyn Future<Item = (), Error = ()>>,
}

In this case, the error includes the route that was unavailable as well as a future representing that specific route becoming available again. The application can then handle the back pressure signal or just ignore the error and the request is aborted.

Another strategy could be having a “state” on the response future:

enum ResponseState {
	Healthy,
	Distressed,
}

Services could then always accept all requests, but if there is a back pressure situation going on, the response future will be in the “distressed” state. The service caller could then decide if it wants to abort the request (drop the response future) or buffer it…
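
A rough sketch of what that could look like, reusing the ResponseState enum above; the Response wrapper and its method are made up for illustration.

struct Response<F> {
    state: ResponseState,
    // The actual response future; dropping the whole Response aborts the request.
    inner: F,
}

impl<F> Response<F> {
    // The caller checks this before deciding whether to buffer or abort.
    fn is_distressed(&self) -> bool {
        match self.state {
            ResponseState::Distressed => true,
            ResponseState::Healthy => false,
        }
    }
}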

@danburkert

Here are a few scenarios I've run into while writing Service implementations:

  1. Pushing on to a queue
  2. Pushing on to one of many queues, e.g., a router service
  3. Pushing on to a sequence of queues

For each of these, 'pushing on to a queue' can be replaced with pretty much any operation that needs backpressure.
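
For scenario 1, a minimal sketch of a bounded queue under the poll_ready model from Option 3; the QueueService type is illustrative and not from any real crate.

use std::collections::VecDeque;

struct QueueService<T> {
    queue: VecDeque<T>,
    capacity: usize,
}

impl<T> QueueService<T> {
    // The poll_ready check, simplified to a bool: ready while the bounded
    // queue has a free slot. A real implementation would also register the
    // task for wakeup when full.
    fn ready(&self) -> bool {
        self.queue.len() < self.capacity
    }

    // Only called after ready() returned true, so the push never exceeds the
    // configured capacity.
    fn call(&mut self, item: T) {
        debug_assert!(self.ready());
        self.queue.push_back(item);
    }
}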
