Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should TimeoutLayer middleware allow returning other status codes? #300

Open
mlodato517 opened this issue Oct 4, 2022 · 11 comments
Open
Labels
C-enhancement Category: A PR with an enhancement or a proposed on in an issue. I-needs-decision Issues in need of decision.

Comments

@mlodato517
Copy link

Feature Request

Motivation

I'd like to be able to return a 504 if the timeout given to the TimeoutLayer
is exceeded. This is because a 408, which is currently returned, is meant to
communicate
:

that the server did not receive a complete request message within the time
that it was prepared to wait.

I don't know if in general we can tell if the issue is the client sending the
request or the server processing the request but in the case of the latter
(especially because this service is proxying to another) I'd like to return
a 504 Gateway Timeout instead (which is still not a guarantee - the issue
could have been the upstream or any post-processing of the upstream's
response but that's rare in this case). Servers like axum and actix-web
provide other mechanisms for timeouts when receiving things like request
headers12 so maybe something about receiving the request should be handled
at the server level or by something like BodyTimeout?

Proposal

Changing the response status from 408 all the time to 504 all the time
would equally be incorrect sometimes and would be a breaking change.
Luckily we already have a new() constructor for TimeoutLayer so I
imagine that could stay the same and default to 408 and then we could
have a new_with_response constructor that accepts an http::Response<B>
or similar that can be returned instead.

Alternatives

The most obvious alternative I can see follows from this comment
which is to use tower instead of tower-http and explicitly catch the
error and perform custom logic on it. This isn't too much code but I
wonder if the alternate constructor is valuable because I'm not sure
how frequent 408 will be the desired response code (though HTTP
semantics are fuzzy enough that maybe it's not worth it!).

Footnotes

  1. https://docs.rs/axum-server/latest/axum_server/struct.HttpConfig.html#method.http1_header_read_timeout

  2. https://docs.rs/actix-web/latest/actix_web/struct.HttpServer.html#method.client_request_timeout

@davidpdrsn
Copy link
Member

Servers like axum and actix-web
provide other mechanisms for timeouts when receiving things like request
headers

From a tower middleware it's not possible to distinguish between reading the request head timing out or the server handling the request. That requires a specific integration into hyper.

I think if you want a different status you can write your own middleware for it. It's not complicated. If you're using axum it's even easier with axum::middleware::from_fn.

@mlodato517
Copy link
Author

From a tower middleware it's not possible to distinguish between reading the request head timing out or the server handling the request. That requires a specific integration into hyper.

Totally agreed. Which makes me feel even more like this middleware shouldn't be assuming the error is a client-side error 🤔

@davidpdrsn
Copy link
Member

But not all servers are proxies so 504 doesn't seem right to me.

The HyperText Transfer Protocol (HTTP) 504 Gateway Timeout server error response code indicates that the server, while acting as a gateway or proxy, did not get a response in time from the upstream server that it needed in order to complete the request.

@mlodato517
Copy link
Author

But not all servers are proxies so 504 doesn't seem right to me.

Ah totally agreed there - that's why I was hoping to make it configurable. Basically "this middleware is generic and can be used anywhere and so can't pick a status code and be correct all the time". So we can leave the default as 408 so it's not a breaking change but if we give consumers another constructor/method to provide a custom Response1 then it can be used anywhere2.

Footnotes

  1. Or maybe just a status code for now but a Response object would be more flexible at the end of the day

  2. I'm sure there are places it can't be used but this at least expands where it can be used! 🤞

@seanmonstar
Copy link
Collaborator

Often I've seen servers return 503 when their own internal timeout triggered (not as a gateway). :)

@davidpdrsn
Copy link
Member

Sure we can add a method to change the status code.

I don't think we should allow customizing the response body. If people need that I'd recommend writing your own middleware.

@mlodato517
Copy link
Author

Yeah I can see adding the Response would be annoying because of the generic body. It's a micro-bummer because, in the case of a 503, for instance, the server may want to add a Retry-After header or something but I think a status code is a reasonable compromise for now!

@seanmonstar
Copy link
Collaborator

A lot of inspiration for tower came from Finagle, so if there's some prior art for how they handle HTTP timeouts, we could, uh, be inspired again :)

@davidpdrsn davidpdrsn added C-enhancement Category: A PR with an enhancement or a proposed on in an issue. I-needs-decision Issues in need of decision. labels Dec 2, 2022
@Kinrany
Copy link

Kinrany commented Jun 28, 2023

To provide one example, Firefox appears to rely on 408 to mean "client failed to upload the request in time" to retry requests automatically. This makes sense according to the spec as far as I can tell.

This makes returning 408 the entirely wrong thing to do for timeouts caused by server-side processing of the request taking too long: the same slow operation will likely be executed and aborted every time in exactly the same way.

@mohe2015
Copy link

Firefox will retry the request 10 times in my testing when sending 408 so I think we should either change the default or at least make it configurable. I personally would really change the default as that looks like a big footgun.

@BirdLogics
Copy link

Firefox not handling 408 errors properly is actually a pretty big issue. It makes TimeoutLayer pretty much unusable.

The issue is also not mentioned in the docs, which means developers using Firefox (like me 😆) will be left scratching their heads and wondering why the connection is being reset instead of getting a timeout status code.

Example code (GitHub repo):

use std::time::Duration;

use axum::{http::StatusCode, response::IntoResponse, routing::get, BoxError, Router};
use tokio::{net::TcpListener, time::sleep};
use tower::ServiceBuilder;
use tower_http::{timeout::TimeoutLayer, trace::TraceLayer};
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};

#[tokio::main]
async fn main() {
    tracing_subscriber::registry()
        .with(tracing_subscriber::filter::LevelFilter::DEBUG)
        .with(tracing_subscriber::fmt::layer())
        .init();

    // The timeout can be handled with a 504 like this
    // let services = tower::ServiceBuilder::new()
    //     .layer(TraceLayer::new_for_http())
    //     .layer(axum::error_handling::HandleErrorLayer::new(handle_error))
    //     .timeout(Duration::from_secs(3));

    let services = ServiceBuilder::new()
        .layer(TraceLayer::new_for_http())
        .layer(TimeoutLayer::new(Duration::from_secs(3)));

    let app = Router::new().route("/", get(index)).layer(services);

    let address = "127.0.0.1:3000".to_string();
    println!("Hosting on http://{address}");

    let listener = TcpListener::bind(address).await.unwrap();

    axum::serve(listener, app).await.unwrap();
}

async fn index() -> impl IntoResponse {
    sleep(Duration::from_secs(5)).await;

    "Hello world!".to_string()
}

async fn handle_error(err: BoxError) -> impl IntoResponse {
    if err.is::<tower::timeout::error::Elapsed>() {
        return (StatusCode::GATEWAY_TIMEOUT, "Request Timeout".to_string());
    }
    (
        StatusCode::INTERNAL_SERVER_ERROR,
        "Internal Server Error".to_string(),
    )
}

Output (with TimeoutLayer):

Hosting on http://127.0.0.1:3000
2024-10-05T01:05:03.371466Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2024-10-05T01:05:06.372723Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=3001 ms status=408
2024-10-05T01:05:06.375414Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2024-10-05T01:05:09.376837Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=3001 ms status=408
2024-10-05T01:05:09.379284Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2024-10-05T01:05:12.380401Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=3001 ms status=408
2024-10-05T01:05:12.382994Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2024-10-05T01:05:15.384978Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=3002 ms status=408
2024-10-05T01:05:15.387308Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2024-10-05T01:05:18.388012Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=3000 ms status=408
2024-10-05T01:05:18.390559Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2024-10-05T01:05:21.392279Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=3001 ms status=408
2024-10-05T01:05:21.394712Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2024-10-05T01:05:24.396856Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=3002 ms status=408
2024-10-05T01:05:24.399069Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2024-10-05T01:05:27.400591Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=3001 ms status=408
2024-10-05T01:05:27.402798Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2024-10-05T01:05:30.404358Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=3001 ms status=408
2024-10-05T01:05:30.406813Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2024-10-05T01:05:33.408362Z DEBUG request{method=GET uri=/ version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=3001 ms status=408

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: A PR with an enhancement or a proposed on in an issue. I-needs-decision Issues in need of decision.
Projects
None yet
Development

No branches or pull requests

6 participants