Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection reset while connecting https://storage.googleapis.com with rustls #1809

Closed
Xuanwo opened this issue Apr 26, 2023 · 22 comments
Closed

Comments

@Xuanwo
Copy link
Contributor

Xuanwo commented Apr 26, 2023

Hi, I came across a strange issue where sending a request via rustls with the host header to https://storage.googleapis.com results in the connection being reset.

Related to apache/opendal#2125 (comment)

Reproduce

Cargo.toml

[package]
name = "demo"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[features]
default = ["reqwest/rustls-tls-native-roots"]
native-tls = ["reqwest/native-tls"]

[dependencies]
reqwest = { version = "0.11.16", default-features = false }
tokio = { version = "1.28.0", features = ["full"] }

main.rs

use reqwest::Client;

#[tokio::main]
async fn main() {
    let client = Client::new();

    let url = "https://storage.googleapis.com/";
    let content = client
        .get(url)
        .body("")
        .header("host", "storage.googleapis.com")
        .send()
        .await
        .expect("Failed to send request");

    println!("{}", content.text().await.unwrap())
}

With rustls:

:) cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/demo`
thread 'main' panicked at 'Failed to send request: reqwest::Error { kind: Request, url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("storage.googleapis.com")), port: None, path: "/", query: None, fragment: None }, source: hyper::Error(Http2, Error { kind: Reset(StreamId(1), PROTOCOL_ERROR, Remote) }) }', src/main.rs:14:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

With native-tls:

:( cargo run --features=native-tls
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/demo`
<?xml version='1.0' encoding='UTF-8'?><Error><Code>MissingSecurityHeader</Code><Message>Your request was missing a required header.</Message><Details>Authorization</Details></Error>

If we remove the host header, it works too:

:) cargo run
   Compiling demo v0.1.0 (/tmp/demo)
    Finished dev [unoptimized + debuginfo] target(s) in 0.30s
     Running `target/debug/demo`
<?xml version='1.0' encoding='UTF-8'?><Error><Code>MissingSecurityHeader</Code><Message>Your request was missing a required header.</Message><Details>Authorization</Details></Error>

Question

How do rustls and native-tls interact with the host header? Is this behavior expected?

@Xuanwo
Copy link
Contributor Author

Xuanwo commented Apr 26, 2023

Oh, I got it. http/2 doesn't have Host header. Maye we should strip it from headers since it's illegal in http/2?

@seanmonstar
Copy link
Owner

The HTTP/2 spec (RFC 9113) doesn't make the Host header illegal (like it does a few others, such as connection). But it does say that :authority is the correct one, and that a client should not make a request where the :authority and host differ.

@Xuanwo
Copy link
Contributor Author

Xuanwo commented Apr 26, 2023

Thanks for the quick response!

a client should not make a request where the :authority and host differ.

What's the expect behavior if client specify the same host as in :authority? Like this issue:

let url = "https://storage.googleapis.com/";
let content = client
    .get(url)
    .body("")
    .header("host", "storage.googleapis.com")
    .send()
    .await
    .expect("Failed to send request");

@seanmonstar
Copy link
Owner

It shouldn't be an error, at least the spec doesn't say it should.

@Xuanwo
Copy link
Contributor Author

Xuanwo commented Apr 26, 2023

It shouldn't be an error, at least the spec doesn't say it should.

Agreed.


So we need to investigate further why rustls and native-tls are causing a difference here. Do you have any ideas? I am willing to assist in resolving this issue.

@seanmonstar
Copy link
Owner

seanmonstar commented Apr 26, 2023

Well, between the two, only rustls supports ALPN, which means it will use HTTP/2 automatically (if the server supports it). The native-tls backend doesn't have ALPN support, the request is using HTTP/1 (unless you use http2_prior_knowledge()).

@Zheaoli
Copy link

Zheaoli commented Apr 26, 2023

It shouldn't be an error, at least the spec doesn't say it should.

No, it shouble be a error

The recipient of an HTTP/2 request MUST NOT use the Host header field to determine the target URI if ":authority" is present.

@Xuanwo
Copy link
Contributor Author

Xuanwo commented Apr 26, 2023

Is it a good idea to remove Host header here? (Alought I believe it's not the root cause)

https://github.com/hyperium/hyper/blob/afe278abe077cf85a29b6631b838cd335f50d30d/src/proto/h2/client.rs#L361-L367

let (head, body) = req.into_parts();
let mut req = ::http::Request::from_parts(head, ());
super::strip_connection_headers(req.headers_mut(), true);
if let Some(len) = body.size_hint().exact() {
    if len != 0 || headers::method_has_defined_payload_semantics(req.method()) {
        headers::set_content_length_if_missing(req.headers_mut(), len);
    }
}

Seems curl will also trim host header:

* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=storage.googleapis.com
*  start date: Apr  3 08:27:38 2023 GMT
*  expire date: Jun 26 08:27:37 2023 GMT
*  subjectAltName: host "storage.googleapis.com" matched cert's "storage.googleapis.com"
*  issuer: C=US; O=Google Trust Services LLC; CN=GTS CA 1C3
*  SSL certificate verify ok.
* using HTTP/2
* h2h3 [:method: GET]
* h2h3 [:path: /]
* h2h3 [:scheme: https]
* h2h3 [:authority: storage.googleapis.com]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [accept: */*]
* Using Stream ID: 1 (easy handle 0x55adae534ea0)
> GET / HTTP/2
> Host: storage.googleapis.com
> user-agent: curl/8.0.1
> accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
< HTTP/2 400 
< x-guploader-uploadid: ADPycduo3u9MXg5lYHW7eUp6VLsPw1Fb0pLKmlnIJL61MMtngobtRf2xFfOYRfrDsUiJU_rQhIuRCvIdglI7QGU1yhxQ1w
< content-type: application/xml; charset=UTF-8
< content-length: 181
< date: Wed, 26 Apr 2023 15:14:24 GMT
< expires: Wed, 26 Apr 2023 15:14:24 GMT
< cache-control: private, max-age=0
< server: UploadServer
< alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
< 
* Connection #0 to host 192.168.1.104 left intact
<?xml version='1.0' encoding='UTF-8'?><Error><Code>MissingSecurityHeader</Code><Message>Your request was missing a required header.</Message><Details>Authorization</Details></Error>%

Curl trace:

 146   │ == Info: using HTTP/2
 147   │ == Info: h2h3 [:method: GET]
 148   │ == Info: h2h3 [:path: /]
 149   │ == Info: h2h3 [:scheme: https]
 150   │ == Info: h2h3 [:authority: storage.googleapis.com]
 151   │ == Info: h2h3 [user-agent: curl/8.0.1]
 152   │ == Info: h2h3 [accept: */*]
 153   │ == Info: Using Stream ID: 1 (easy handle 0x560ca1aebea0)
 154   │ => Send SSL data, 5 bytes (0x5)
 155   │ 0000: ....=
 156   │ => Send SSL data, 1 bytes (0x1)
 157   │ 0000: .
 158   │ => Send header, 83 bytes (0x53)
 159   │ 0000: GET / HTTP/2
 160   │ 000e: Host: storage.googleapis.com
 161   │ 002c: user-agent: curl/8.0.1
 162   │ 0044: accept: */*
 163   │ 0051: 
 164   │ <= Recv SSL data, 5 bytes (0x5)

@seanmonstar
Copy link
Owner

seanmonstar commented Apr 26, 2023

The recipient of an HTTP/2 request MUST NOT use the Host

That's saying that the server (recipient) must not use host for routing if :authority is present. It's not saying the server needs return an error.

Is it a good idea to remove Host header here?

I don't believe so. The spec points out that intermediaries may need to forward the header on.

Seems curl will also trim host header:

Does curl actually snip it out if you specify an additional one? Such as -H "host: yolo.com"?

If you don't explicitly set the header field with reqwest, it will only automatically include one in HTTP/1. It won't set one in an HTTP/2 request.

@Xuanwo
Copy link
Contributor Author

Xuanwo commented Apr 26, 2023

Does curl actually snip it out if you specify an additional one? Such as -H "host: yolo.com"?

The curl trace shows that the host header is not sent in HEADERS stream (please correct me if I understand in wrong):

> curl --trace-ascii curl.trace -H "host: storage.googleapis.com" "https://storage.googleapis.com"
<?xml version='1.0' encoding='UTF-8'?><Error><Code>MissingSecurityHeader</Code><Message>Your request was missing a required header.</Message><Details>Authorization</Details></Error>

The curl.trace will show:

 142   │ => Send SSL data, 5 bytes (0x5)
 143   │ 0000: ....Q
 144   │ => Send SSL data, 1 bytes (0x1)
 145   │ 0000: .
 146   │ == Info: using HTTP/2
 147   │ == Info: h2h3 [:method: GET]
 148   │ == Info: h2h3 [:path: /]
 149   │ == Info: h2h3 [:scheme: https]
 150   │ == Info: h2h3 [:authority: storage.googleapis.com]
 151   │ == Info: h2h3 [user-agent: curl/8.0.1]
 152   │ == Info: h2h3 [accept: */*]
 153   │ == Info: Using Stream ID: 1 (easy handle 0x560ca1aebea0)
 154   │ => Send SSL data, 5 bytes (0x5)
 155   │ 0000: ....=
 156   │ => Send SSL data, 1 bytes (0x1)
 157   │ 0000: .
 158   │ => Send header, 83 bytes (0x53)
 159   │ 0000: GET / HTTP/2
 160   │ 000e: Host: storage.googleapis.com
 161   │ 002c: user-agent: curl/8.0.1
 162   │ 0044: accept: */*
 163   │ 0051: 
 164   │ <= Recv SSL data, 5 bytes (0x5)

@seanmonstar
Copy link
Owner

seanmonstar commented Apr 26, 2023

From RFC 9113:

An intermediary that forwards a request over HTTP/2 MAY retain any Host header field.

I don't think hyper should forcibly strip that header, if the user purposefully put it there, since the spec indicates some intermediaries may need to pass it along.

@Xuanwo
Copy link
Contributor Author

Xuanwo commented Apr 26, 2023

I don't think hyper should forcibly strip that header, if the user purposefully put it there, since the spec indicates some intermediaries may need to pass it along.

I am not suggesting that hyper should remove the host header. My expectation is that a valid request should be sent without any errors.

How can we ensure that the request is sent successfully? Or do you think that the issue lies with the server and storage.googleapis.com is behaving incorrectly?

@seanmonstar
Copy link
Owner

It might be a server issue? Then again, I suppose any server is allowed to reject any request for whatever reason they want.

Do you need to explicitly set the host header? reqwest will make sure it's correctly set automatically depending on version.

@Zheaoli
Copy link

Zheaoli commented Apr 26, 2023

I read about the curl code

First, the curl defines some field in the header which should not be in the h2 request, FYI https://github.com/curl/curl/blob/master/lib/http.c#L4758-L4765

Then, the curl removes the field defined above https://github.com/curl/curl/blob/master/lib/http.c#L4833-L4839

FYI https://github.com/curl/curl/blob/master/lib/http.h#L294-L306

@Xuanwo
Copy link
Contributor Author

Xuanwo commented Apr 26, 2023

It might be a server issue? Then again, I suppose any server is allowed to reject any request for whatever reason they want.

Agreed. The error message does indicate that the connection is reset by remote.

source: hyper::Error(Http2, Error { kind: Reset(StreamId(1), PROTOCOL_ERROR, Remote) })

Do you need to explicitly set the host header?

No, actually I can remove the related host setting logic. It's just confusing that the same request doesn't work on HTTP/2.

I'm okay with it if it's the expected behavior.

@seanmonstar
Copy link
Owner

Oh, they're sending a RST_STREAM of PROTOCOL_ERROR? I think that interpretation is incorrect. It's not a protocol error, the protocol says it MAY happen.

I assumed it was the server replying with a 400, which is fine, a server can say anything is bad in that sense.

@Zheaoli
Copy link

Zheaoli commented Apr 26, 2023

Here's something interesting @seanmonstar @Xuanwo

CleanShot 2023-04-27 at 00 34 35@2x

I can't reproduce this error on MacOS but I reproduce it on Linux.(I'm not sure if is there any platform-spec behavior in this lib?

@Xuanwo
Copy link
Contributor Author

Xuanwo commented Apr 26, 2023

I can't reproduce this error on MacOS but I reproduce it on Linux.(I'm not sure if is there any platform-spec behavior in this lib?

Can you set up env-logger and running RUST_LOG=trace cargo run for more information?

@Zheaoli
Copy link

Zheaoli commented Apr 26, 2023

I know what happens!

The package-capture tools change the behavior. Damn..

@everpcpc
Copy link

everpcpc commented Apr 27, 2023

From RFC 9113:

An intermediary that forwards a request over HTTP/2 MAY retain any Host header field.

I don't think hyper should forcibly strip that header, if the user purposefully put it there, since the spec indicates some intermediaries may need to pass it along.

but also in the above section:

An intermediary that needs to generate a Host header field (which might be necessary to construct an HTTP/1.1 request) MUST use the value from the ":authority" pseudo-header field as the value of the Host field, unless the intermediary also changes the request target. This replaces any existing Host field to avoid potential vulnerabilities in HTTP routing.

Then retaining or tripping the Host header would make nothing difference?

@micolous
Copy link

micolous commented Oct 24, 2024

I can also see the same behaviour when trying to fetch https://www.youtube.com. That suggests that the issue may be common to using any Google web service, not just Google Cloud Storage. I haven't tested this against a Google Cloud L7 load balancer.

It looks like nginx has a similar issue when serving HTTP/2, but it returns HTTP 400 Bad Request. nginx.org appears to be configured to serve over HTTP/1.1 only (which isn't affected), but I could reproduce this behaviour on a local nginx instance which was configured to serve HTTP/2.

In applications, this issue can be surprising, because it'll only break once a server accepts HTTP/2. This could lead to a similar situation to the January 2022 Firefox outage triggered1 by Google enabling HTTP/3 serving by default on GCP load balancers. So I can see the argument for the library taking care of this.

However, sometimes you need to sign requests including the Host and/or :authority header(s), and your application needs to know if the Host header is used (HTTP/1.1), the :authority header is used (HTTP/2), or both (HTTP/2) – regardless of server-side limitations. A HTTP library silently altering those requests would invalidate those signatures.

In any case, the simple answer for applications not signing requests is "don't explicitly set the Host header". 😄

Footnotes

  1. The word "triggered" has a different meaning to the word "caused". Mozilla's public write-up goes into lots of detail about all the contributing factors.

@seanmonstar
Copy link
Owner

I think at this point the issue can just serve as background for why a server might be unhappy.

@seanmonstar seanmonstar closed this as not planned Won't fix, can't repro, duplicate, stale Oct 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants