
Malformed HTTP headers cause Envoy to 503 when proxying via HTTP/2 #7061

Closed
dcarney-stripe opened this issue May 23, 2019 · 16 comments · Fixed by #7306
Labels
enhancement (Feature requests. Not bugs or questions.) · help wanted (Needs help!)

Comments

@dcarney-stripe

dcarney-stripe commented May 23, 2019

Title:
When making client requests that contain invalid chars in an HTTP header to an Envoy instance ("envoy-one") that's configured to proxy over H/2 to an upstream Envoy ("envoy-two"), we see envoy-one return a 503 response with response flags "UR", without proxying the request to “envoy-two”. If we change the Envoy ←→ Envoy communication to use HTTP/1.1, we don’t see this behavior, and the request is happily proxied to the upstream Envoy.

Description:
While investigating some strange errors we were seeing in one of our applications, I wrote a quick test harness, so that I could make requests that contained invalid ASCII chars in HTTP headers.

The “invalid ASCII chars” are defined by the relevant RFC specs [0] [1]. In this specific test case, I inserted an ASCII control char (ASCII value 31, or 0x1F) into an HTTP header value and executed that request against our stack (Envoy → Envoy → application).

Relevant Links:
[0] https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
[1] https://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2
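For reference, the allowed bytes in a header field value boil down to a few ranges. Here's a quick sketch (a hypothetical helper, not part of the repro, following the field-content grammar of RFC 7230, which superseded the RFC 2616 sections linked above):

```python
# Hypothetical helper: a header value byte must be HTAB (0x09), SP (0x20),
# visible ASCII (0x21-0x7E), or obs-text (0x80-0xFF, allowed but obsolete).
# Control characters such as 0x1F are excluded.
def is_valid_header_value_byte(b: int) -> bool:
    return b in (0x09, 0x20) or 0x21 <= b <= 0x7E or 0x80 <= b <= 0xFF

print(is_valid_header_value_byte(0x1F))                           # False
print(all(is_valid_header_value_byte(b) for b in b"text/plain"))  # True
```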

I saw that this would generate a 503 from the first Envoy before the request ever reaches our application, which sits upstream of the second Envoy.

Repro steps:
The repro config and necessary files are located in a GitHub repo: https://github.com/dcarney/envoy-repro

Reproing this scenario involves setting up the following topology:

[Diagram: client → envoy-one (:10001) → envoy-two (:10002) → upstream]

I have the HTTP request saved in the request.txt file. It includes the HTTP header value with an invalid ASCII control character, along with other random characters.
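A hypothetical reconstruction of such a request (the actual request.txt in the repo may differ) looks like this:

```python
# Build an otherwise ordinary HTTP/1.1 request whose header value embeds
# the ASCII control character 0x1F (value 31). Names are illustrative.
bad_value = b"foo" + bytes([0x1F]) + b"bar"
request = (
    b"GET /foo HTTP/1.1\r\n"
    b"Host: localhost:10001\r\n"
    b"x-malformed-header: " + bad_value + b"\r\n"
    b"connection: close\r\n"
    b"\r\n"
)
assert 0x1F in request  # the invalid byte survives in the raw bytes
# Sent verbatim over a raw socket, this is what envoy-one's H1 codec
# accepts today without complaint.
```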

The included Docker Compose file sets up the following:

  • envoy-one
    • a listener on port :10001
    • a cluster configured to use HTTP/2
    • all traffic from the listener is sent to cluster “envoy-two” on port :10002
  • envoy-two
    • a listener on port :10002
    • a cluster configured to point to https://www.google.com (The actual upstream is irrelevant, as we’ll see)

Each Envoy has its listener port and admin port mapped to the host, and the two Envoys are able to reach each other over the Docker network “envoytest”.

Build the two Envoy containers, and start everything up with:

$ docker build -t envoy:one -f Dockerfile.one .
$ docker build -t envoy:two -f Dockerfile.two .
$ docker-compose up

In another terminal, run the following to make the HTTP request to the first Envoy (“envoy-one”):

$ cat request.txt | nc -n 127.0.0.1 10001

You should see the 503 response, which is emitted by the first Envoy (“envoy-one”):

HTTP/1.1 503 Service Unavailable
content-length: 85
content-type: text/plain
date: Wed, 22 May 2019 20:43:43 GMT
server: envoy

upstream connect error or disconnect/reset before headers. reset reason: remote reset

Envoy access logs for the “client Envoy” (envoy-one) show that the “response flags” for this request are “UR”, which according to the Envoy docs means:

UR: Upstream remote reset in addition to 503 response code.

We can change the connection between “envoy-one” and “envoy-two” to use HTTP/1.1 by removing lines 43-45 in envoy-one.yaml:

35   clusters:
...
40     dns_lookup_family: V4_ONLY
41     lb_policy: ROUND_ROBIN
42     tls_context:
43     http2_protocol_options: 
44       hpack_table_size:
45       max_concurrent_streams:
46     load_assignment:
47       cluster_name: service_foobar

If we re-build the containers, restart the Docker Compose topology, and execute the request again, we find that we now get a 404 response from https://www.google.com/foo, which is what we expect when the request is successfully proxied through both Envoys.

I've included packet captures from tcpdump from both Envoys in the repo (envoy-{one,two}.pcap). Looking at them in Wireshark clearly shows the upstream Envoy resetting the stream because of a "protocol error":

[Screenshot: Wireshark view of the capture, showing the upstream Envoy sending a RST_STREAM frame with a protocol error]

Similarly, in the trace logs, we see this message from the "client Envoy":

envoy-one_1 [2019-05-23 16:31:36.076][11][debug][router] [source/common/router/router.cc:676] [C0][S8542315413287350126] upstream reset: reset reason remote reset

And this message from the upstream Envoy:

envoy-two_1 [2019-05-23 16:31:36.074][11][debug][http2] [source/common/http/http2/codec_impl.cc:539] [C0] invalid frame: Invalid HTTP header field was received on stream 1

I would expect the client Envoy to return a 400 error, rather than constructing an H/2 request that fails to parse in the upstream Envoy.

Config:
envoy-one config
envoy-two config

Logs:
trace logs from both Envoys

I'm happy to provide any more details you might need. Thanks for looking!

@mattklein123 mattklein123 added the "question" (Questions that are neither investigations, bugs, nor enhancements) label May 29, 2019
@bobby-stripe
Contributor

maybe to give a little more context here, it looks like:

  1. Nginx doesn't enforce HTTP header values are RFC compliant
  2. Envoy's H1 protocol implementation doesn't validate HTTP headers are RFC compliant
  3. client-side/downstream nghttp2 doesn't validate HTTP headers are RFC compliant (see discussion on Should libnghttp2 validate header field to send? nghttp2/nghttp2#641 )
  4. server-side/upstream nghttp2 calls nghttp2_check_header_value on each incoming H2 header value; this fails and somehow ends up as a 503 (instead of, say, a 400)

It seems weird that Envoy can create packets that it itself can't terminate, so a simple solution might be calling nghttp2_check_header_value when copying headers from H1 to H2 requests, and returning with, say, a 400 if that check fails.

Would something like that make sense?
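A minimal sketch of that idea (Python for illustration only; names are hypothetical, and Envoy's real codec is C++ delegating to nghttp2_check_header_value):

```python
# Validate each value while copying H1 headers into the H2 request, and
# fail fast instead of forwarding a request the upstream will reset.
class InvalidHeaderError(Exception):
    pass  # the connection manager would map this to a local 400 response

def copy_headers_h1_to_h2(h1_headers):
    h2_headers = {}
    for name, value in h1_headers.items():
        if any(not (b in (0x09, 0x20) or 0x21 <= b <= 0x7E or b >= 0x80)
               for b in value.encode("latin-1")):
            raise InvalidHeaderError(f"invalid byte in value of {name!r}")
        h2_headers[name.lower()] = value  # H2 header names are lowercase
    return h2_headers

print(copy_headers_h1_to_h2({"Accept": "*/*"}))  # {'accept': '*/*'}
try:
    copy_headers_h1_to_h2({"X-Test": "a\x1fb"})  # rejected -> 400, not 503
except InvalidHeaderError as e:
    print(e)
```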

@dcarney-stripe
Author

@bobby-stripe Thanks for the additional context, and for digging up that nghttp2 issue (which doesn't seem to have gotten any attention since it was created in 2016).

As some "supporting evidence" for the type of fix that you proposed, we can look at how Go handles this: Go uses the httpguts.ValidHeaderFieldValue function (which seems analogous to nghttp2_check_header_value) to ensure that all the HTTP header values conform to the relevant RFC specs before an http.Transport instance will round-trip the request.

As a result, these malformed header values haven't caused the same sorts of issues in the Go parts of our stack.

@mattklein123 mattklein123 added the "enhancement" (Feature requests. Not bugs or questions.) and "help wanted" (Needs help!) labels and removed the "question" label Jun 3, 2019
@mattklein123
Member

Thanks for all the detailed context and report.

server-side/upstream nghttp2 calls nghttp2_check_header_value on each incoming H2 header value; this fails and somehow ends up as a 503 (instead of, say, a 400)

The reason for this is that when nghttp2 detects the violation, it will reset the stream. Right now downstream Envoy will always end up translating that reset into a 503. It's possible that we could look at the reset type and translate to a 400 but I'm not sure it's worth it.
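That translation can be pictured with a toy mapping (entirely hypothetical; the real logic lives in Envoy's router code):

```python
# Toy model of how a downstream Envoy turns an upstream H2 stream reset
# into an HTTP response code. Not Envoy's actual implementation.
def status_for_upstream_reset(reset_reason, inspect_reason=False):
    # Current behavior: any upstream reset -> 503 (response flag "UR").
    # A possible refinement: treat a protocol-error reset as the
    # client's fault and return 400 instead.
    if inspect_reason and reset_reason == "protocol_error":
        return 400
    return 503

print(status_for_upstream_reset("remote_reset"))                         # 503
print(status_for_upstream_reset("protocol_error", inspect_reason=True))  # 400
```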

Overall, I agree that we should unify the H1 and H2 validation flows, so am moving this to help wanted.

@alyssawilk alyssawilk removed their assignment Jun 11, 2019
@skydoctor

skydoctor commented Jun 12, 2019

To diagnose problems like these, is there any way to figure out exactly which header is malformed? The only log we get is "invalid frame: Invalid HTTP header field was received on stream".

@mattklein123
Member

This failure comes from within nghttp2, and IIRC there is no easy way to get the failure details other than turning on extra trace logging. It might be worth looking at the code, though.

@dcarney
Contributor

dcarney commented Jun 12, 2019

@skydoctor In this case, the problem isn't with a particular HTTP header, it's with any HTTP header whose value contains characters that are invalid, per the relevant RFC.

In my repro for this issue, you can see that I manually create a header with such contents, in request.txt.

As mentioned previously, Go's http.Transport validates the header values before round-tripping the request, which prevents a whole class of issues from arising further upstream. I think it would be reasonable for Envoy's H/2 codec to do the same, considering that that same codec (which uses nghttp2) validates these values on incoming HTTP requests.

I'm actually working on a PR that will use nghttp2's nghttp2_check_header_value function to validate the headers. I'm fumbling with the proper error handling, but I think the rest should be fairly straightforward.

@skydoctor

Thanks @dcarney, it would be great to validate the header for sure. In general, if a request is received from a client we don't control, it becomes harder to find which specific header had illegal characters.

@mattklein123: There seems to be a way to get nghttp2 to print out the name of the header, in the form "Invalid HTTP header field was received: frame type: 1, stream: 13, name: [upgrade], value: [h2]", instead of what we currently get ("Invalid HTTP header field was received on stream 1"); see nghttp2/nghttp2#527. Not sure what would be the best way to use this in Envoy.

@skydoctor

@mattklein123, @alyssawilk: Do you see a way to incorporate the more detailed log from nghttp2?

@alyssawilk
Contributor

Unfortunately I'm not sure there's a way to get nghttp2 to print that error, other than turning up all nghttp2 logs, which gets pretty spammy. It looks like the detailed string gets passed up to session_call_error_callback, but the Envoy codec callback only gets NGHTTP2_ERR_HTTP_HEADER.

Would it have helped if the upstream Envoy had extra debug info, i.e. if where it logs
"invalid frame: Invalid HTTP header field was received on stream 1"
it also told you that you could turn up trace logging for more details?

We could also ask @tatsuhiro-t (nghttp2 author) if he'd be up for either passing up the details string, or tracking stream state of the individual error in a way we could call back to the nghttp2 call stack to get the details.

@dcarney
Contributor

dcarney commented Jun 17, 2019

@mattklein123 @alyssawilk I actually have a working PR that I was going to submit that fixes this issue following the rough approach that @bobby-stripe suggested a few weeks ago:

It seems weird that Envoy can create packets that it-itself can't terminate, so a simple solution might be calling nghttp2_check_header_value when copying headers from H1 to H2 requests, and returning with, say, a 400 if that check fails.

Would something like that make sense?

My PR makes the HTTP connection manager validate that the headers are RFC-compliant (delegating to nghttp2_check_header_value for the actual validation), returning an appropriate 400 if not. If this is considered a change to end-user behavior, I could see the argument for gating this new behavior behind a configuration property.

Let me know if this seems like a reasonable approach. I was planning to submit my PR imminently, after writing up a detailed PR description, checking the test coverage, etc.

@mattklein123
Member

@dcarney thanks. I think this fix is reasonable, but I would be concerned about changing end-user behavior, which makes me think we should probably do opt-in for this, but I could be convinced otherwise. WDYT @alyssawilk?

@alyssawilk
Contributor

Definitely opt-in, please. I'm less sure whether it needs to be a permanent config option or whether we can do it with runtime flags. We could always runtime-guard it with the plan of deprecating the old code path, and if anyone complains, replace the runtime guard with a permanent config option.

I'm also curious whether, orthogonally, we should tweak our upgrade detection. Right now on the H2 path we only remove the upgrade header if both the upgrade header and a connection: "upgrade" header are present, but arguably we could be more lenient, to handle legitimate non-standard-compliant upgrade requests.

@skydoctor

skydoctor commented Jun 17, 2019

Thanks @alyssawilk. The place I see this error is an edge gateway and the incorrect header is received from an outside client. Hence the struggle is to convey to the client which exact header of theirs was at fault. We turned up the logging level to trace but still could only get the following. Are you implying that with logging level set to "trace", we should have seen the name of the header at fault?

[2019-06-17 13:35:07.096][32][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:735] [C137] setting stream-level initial window size to 268435456
[2019-06-17 13:35:07.096][32][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:757] [C137] updating connection-level initial window size to 268435456
[2019-06-17 13:35:07.098][32][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:243] [C137] new stream
[2019-06-17 13:35:07.099][32][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:549] [C137] invalid frame: Invalid HTTP header field was received on stream 1
[2019-06-17 13:35:07.099][32][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:531] [C137] sent reset code=1
[2019-06-17 13:35:07.099][32][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:577] [C137] stream closed: 1

@tatsuhiro-t

nghttp2 has https://nghttp2.org/documentation/nghttp2_session_callbacks_set_error_callback2.html#c.nghttp2_session_callbacks_set_error_callback2, which provides a verbose error message.
Check for liberr == NGHTTP2_ERR_HTTP_HEADER and you'll get the message. But after invoking the callback, libnghttp2 will reset the stream, so it cannot be used to return an HTTP status code.

@alyssawilk
Contributor

Oh, that's fantastic, I'd missed that API, thanks.
I wonder if we could canonicalize this and send the information as optional proxy-status details? Alternatively, if nghttp2_check_header_value works for getting verbose details, I'd be happy to see how that looks as well.
https://datatracker.ietf.org/doc/html/draft-nottingham-proxy-status
@PiotrSikora

@skydoctor

Perhaps this could be used for printing the name of the header in the trace log printout as well?

@mattklein123 mattklein123 added this to the 1.12.0 milestone Jul 3, 2019
alyssawilk pushed a commit that referenced this issue Jul 15, 2019
Description:
This PR addresses the behavior described in #7061 by modifying the HTTP/1.1 codec so that it verifies the value of each HTTP header when onHeaderValue is called. If an invalid header value (per RFC 7230, section 3.2) is found, the codec raises an exception, which results in a 400 response. The actual validation is done via the nghttp2_check_header_value function, which is wrapped in Envoy::Http::HeaderUtility::headerIsValid.

(NOTE: this has been discussed as something that nghttp2 could do itself, but that issue seems to have languished. Also note that Go handles this: Go uses the httpguts.ValidHeaderFieldValue function (which is analogous to nghttp2_check_header_value) to ensure that all HTTP header values conform to the relevant RFC specs before an http.Transport instance will round-trip the request.)

Risk Level: Low/medium
These stricter validation semantics are controlled by the envoy.reloadable_features.validate_header_values runtime-guarded feature. Originally, the PR used a new boolean on the HTTP connection manager proto to toggle the behavior, but during PR review it was decided that a runtime guard would be more appropriate.
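The guard pattern can be sketched as follows (a toy stand-in; Envoy's real runtime layer works differently in detail, and only the flag name comes from this PR):

```python
# Toy stand-in for a runtime-guarded feature: the strict validation path
# only runs when the flag is on, so existing deployments keep the old,
# lenient behavior until they opt in (or the default flips).
RUNTIME_FLAGS = {"envoy.reloadable_features.validate_header_values": True}

def runtime_enabled(flag, default=False):
    return RUNTIME_FLAGS.get(flag, default)

def on_header_value(value: bytes):
    if runtime_enabled("envoy.reloadable_features.validate_header_values"):
        if not all(b in (0x09, 0x20) or 0x21 <= b <= 0x7E or b >= 0x80
                   for b in value):
            raise ValueError("invalid header value")  # surfaces as a 400

on_header_value(b"ok-value")          # accepted
try:
    on_header_value(b"bad\x1fvalue")  # rejected while the guard is on
except ValueError as e:
    print(e)
```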

Testing: new unit tests, manual tests
Release Notes: Updated
Fixes #7061

Signed-off-by: Dylan Carney <dcarney@gmail.com>