
Malformed HTTP headers cause Envoy to 503 when proxying via HTTP/2 #7061

Closed
dcarney-stripe opened this issue May 23, 2019 · 16 comments · Fixed by #7306
Labels
enhancement (Feature requests. Not bugs or questions.) · help wanted (Needs help!)

Comments

@dcarney-stripe

dcarney-stripe commented May 23, 2019

Title:
When making client requests that contain invalid chars in an HTTP header to an Envoy instance ("envoy-one") that's configured to proxy over H/2 to an upstream Envoy ("envoy-two"), we see envoy-one return a 503 response with response flags "UR", without proxying the request to “envoy-two”. If we change the Envoy ←→ Envoy communication to use HTTP/1.1, we don’t see this behavior, and the request is happily proxied to the upstream Envoy.

Description:
While investigating some strange errors we were seeing in one of our applications, I wrote a quick test harness, so that I could make requests that contained invalid ASCII chars in HTTP headers.

The “invalid ASCII chars” are defined by the relevant RFC specs [0] [1]. In this specific test case, I inserted an ASCII control char (ASCII value 31, or 0x1F) into an HTTP header value and executed that request against our stack (Envoy → Envoy → application).

Relevant Links:
[0] https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
[1] https://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2
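For reference, the allowed bytes in a header field value boil down to a few ranges. Here's a quick sketch (a hypothetical helper, not part of the repro, following the field-content grammar of RFC 7230, which superseded the RFC 2616 sections linked above):

```python
# Hypothetical helper: a header value byte must be HTAB (0x09), SP (0x20),
# visible ASCII (0x21-0x7E), or obs-text (0x80-0xFF, allowed but obsolete).
# Control characters such as 0x1F are excluded.
def is_valid_header_value_byte(b: int) -> bool:
    return b in (0x09, 0x20) or 0x21 <= b <= 0x7E or 0x80 <= b <= 0xFF

print(is_valid_header_value_byte(0x1F))                           # False
print(all(is_valid_header_value_byte(b) for b in b"text/plain"))  # True
```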

I saw that this would generate a 503 from the first Envoy before the request ever reaches our application, which sits upstream of the second Envoy.

Repro steps:
The repro config and necessary files are located in a GitHub repo: https://github.com/dcarney/envoy-repro

Reproing this scenario involves setting up the following topology:

[Diagram: client → envoy-one (:10001) → envoy-two (:10002) → upstream]

I have the HTTP request saved in the request.txt file. It includes the HTTP header value with an invalid ASCII control character, along with other random characters.
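A hypothetical reconstruction of such a request (the actual request.txt in the repo may differ) looks like this:

```python
# Build an otherwise ordinary HTTP/1.1 request whose header value embeds
# the ASCII control character 0x1F (value 31). Names are illustrative.
bad_value = b"foo" + bytes([0x1F]) + b"bar"
request = (
    b"GET /foo HTTP/1.1\r\n"
    b"Host: localhost:10001\r\n"
    b"x-malformed-header: " + bad_value + b"\r\n"
    b"connection: close\r\n"
    b"\r\n"
)
assert 0x1F in request  # the invalid byte survives in the raw bytes
# Sent verbatim over a raw socket, this is what envoy-one's H1 codec
# accepts today without complaint.
```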

The included Docker Compose file sets up the following:

  • envoy-one
    • a listener on port :10001
    • a cluster configured to use HTTP/2
    • all traffic from the listener is sent to cluster “envoy-two” on port :10002
  • envoy-two
    • a listener on port :10002
    • a cluster configured to point to https://www.google.com (The actual upstream is irrelevant, as we’ll see)

Each Envoy has its listener port and admin port mapped to the host, and the two Envoys are able to reach each other over the Docker network “envoytest”.

Build the two Envoy containers, and start everything up with:

$ docker build -t envoy:one -f Dockerfile.one .
$ docker build -t envoy:two -f Dockerfile.two .
$ docker-compose up

In another terminal, run the following to make the HTTP request to the first Envoy (“envoy-one”):

$ cat request.txt | nc -n 127.0.0.1 10001

You should see the 503 response, which is emitted by the first Envoy (“envoy-one”):

HTTP/1.1 503 Service Unavailable
content-length: 85
content-type: text/plain
date: Wed, 22 May 2019 20:43:43 GMT
server: envoy

upstream connect error or disconnect/reset before headers. reset reason: remote reset

Envoy access logs for the “client Envoy” (envoy-one) show that the “response flags” for this request are “UR”, which according to the Envoy docs means:

UR: Upstream remote reset in addition to 503 response code.

We can change the connection between “envoy-one” and “envoy-two” to use HTTP/1.1 by removing lines 43-45 in envoy-one.yaml:

35   clusters:
...
40     dns_lookup_family: V4_ONLY
41     lb_policy: ROUND_ROBIN
42     tls_context:
43     http2_protocol_options: 
44       hpack_table_size:
45       max_concurrent_streams:
46     load_assignment:
47       cluster_name: service_foobar

If we re-build the containers, restart the Docker Compose topology, and execute the request again, we find that we now get a 404 response from https://www.google.com/foo, which is what we expect when the request is successfully proxied through both Envoys.

I've included packet captures from tcpdump from both Envoys in the repo (envoy-{one,two}.pcap). Looking at them in Wireshark clearly shows the upstream Envoy resetting the stream because of a "protocol error":

[Screenshot: Wireshark view of the capture, showing the upstream Envoy sending a RST_STREAM frame with a protocol error]

Similarly, in the trace logs, we see this message from the "client Envoy":

envoy-one_1 [2019-05-23 16:31:36.076][11][debug][router] [source/common/router/router.cc:676] [C0][S8542315413287350126] upstream reset: reset reason remote reset

And this message from the upstream Envoy:

envoy-two_1 [2019-05-23 16:31:36.074][11][debug][http2] [source/common/http/http2/codec_impl.cc:539] [C0] invalid frame: Invalid HTTP header field was received on stream 1

I would expect the client Envoy to return a 400 error, rather than constructing an H/2 request that fails to parse in the upstream Envoy.

Config:
envoy-one config
envoy-two config

Logs:
trace logs from both Envoys

I'm happy to provide any more details you might need. Thanks for looking!

@mattklein123 mattklein123 added the "question" (Questions that are neither investigations, bugs, nor enhancements) label May 29, 2019
@bobby-stripe
Contributor

maybe to give a little more context here, it looks like:

  1. Nginx doesn't enforce HTTP header values are RFC compliant
  2. Envoy's H1 protocol implementation doesn't validate HTTP headers are RFC compliant
  3. client-side/downstream nghttp2 doesn't validate HTTP headers are RFC compliant (see discussion on Should libnghttp2 validate header field to send? nghttp2/nghttp2#641 )
  4. server-side/upstream nghttp2 calls nghttp2_check_header_value on each incoming H2 header value; this fails and somehow ends up as a 503 (instead of, say, a 400)

It seems weird that Envoy can create packets that it itself can't terminate, so a simple solution might be calling nghttp2_check_header_value when copying headers from H1 to H2 requests, and returning with, say, a 400 if that check fails.

Would something like that make sense?
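A minimal sketch of that idea (Python for illustration only; names are hypothetical, and Envoy's real codec is C++ delegating to nghttp2_check_header_value):

```python
# Validate each value while copying H1 headers into the H2 request, and
# fail fast instead of forwarding a request the upstream will reset.
class InvalidHeaderError(Exception):
    pass  # the connection manager would map this to a local 400 response

def copy_headers_h1_to_h2(h1_headers):
    h2_headers = {}
    for name, value in h1_headers.items():
        if any(not (b in (0x09, 0x20) or 0x21 <= b <= 0x7E or b >= 0x80)
               for b in value.encode("latin-1")):
            raise InvalidHeaderError(f"invalid byte in value of {name!r}")
        h2_headers[name.lower()] = value  # H2 header names are lowercase
    return h2_headers

print(copy_headers_h1_to_h2({"Accept": "*/*"}))  # {'accept': '*/*'}
try:
    copy_headers_h1_to_h2({"X-Test": "a\x1fb"})  # rejected -> 400, not 503
except InvalidHeaderError as e:
    print(e)
```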

@dcarney-stripe
Author

@bobby-stripe Thanks for the additional context, and for digging up that nghttp2 issue (which doesn't seem to have gotten any attention since it was created in 2016).

As some "supporting evidence" for the type of fix that you proposed, we can look at how Go handles this: Go uses the httpguts.ValidHeaderFieldValue function (which seems analogous to nghttp2_check_header_value) to ensure that all the HTTP header values conform to the relevant RFC specs before an http.Transport instance will round-trip the request.

As a result, these malformed header values haven't caused the same sorts of issues in the Go parts of our stack.

@mattklein123 mattklein123 added the "enhancement" (Feature requests. Not bugs or questions.) and "help wanted" (Needs help!) labels and removed the "question" label Jun 3, 2019
@mattklein123
Member

Thanks for all the detailed context and report.

server-side/upstream nghttp2 calls nghttp2_check_header_value on each incoming H2 header value; this fails and somehow ends up as a 503 (instead of, say, a 400)

The reason for this is that when nghttp2 detects the violation, it will reset the stream. Right now downstream Envoy will always end up translating that reset into a 503. It's possible that we could look at the reset type and translate to a 400 but I'm not sure it's worth it.
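That translation can be pictured with a toy mapping (entirely hypothetical; the real logic lives in Envoy's router code):

```python
# Toy model of how a downstream Envoy turns an upstream H2 stream reset
# into an HTTP response code. Not Envoy's actual implementation.
def status_for_upstream_reset(reset_reason, inspect_reason=False):
    # Current behavior: any upstream reset -> 503 (response flag "UR").
    # A possible refinement: treat a protocol-error reset as the
    # client's fault and return 400 instead.
    if inspect_reason and reset_reason == "protocol_error":
        return 400
    return 503

print(status_for_upstream_reset("remote_reset"))                         # 503
print(status_for_upstream_reset("protocol_error", inspect_reason=True))  # 400
```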

Overall, I agree that we should unify the H1 and H2 validation flows, so am moving this to help wanted.

@alyssawilk alyssawilk removed their assignment Jun 11, 2019
@skydoctor

skydoctor commented Jun 12, 2019

To diagnose problems like these, is there any way to figure out exactly which header is malformed? The only log we get is "invalid frame: Invalid HTTP header field was received on stream".

@mattklein123
Member

This failure comes from within nghttp2, and IIRC there is no easy way to get the failure details other than turning on extra trace logging. It might be worth looking at the code, though.

@dcarney
Contributor

dcarney commented Jun 12, 2019

@skydoctor In this case, the problem isn't with a particular HTTP header, it's with any HTTP header whose value contains characters that are invalid, per the relevant RFC.

In my repro for this issue, you can see that I manually create a header with such contents, in request.txt.

As mentioned previously, Go's http.Transport validates the header values before round-tripping the request, which prevents a whole class of issues from arising further upstream. I think it would be reasonable for Envoy's H/2 codec to do the same, considering that that same codec (which uses nghttp2) validates these values on incoming HTTP requests.

I'm actually working on a PR that will use nghttp2's nghttp2_check_header_value function to validate the headers. I'm fumbling with the proper error handling, but I think the rest should be fairly straightforward.

@skydoctor

Thanks @dcarney, it would be great to validate the header for sure. In general, if a request is received from a client we don't control, it becomes harder to find which specific header had illegal characters.

@mattklein123: There seems to be a way to get nghttp2 to print out the name of the header, in the form "Invalid HTTP header field was received: frame type: 1, stream: 13, name: [upgrade], value: [h2]", instead of what we currently get ("Invalid HTTP header field was received on stream 1"); see nghttp2/nghttp2#527. Not sure what would be the best way to use this in Envoy.

@skydoctor

@mattklein123, @alyssawilk: Do you see a way to incorporate the more detailed log from nghttp2?

@alyssawilk
Contributor

Unfortunately I'm not sure there's a way to get nghttp2 to print that error, other than turning up all nghttp2 logs, which gets pretty spammy. It looks like the detailed string gets passed up to session_call_error_callback, but the Envoy codec callback only gets NGHTTP2_ERR_HTTP_HEADER.

Would it have helped if the upstream Envoy had extra debug info, i.e. if where it logs
"invalid frame: Invalid HTTP header field was received on stream 1"
it also told you that you could turn up trace logging for more details?

We could also ask @tatsuhiro-t (nghttp2 author) if he'd be up for either passing up the details string, or tracking stream state of the individual error in a way we could call back to the nghttp2 call stack to get the details.

@dcarney
Contributor

dcarney commented Jun 17, 2019

@mattklein123 @alyssawilk I actually have a working PR that I was going to submit that fixes this issue following the rough approach that @bobby-stripe suggested a few weeks ago:

It seems weird that Envoy can create packets that it-itself can't terminate, so a simple solution might be calling nghttp2_check_header_value when copying headers from H1 to H2 requests, and returning with, say, a 400 if that check fails.

Would something like that make sense?

My PR makes the HTTP connection manager validate that the headers are RFC-compliant (delegating to nghttp2_check_header_value for the actual validation), returning an appropriate 400 if not. If this is considered a change to end-user behavior, I could see the argument for gating this new behavior behind a configuration property.

Let me know if this seems like a reasonable approach. I was planning to submit my PR imminently, after writing up a detailed PR description, checking the test coverage, etc.

@mattklein123
Member

@dcarney thanks. I think this fix is reasonable, but I would be concerned about changing end-user behavior, which makes me think we should probably do opt-in for this, but I could be convinced otherwise. WDYT @alyssawilk?

@alyssawilk
Contributor

Definitely opt-in, please. I'm less sure whether it needs to be a permanent config option or whether we can do it with runtime flags. We could always runtime-guard it with the plan of deprecating the old code path, and if anyone complains, replace the runtime guard with a permanent config option.

I'm also curious whether, orthogonally, we should tweak our upgrade detection. Right now on the H2 path we only remove the upgrade header if both the upgrade header and a connection: "upgrade" header are present, but arguably we could be more lenient, to handle legitimate non-standard-compliant upgrade requests.

@skydoctor

skydoctor commented Jun 17, 2019

Thanks @alyssawilk. The place I see this error is an edge gateway and the incorrect header is received from an outside client. Hence the struggle is to convey to the client which exact header of theirs was at fault. We turned up the logging level to trace but still could only get the following. Are you implying that with logging level set to "trace", we should have seen the name of the header at fault?

[2019-06-17 13:35:07.096][32][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:735] [C137] setting stream-level initial window size to 268435456
[2019-06-17 13:35:07.096][32][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:757] [C137] updating connection-level initial window size to 268435456
[2019-06-17 13:35:07.098][32][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:243] [C137] new stream
[2019-06-17 13:35:07.099][32][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:549] [C137] invalid frame: Invalid HTTP header field was received on stream 1
[2019-06-17 13:35:07.099][32][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:531] [C137] sent reset code=1
[2019-06-17 13:35:07.099][32][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:577] [C137] stream closed: 1

@tatsuhiro-t

nghttp2 has https://nghttp2.org/documentation/nghttp2_session_callbacks_set_error_callback2.html#c.nghttp2_session_callbacks_set_error_callback2, which provides a verbose error message.
Check for liberr == NGHTTP2_ERR_HTTP_HEADER and you'll get the message. But after invoking the callback, libnghttp2 will reset the stream, so it cannot be used to return an HTTP status code.

@alyssawilk
Contributor

Oh, that's fantastic, I'd missed that API, thanks.
I wonder if we could canonicalize this and send the information as optional proxy-status details? Alternatively, if nghttp2_check_header_value works for getting verbose details, I'd be happy to see how that looks as well.
https://datatracker.ietf.org/doc/html/draft-nottingham-proxy-status
@PiotrSikora

@skydoctor

Perhaps this could be used for printing the name of the header in the trace log printout as well?

@mattklein123 mattklein123 added this to the 1.12.0 milestone Jul 3, 2019
alyssawilk pushed a commit that referenced this issue Jul 15, 2019
Description:
This PR addresses the behavior described in #7061 by modifying the HTTP/1.1 codec so that it verifies the value of each HTTP header when onHeaderValue is called. If an invalid header value (per RFC 7230, section 3.2) is found, the codec raises an exception, which results in a 400 response. The actual validation is done via the nghttp2_check_header_value function, which is wrapped in Envoy::Http::HeaderUtility::headerIsValid.

(NOTE: this has been discussed as something that nghttp2 could do itself, but that issue seems to have languished. Also note that Go handles this: Go uses the httpguts.ValidHeaderFieldValue function (which is analogous to nghttp2_check_header_value) to ensure that all HTTP header values conform to the relevant RFC specs before an http.Transport instance will round-trip the request.)

Risk Level: Low/medium
These stricter validation semantics are controlled by the envoy.reloadable_features.validate_header_values runtime-guarded feature. Originally, the PR used a new boolean on the HTTP connection manager proto to toggle the behavior, but during PR review it was decided that a runtime guard would be more appropriate.
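The guard pattern can be sketched as follows (a toy stand-in; Envoy's real runtime layer works differently in detail, and only the flag name comes from this PR):

```python
# Toy stand-in for a runtime-guarded feature: the strict validation path
# only runs when the flag is on, so existing deployments keep the old,
# lenient behavior until they opt in (or the default flips).
RUNTIME_FLAGS = {"envoy.reloadable_features.validate_header_values": True}

def runtime_enabled(flag, default=False):
    return RUNTIME_FLAGS.get(flag, default)

def on_header_value(value: bytes):
    if runtime_enabled("envoy.reloadable_features.validate_header_values"):
        if not all(b in (0x09, 0x20) or 0x21 <= b <= 0x7E or b >= 0x80
                   for b in value):
            raise ValueError("invalid header value")  # surfaces as a 400

on_header_value(b"ok-value")          # accepted
try:
    on_header_value(b"bad\x1fvalue")  # rejected while the guard is on
except ValueError as e:
    print(e)
```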

Testing: new unit tests, manual tests
Release Notes: Updated
Fixes #7061

Signed-off-by: Dylan Carney <dcarney@gmail.com>