Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gRPC-Go should drop received headers containing upper ASCII (0x80+) characters #5318

Open
marooou opened this issue Apr 26, 2022 · 13 comments
Open
Labels
Area: Transport Includes HTTP/2 client/server and HTTP server handler transports and advanced transport features. P2 Type: Bug

Comments

@marooou
Copy link

marooou commented Apr 26, 2022

A have a question about a new version: v1.46.0 released on Apr 22.
The release adds a client-side validation of HTTP-invalid metadata before attempting to send (PR #4886).

One of the validation rules states that the gRPC request header value must contain one or more characters from the set [%x20-%x7E] (ref code) effectively excluding non-printable characters.

As a result, in some cases, bumping the package from v1.45 to 1.46 may become a breaking change
whenever header values pass on names and localized content (e.g. : Łukasz, Renée, Noël).

Was that an oversight or fully intentional to comply with RFC7230?

@dfawley
Copy link
Member

dfawley commented Apr 26, 2022

The change was intended to bring us in line with the HTTP spec, but I think there is a bug in the enforcement. RFC7230 says:

     quoted-string  = DQUOTE *( qdtext / quoted-pair ) DQUOTE
     qdtext         = HTAB / SP /%x21 / %x23-5B / %x5D-7E / obs-text
     obs-text       = %x80-FF

0x80-FF seems to be there to allow for unicode, but we are rejecting it.

@dfawley dfawley self-assigned this Apr 26, 2022
@marchmiel
Copy link

marchmiel commented Apr 26, 2022

@dfawley Sounds reasonable to fix, I think. Is there anything I can help with to have that fixed?

@dfawley dfawley assigned menghanl and dfawley and unassigned dfawley and menghanl Apr 27, 2022
@dfawley
Copy link
Member

dfawley commented Apr 27, 2022

I did a little more digging into this. It looks like the gRPC spec itself forbids this:

https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md#requests

Custom-Metadata → Binary-Header / ASCII-Header
ASCII-Header → Header-Name ASCII-Value
ASCII-Value → 1*( %x20-%x7E ) ; space and printable ASCII

Also, gRPC-Java for sure will have trouble with these values (I'm not sure about C++/other implementations). The only reason it worked for go->go servers and clients is that both implementations assumed UTF-8, which would not be the case in other languages.

For this reason, I believe it is a bug that we were allowing these to be used before, and that the validation, as added is mostly working as intended. I do apologize for the breakage, though, which was unexpected because we did not realize that the underlying HTTP framer implementation was accepting unicode values.

Instead of transmitting the strings directly, it is recommended to percent-encode them first (and decode them on the receiving side).

However, we do still have a bug. The spec goes on to say:

The ASCII-Value character range defined is more strict than HTTP. Implementations must not error due to receiving an invalid ASCII-Value that's a valid field-value in HTTP, but the precise behavior is not strictly defined: they may throw the value away or accept the value. If accepted, care must be taken to make sure that the application is permitted to echo the value back as metadata. For example, if the metadata is provided to the application as a list in a request, the application should not trigger an error by providing that same list as the metadata in the response.

This means that on the receiving side, we also need to drop these headers, as we have no way to distinguish between metadata received over the wire and metadata created by the application.

@win5do
Copy link

win5do commented Jul 4, 2022

Any update on this?

@jonathanrhodes
Copy link

Yeah, I just got burned by this too. Whether the original behavior was a bug or not, introducing a breaking change like this is not a small thing.

(If we're going to start silently dropping the headers on the server-side too, I'm concerned that will also lead to more confusion.)

@dfawley dfawley added P2 and removed P1 labels Nov 28, 2022
@dfawley dfawley removed their assignment Nov 28, 2022
@Patrick0308
Copy link
Contributor

Patrick0308 commented Dec 2, 2022

I'm sorry. Although I implement this validating, I getting burned by this too now. We didn't anticipate unicode is being widely used on header value.
Should do we not allow %x80-FF? Because the http RFC7230 allow it and many users pass unicode value. May be we need to modify grpc rfc?
And I agree with @jonathanrhodes. If we not allow %x80-FF, dropping invalid header on the receiving side. This covert break change is confusing.

@chowyu12
Copy link

i have the same problem, does not have a better solution

@viggin543
Copy link

any solution found besides downgrading?

@dfawley
Copy link
Member

dfawley commented Jul 7, 2023

It should be possible to metadata key names ending with -bin, and gRPC will base64 encode/decode them on the wire.

ZStriker19 added a commit to DataDog/dd-trace-py that referenced this issue Sep 8, 2023
We ran into situations where public clients were sending HTTP requests
with unsupported non-ascii characters in the `x-datadog-origin` headers.
These headers were being propagated, via the
[`HTTPPropagator.extract`](https://github.com/DataDog/dd-trace-py/blob/fd0820b9d7b561d6b75b75c6865a5b3b5792054d/ddtrace/propagation/http.py#L835)
and [gRPC client
interceptor](https://github.com/DataDog/dd-trace-py/blob/c5dfcec45ae21483171c0ac148f94e61ade0b4d2/ddtrace/contrib/grpc/client_interceptor.py#L203),
to branching gRPC calls. The gRPC calls are then failing on the client
side due to gRPC client side validation which [forbids metadata values
with non-ascii](grpc/grpc-go#5318) characters.

The proposed solution here will prevent these unsupported headers values
from being propagated by preventing them from being set within
`Context`. This approach may impact HTTP propagation to branching HTTP
requests, but my understanding is that non-ascii characters are not
supported in these situations either.

## Checklist

- [x] Change(s) are motivated and described in the PR description.
- [x] Testing strategy is described if automated tests are not included
in the PR.
- [x] Risk is outlined (performance impact, potential for breakage,
maintainability, etc).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed. If no release note is required, add label
`changelog/no-changelog`.
- [x] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/)).
- [x] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment.
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- [x] If this PR touches code that signs or publishes builds or
packages, or handles credentials of any kind, I've requested a review
from `@DataDog/security-design-and-guidance`.
- [x] This PR doesn't touch any of that.

---------

Co-authored-by: ZStriker19 <zach.groves@datadoghq.com>
Co-authored-by: Zachary Groves <32471391+ZStriker19@users.noreply.github.com>
@dfawley dfawley changed the title Unicode characters no longer supported in grpc header values since v1.46.0 gRPC-Go should drop received headers containing upper ASCII (0x80+) characters Sep 18, 2023
@bananacocodrilo
Copy link

Hey @dfawley

I'm currently facing some problems due to this "If accepted, care must be taken to make sure that the application is permitted to echo the value back as metadata."

Do you know if there are any plans to get a fix for this?

On the same topic:

  • I've noticed that "ASCII-Value should not have leading or trailing whitespace" is also not being validated.
    This one is not so problematic, just mentioning it in case it's not in anyone's radar.
  • Also, just for reference, I've opened a similar issue in the Java implementation (Metadata: Header value validation grpc-java#11679). It might be worth tackling both issues together to reach a common place between both.

@dfawley
Copy link
Member

dfawley commented Nov 14, 2024

@bananacocodrilo,

Do you know if there are any plans to get a fix for this?

We aren't scheduling this work right now, and I'm a bit nervous about such a behavior change, too. The only thing we can really do, given our API in Go, to follow the gRPC spec, is to start removing invalid incoming metadata. It's unclear to me whether this would be better or worse than the current state. Perhaps we need a new metadata API so that we can accept incoming invalid metadata but not permit the addition of new instances?

@ejona86
Copy link
Member

ejona86 commented Nov 19, 2024

0x80-FF seems to be there to allow for unicode

NOPE! It was there for Latin-1. Latin-1 was the only RFC-allowed use, but... Anyway, the "obs" part of the HTTP spec stands for "obsolete" and shouldn't be used.

From RFC 7230:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

And yes, that means if you chose to support obs-text you have to define header values as bytes/bytestring in Java, Go, Python 3, and every other Unicode-enabled language. Basically nobody used RFC2047 for HTTP much; it looked like=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>.

@ejona86
Copy link
Member

ejona86 commented Nov 19, 2024

I'll also note quoted-string isn't what you should be looking at; that applies to only certain headers. You should be looking at field-content (obs-fold is only important to HTTP/1, so you can ignore it). VCHAR = %x21-%x7E

     header-field   = field-name ":" OWS field-value OWS

     field-name     = token
     field-value    = *( field-content / obs-fold )
     field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
     field-vchar    = VCHAR / obs-text

@purnesh42H purnesh42H added the Area: Transport Includes HTTP/2 client/server and HTTP server handler transports and advanced transport features. label Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Transport Includes HTTP/2 client/server and HTTP server handler transports and advanced transport features. P2 Type: Bug
Projects
None yet
Development

No branches or pull requests