Move field validation text around #936
Conversation
This just moves the reference to the canonical definition of what is valid up to the top of the section. That's the canonical text. This emphasizes the point that HTTP/2 is placing *additional* requirements on endpoints with respect to validation.
This does not really fix httpwg#902 in the sense that it leaves validation of DQUOTE and friends to the core semantics implementations.
Note that any "MAY" requirement for rejecting a message can have the desired effect on those generating those messages: if an endpoint puts DQUOTE in a field name, that message will have little hope of being successfully handled.
Closes httpwg#902.
draft-ietf-httpbis-http2bis.xml
Outdated
</t>
<t>
<xref target="HTTP" section="5.1"/> and <xref target="HTTP" section="5.5"/> define field
names and values and what characters are valid. A recipient MAY treat a message that
I don't think that MAY does enough to satisfy #902.
I think this paragraph should be the first requirement in the list below and say that in H2 a field MUST be treated as malformed if it is invalid in HTTP - unless it is a pseudo header. Specifically we should say:
HPACK can carry invalid fields.
All H2 impls MUST validate as follows (see the sketch after this list):
- must be a valid HTTP field or a pseudo field
- names must be lower case
- must not contain a zero (NUL) value etc.
- must not start or end with whitespace
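A minimal sketch of these rules (Python; the function name and pseudo-header set are illustrative, and full HTTP token/value syntax is deliberately left out):

```python
# Illustrative sketch of the rules listed above; not normative, and
# deliberately incomplete: full HTTP token/value syntax is left to the
# semantics layer.

PSEUDO_FIELDS = {b":method", b":scheme", b":authority", b":path", b":status"}

def h2_field_ok(name: bytes, value: bytes) -> bool:
    if not name:
        return False
    # Must be a known pseudo field, or an HTTP field name in lower case.
    if name.startswith(b":"):
        if name not in PSEUDO_FIELDS:
            return False
    elif name != name.lower():
        return False
    # Must not contain a zero (NUL) octet, nor CR or LF.
    if any(c in name or c in value for c in (b"\x00", b"\r", b"\n")):
        return False
    # Value must not start or end with whitespace.
    return value == value.strip(b" \t")
```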
My 2 cents: it seems that there's disagreement about whether the validity requirements from the core specs need to be repeated. Of course, DRY. Semantics currently says:
The issue here is that H/2 actually extends that ABNF (without doing that as ABNF), which makes the link somewhat weak. Maybe formally (==ABNF) defining "h2-fieldname" as "lowercase-http-fieldname / h2-pseudofield" would be helpful.
So the proposal would be:
h2-fieldname = 1*fchar / ":" 1*fchar
fchar = "!" / "#" / "$" / "%" / "&" / "'" / "*"
/ "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
/ DIGIT / %x61-7A
; like "token" in [HTTP], except for uppercase ALPHA
Julian,
YES PLEASE for ABNF (yeah I know that is editorial, but it makes it so much more concise and exact).
I think everybody agrees that an h2 impl MUST NOT generate invalid HTTP fields. The differences appear on the receiving side, as the current document allows that h2 impls MAY accept some invalid HTTP fields. I've never understood the use-case for this?
cheers
On Mon, 23 Aug 2021 at 18:51, Julian Reschke wrote:
So the proposal would be:
h2-fieldname = 1*fchar / ":" 1*fchar
fchar = "!" / "#" / "$" / "%" / "&" / "'" / "*"
/ "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
/ DIGIT / %x61-7A
; like "token" in [HTTP], except for uppercase ALPHA
--
Greg Wilkins, CTO, http://webtide.com
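For illustration only, the h2-fieldname ABNF proposed above could be transcribed into a quick check like this (a sketch; as Martin notes below, matching such a pattern would not by itself make a field valid):

```python
import re

# Rough transcription of the proposed h2-fieldname ABNF; fchar is HTTP's
# "token" minus uppercase ALPHA, optionally preceded by ":".
H2_FIELDNAME = re.compile(r"^:?[!#$%&'*+\-.^_`|~0-9a-z]+$")

assert H2_FIELDNAME.match("content-length")
assert H2_FIELDNAME.match(":path")
assert not H2_FIELDNAME.match("Content-Length")  # uppercase excluded
assert not H2_FIELDNAME.match('x"y')             # DQUOTE is not an fchar
```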
Well, AFAIU, we do not have a hard requirement on recipients in the base spec (and HTTP/1 doesn't either):
So this seems consistent with what the base spec says.
I have no objection to adding ABNF to define H2’s fields. I would say that we should keep the prose, as the prose does a good job of expressing the why of our choices: for example, explaining the expectations around lowercase field names. However, I agree with @reschke and my past self that we do not need to add a “MUST validate” section.
This is a) about framing this as a minimal set of checks, b) establishing the stakes (request smuggling), and then c) not getting wrapped up in restating core semantics requirements.
Thanks Martin. I'm having a bit of difficulty parsing this sentence: "These checks are the minimum necessary to avoid accepting messages that might avoid security checks". Maybe "These checks are the minimum necessary to forward or process the message as an HTTP message"? Also, "A recipient can treat a message ..." - why not "s/can/SHOULD"? I think that should be enough to solve it.
My objection to ABNF is the same as before: we risk that being mistaken as defining what is valid. Something that meets that ABNF is not necessarily valid. The point of this section is to define what the absolute minimum is with respect to any form of message handling. Our attempt to require more complete validation roundly failed, so this is where we end up.
Willy, I've tried again with the wording. But I want to avoid normative statements regarding the core semantics documents. Those already define what a receiver might do. Those requirements aren't strict enough, at least in the "acting as a tunnel" case, for HTTP/2. The point of this text is to point to HTTP and note that these checks won't guarantee you a valid message, but to allow those implementations that do apply those checks the option of using HTTP/2 signaling (reset streams) to indicate the error condition. I don't think that needs a SHOULD, but I'm OK with adding it.
@martinthomson I think this is a flawed approach. By only defining a subset of invalid fields we create a set of fields that are not valid HTTP but may be valid H2 fields (by merit of not being defined as minimally invalid - just describing it is convoluted). I have yet to see any use-case for a field to be valid in H2 but invalid in HTTP? #902 has identified that the example we have frequently used for such a field (double quote in name) will be a concern if passed on to a CGI scripting language. So we have no known uses for such fields and security concerns about at least some of them. So why not just define valid H2 fields as a proper subset of valid HTTP fields plus pseudo fields?
We can avoid defining what is valid by referencing other specifications (which do use ABNF). As #902 suggests, a field should be valid in H2 if and only if it is:
- a valid HTTP field (with a lowercase name); or
- a pseudo field.
I.e. the set of valid H2 fields is a proper subset of valid HTTP fields plus pseudo fields.
That is what RFC 7540 did. That failed. We can revert all of these changes (aside from editorial tweaks) and go back to what the previous version said, but would that be any more successful?
I take it that you consider RFC 7540 failed because of #815, which ultimately references httpwg/http-core#683. These indicate that there are cases of CTL characters in field values observed in the wild. The response has been to relax the validation on fields so much that now security problems have been created.
I think this is because the minimal validation approach is essentially trying to be secure with a blacklist (in this case a list of known bad characters). It is far more secure to use a whitelist approach (i.e. what is the minimal character set required to carry the HTTP semantics); see the sketch below.
I don't really understand why no attempt was made to enforce the previous specification - I really hope it was not because of who was violating it. But if enforcing the spec is seen as impossible, then rather than just leave the validity of fields somewhat undefined, surely a better approach would have been to make some explicit exclusions to allow just the needed control characters in field values. Or perhaps field names must be valid HTTP field names (or pseudo and lowercase), but h2 is more relaxed on field values?
Deserting the field and not actually precisely defining what is valid just seems like an invitation to create more special cases and more unforeseen security problems.
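As an illustration of the blacklist/whitelist contrast drawn above (a sketch; the whitelist only roughly approximates the core field-value ABNF):

```python
# Blacklist: reject only octets known to be dangerous; anything not yet
# on the list sails through.
def blacklist_ok(value: bytes) -> bool:
    return not any(c in value for c in (b"\x00", b"\r", b"\n"))

# Whitelist: accept only what HTTP field values roughly allow
# (VCHAR, SP, HTAB and obs-text in the core semantics ABNF).
def whitelist_ok(value: bytes) -> bool:
    return all(0x21 <= b <= 0x7E or b in (0x09, 0x20) or b >= 0x80
               for b in value)
```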
The real reason for this situation is that we all know that what is extracted from H2 then HPACK will need further processing, and that this processing may already be fooled by the contents extracted from there. In addition, H2, being a standard multiplexed protocol on top of TCP, is abused by some who are not really interested in the HTTP aspect of it (e.g. https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md), putting intermediaries at risk of breakage and giving them even more motivation for dropping essential controls and becoming insecure.
While I would prefer a wording directly referencing the HTTP spec, like Martin I'm convinced that some will simply not even read it. So I think the proposed addition here can be welcome, provided that it does not use the term "valid" or anything else that might make implementations think "OK, that's sufficient", and that presenting the root cause and the risk remains useful to convince those who are hesitating (or at least give them good arguments for not disabling the tests). Let's try again:
It's not far from what Martin wrote in the last update, it just tries to better present the context and mention that this has to be done before HTTP parsing. And I'd like to use ...
No attempt was made to enforce the previous specification because the IETF and httpwg have no enforcement mechanism. What could have been done to enforce it?
W.r.t. this patch, I agree with @martinthomson that it seems that the conversation has come full circle to where we were with RFC 7540. In that circle I think the wording here is closest to what I’d want: a clear list of specific rules with a note that these rules are a lower bound on processing, not an upper bound.
I agree that misunderstanding the safety of the current spec is a real problem, but I don't think avoiding the word "valid" will help. Many developers will naturally think that an HTTP/2 specification would only support valid HTTP fields. You can't fix this in the text because that assumption has already been made in existing software and will be made again without reading the text. Even those that do read the text might get lost in the fine distinctions we are trying to make here between a field that is valid and one that may not be invalid.
What is a safe field? If the safety checks are not sufficient for consuming HTTP semantics, what are H2 fields safe for? What layers can consume them without extra validation? Why are consumers protected against CR, LF and ':' attacks but not safe from DQUOTE ones? Fields passing these safety checks cannot safely be passed to any existing layers that are written with the assumption that they will only receive valid HTTP fields, and new users will wrongly assume that H2 safety checks are at least as good as those in HTTP/1.
RFC 7540 said that an impl MUST treat invalid HTTP fields as malformed, but some impls ignored that. So how about we say that impls SHOULD treat invalid HTTP fields as malformed, and then define the specific circumstances in which consenting peers can exchange invalid HTTP fields, perhaps indicating as much with a SETTING or a new pseudo header that is needed to specifically allow h2 framing to be used for non-HTTP semantics? Yeah, I know some will say "you can't force those users of invalid HTTP fields to do something special". So instead we make all users of valid HTTP fields need to do extra validation or else risk being insecure?
Anyway, sorry for re-re-re-re-litigating. My live-with criteria is that an implementation MAY treat invalid HTTP fields as malformed, so in that sense this PR is fine.... it just doesn't fix #902 (and Martin says as much), so this is really just an editorial PR unrelated to #902.
Greg, I'm not suggesting that an H2 field is safe, but "safe to be parsed as HTTP". The main reason is that some basic HTTP components have been broken into pieces in H2 and need to be reassembled before being processed by HTTP. :scheme, :authority and :path are examples of this; in HTTP together they're called a URI. Cookie is another one. Some H2-specific issues may happen at the H2 layer that are impossible by definition in HTTP since HTTP/1. In my opinion this is the trouble that is being attempted to be addressed here. And yes, it turns out that it diverges a bit from #902, though it can address it if it mentions that consumed fields are subject to these checks.
For sure, if we enforce strict validation of all H2 fields against HTTP ones it will address everything, except that we've already seen that implementors are not keen on switching between many specs at once and need some guidance. It's a delicate balance. Julian proposed to duplicate the ABNF there; I thought it was a good idea. And if we don't want to make it look normative we could write the minimum elements as a reminder.
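For instance, the cookie reassembly step looks roughly like this (a sketch; RFC 7540 specifies the two-octet "; " delimiter for joining crumbs):

```python
def reassemble_cookie(crumbs: list[bytes]) -> bytes:
    # RFC 7540 Section 8.1.2.5: cookie crumbs are concatenated with the
    # two-octet delimiter "; " before HTTP-semantics code sees them.
    return b"; ".join(crumbs)

# reassemble_cookie([b"a=b", b"c=d"]) == b"a=b; c=d"
```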
But the spec as currently written, and after this PR, is not "safe to be parsed as HTTP". If that parsing is done by a scripting language, then DQUOTE and other control characters currently permitted by the spec may well not be safe. To take your example of a cookie header, how can you be sure that code which is currently safely parsing the cookie header value (or set-cookie value) will be robust in the face of control characters within that value?
Isn't the set of h2 fields that are "safe to be parsed as HTTP" a true subset of valid HTTP fields? What extra characters can we allow into fields that are not valid HTTP that we are sure will be parsed safely as valid HTTP by all existing layers currently running on top of H2 implementations?
The intent of these changes has been to normalize the practice in the wild of h2 framing being used to carry control characters in fields for non-HTTP purposes. But if we normalize that, we are putting in jeopardy all existing code written to HTTP semantics that may not even be aware of the version of HTTP carrying the fields that they are interpreting. Surely there is some other way to allow h2 framing to be used to carry non-HTTP streams without putting all existing HTTP code at risk? If we don't want developers abusing the current protocol, then give them a proper solution for their use-case rather than corrupt the existing primary use-case.
They should represent the exact same level of risk as having the same code parse HTTP/1.
Same as above.
This is always difficult to say if we'd try to extend what HTTP permits. However we're certain that some will almost always be dangerous along a chain or cause parsing issues even inside implementations.
Not exactly. This is actually the current situation, probably because some consider that the effort needed to comply with the rules in semantics is too high for little value at this level. This is something I can understand. For example in haproxy we have a semantics layer which performs a lot of controls, deduplicates content-length, checks Host against :authority, etc.
Thus maybe we should replace all this with an approach centered around this tricky HTTP/1 translation, which also does not prescribe how implementations should work internally nor suggest to put anything non-HTTP inside HEADERS frames:
Just to be clear, I really do not want to see HEADERS frames being abused to carry non-HTTP content, because I know pretty well what it's like to be an intermediary developer whose product gets the finger pointed at it for breaking stuff when inserted into a working but non-compliant chain. This is also why I would like to see the SHOULD reject as malformed.
I like this paragraph and I'm totally OK with a spec that has conditions like "implementations which reassemble elements as an HTTP/1-like message before ....". However, it is not just HTTP/1, as there are many HTTP semantic layers that are unaware of the protocol version. So it needs to say that any impl that represents streams as HTTP messages must do specific validation, whilst the h2 "abusers" are free to use streams to the limits of HPACK capabilities, so long as they don't represent the results as HTTP messages - either on the wire or to layers above.
Furthermore, an impl that is representing HTTP messages cannot just do the minimal safety checks you listed. It MUST validate the fields against the general syntax of HTTP fields. For example, if I provide an h2 implementation of the Java HttpExchange class, then my impl MUST validate the fields against HTTP syntax. But if instead I use my h2 impl to implement a gRPC API, then I'm under no such obligation as I'm not representing HTTP messages. Moreover, it is only the protocol impl that can do this validation.
Another example is that my server is used on PaaS platforms where the application was deployed decades ago, the source code is probably lost and the dev team all retired. The app cannot be redeployed and it certainly cannot be reviewed for how it parses the Cookie header. It's not acceptable for me to switch them from HTTP/1 to H2 and in the process expose those applications to invalid HTTP headers in HTTP messages I deliver to them. If the h2 impl is not going to validate that the fields are valid HTTP, then who is?
So how about something like:
The thing is, stacks that support multiple versions will certainly not implement the full header syntax in each version, because that is exactly what ought to be done at the semantics layer, and why the HTTP spec was split like this as well. We really have messaging (H1, H2, H3) and semantics. The transport explains how to extract elements from a byte stream and how to encode them into a byte stream. The semantics details their syntax and consistency.
NUL/CR/LF purely come from the H1 world, but H1 was HTTP before H2 existed, so it could be argued that they are legacy limitations that probably affect most stacks. The cookie header splitting is purely H2, and once recombined it must parse correctly according to the semantics definition. But whether your cookie header field comes from H1/H2/H3, it will have to pass through the exact same validity checks.
What is certain is that these HTTP syntax checks must be performed somewhere. But the messaging layer is not always exactly the best place for this. Typically an H1 implementation will not have code to check for embedded LF characters because LF is a delimiter there, while H2 needs an explicit check for this. The rules on the resulting cookie header field format apply after Cookie header reassembly, not before. And this reassembly is specific to the messaging layer, which defines how to serialize fields. Last, the generic rules are not sufficient to safely reassemble the pseudo-header fields.
Maybe in the end we could combine our two parts, starting with yours to indicate what every HTTP implementation must do, and putting the focus on the extreme care that is required when translating to HTTP/1-like messages, since we know that it's one of the most natural approaches (and even used in the examples in the spec). This could give roughly:
With this I think that the intent is clear, the guidance as well; it is exhaustive and doesn't leave any doubt about what needs to be done.
Oh, and of course:
:-)
I think extreme care is needed anytime an HTTP message is represented, not just when represented as HTTP/1. If an H2 layer is delivering messages to a semantic layer that was developed against h1, then that layer will be assuming an HTTP-valid charset, no matter if the message is passed as byte streams or lines or XML or JSON or Strings in classes. It seems wrong for one particular messaging layer (h2) to require changes in transport-impartial HTTP semantic layers so that the messaging layer can be "abused" for non-HTTP purposes.
As you say, the actual syntax validations needed depend on the messaging layer. So they are best done in the messaging layer, as it knows what it is and can sometimes do things efficiently, like checking charsets during parsing. If the syntax is not validated by the messaging layer, then the semantic layer will have to do full validation, which will waste CPU duplicating any checks that are done by the messaging layer (or are intrinsic to the transport).
I think I'm OK with this. There's just one minor thing:
It's actually "LF and COLON", but to be exhaustive I'd write "CR, LF, and COLON", since we've always kept extreme care on CR due to older H1 implementations.
I think we really want to say something about the pseudo headers. They do not exist in HTTP/1, and the recent PortSwigger report clearly shows that many of us were relying on the tests performed on the reassembled start line, but that was too late (in my case everything was already NUL/CR/LF clean but LWS were not imagined there). I think it would be appropriate to place the small enumeration I mentioned above somewhere (probably after the "minimal validation" part). But it's possible that it doesn't have its place in the headers section and that it would be better discussed in the security considerations. And maybe then the whole "minimal validation" part can be moved there with it, since after all, that's what all this is about: covering risks that are known for having already been abused.
If you're interested, I can propose a whole paragraph about all this, as being an implementer makes it easy enough for me to describe the traps. I'm just still having difficulties with all the toolchain and the XML docs, which is why I don't send PRs.
(I'll add LF.) As for pseudo-headers, welcome back scope creep. Would a simple prohibition on SP (and maybe HTAB) for pseudo-headers work? I'm fairly confident that the ...
I've initially thought about it when fixing my bugs ("if I had filtered LWS as well..."). But that's not enough for :scheme nor :path. If :path starts with any character that is valid in a domain name, or a colon, you can change the authority when concatenating values. E.g.
:authority = example.org with :path = .example.com/index.html
or
:authority = example.org with :path = :4443/index.html
would result in "https://example.org.example.com/index.html" or "https://example.org:4443/index.html". Similarly, if :scheme contains a colon, we're lost. In haproxy we've seriously hardened this part so that we don't have to deal with a variant of it any time soon. But I think that the absolute bare minimum is:
I don't see how to fool a start-line parser when reassembling something based on such minimum controls, but I do see how to fool them by relaxing any single one of the rules above.
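A sketch of the reassembly being described (Python, with illustrative values), showing how an unchecked :path shifts the apparent authority, plus one guard of the kind argued for here:

```python
def reassemble_uri(scheme: bytes, authority: bytes, path: bytes) -> bytes:
    # Naive translation: scheme "://" authority path.
    return scheme + b"://" + authority + path

reassemble_uri(b"https", b"example.org", b"/index.html")
# -> b"https://example.org/index.html"

# A :path that does not start with "/" shifts the apparent authority:
reassemble_uri(b"https", b"example.org", b".example.com/index.html")
# -> b"https://example.org.example.com/index.html"
reassemble_uri(b"https", b"example.org", b":4443/index.html")
# -> b"https://example.org:4443/index.html"

def path_ok(path: bytes) -> bool:
    # One minimal control: :path must be "*" or start with "/".
    return path == b"*" or path.startswith(b"/")
```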
This is really good, other than a couple of niggles.
Expressing it this way makes very clear the difference between the recommended full HTTP validation and the minimal validation that must be imposed if full validation is not done.
A field name MUST NOT contain characters in the ranges 0x00-0x20, 0x41-0x5A, or
0x7F-0xFF (all ranges inclusive). This specifically excludes all non-visible ASCII
characters, ASCII SP (0x20), and uppercase characters ('A' to 'Z', ASCII 0x41 to
0x5A).
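Transcribed literally (an illustrative sketch, not normative), the quoted range check reads:

```python
def field_name_octets_ok(name: bytes) -> bool:
    # Reject 0x00-0x20, 0x41-0x5A ('A'-'Z') and 0x7F-0xFF, i.e. accept
    # only 0x21-0x40 and 0x5B-0x7E, per the draft text quoted above.
    return all(0x21 <= b <= 0x40 or 0x5B <= b <= 0x7E for b in name)
```

Note that this check alone still admits DQUOTE (0x22) and the single quote (0x27), which is what the next comment raises.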
Should DQUOTE and single quote also be listed here to protect scripting languages?
The references to HTTP were busted. I fixed that. More importantly, these extra checks are not optional, as we require that uppercase names be rejected. I've made them directly mandatory, but noted that only the uppercase name check is needed if HTTP validation is performed.
Editors, it looks like this is ready to ship; anything stopping that?
Only that discussion was continuing. But it seems to have settled now.
Thanks Martin. What about adding the point I made above in #936 (comment) for :scheme, :path and :method, which need further protection (that's not addressed in core since it's purely H2)? I could propose this text (yeah, I know my formulations are not always great):