Move field validation text around #936
Conversation
This just moves the reference to the canonical definition of what is valid up to the top of the section. That's the canonical text. This emphasizes the point that HTTP/2 is placing *additional* requirements on endpoints with respect to validation.
This does not really fix httpwg#902 in the sense that it leaves validation of DQUOTE and friends to the core semantics implementations.
Note that any "MAY" requirement for rejecting a message can have the desired effect on those generating those messages: if an endpoint puts DQUOTE in a field name, that message will have little hope of being successfully handled.
Closes httpwg#902.
draft-ietf-httpbis-http2bis.xml
Outdated
</t>
<t>
<xref target="HTTP" section="5.1"/> and <xref target="HTTP" section="5.5"/> define field
names and values and what characters are valid. A recipient MAY treat a message that
I don't think that MAY does enough to satisfy #902.
I think this paragraph should be the first requirement in the list below and say that in H2 a field MUST be treated as malformed if it is invalid in HTTP - unless it is a pseudo header. Specifically we should say:
HPACK can carry invalid fields.
All H2 impls MUST validate as follows (see the sketch after this list):
- must be a valid HTTP field or a pseudo field
- names must be lower case
- must not contain a zero (NUL) value etc.
- must not start or end with whitespace
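A minimal sketch of these rules (Python; the function name and pseudo-header set are illustrative, and full HTTP token/value syntax is deliberately left out):

```python
# Illustrative sketch of the rules listed above; not normative, and
# deliberately incomplete: full HTTP token/value syntax is left to the
# semantics layer.

PSEUDO_FIELDS = {b":method", b":scheme", b":authority", b":path", b":status"}

def h2_field_ok(name: bytes, value: bytes) -> bool:
    if not name:
        return False
    # Must be a known pseudo field, or an HTTP field name in lower case.
    if name.startswith(b":"):
        if name not in PSEUDO_FIELDS:
            return False
    elif name != name.lower():
        return False
    # Must not contain a zero (NUL) octet, nor CR or LF.
    if any(c in name or c in value for c in (b"\x00", b"\r", b"\n")):
        return False
    # Value must not start or end with whitespace.
    return value == value.strip(b" \t")
```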
My 2 cents: it seems that there's disagreement about whether the validity requirements from the core specs need to be repeated. Of course, DRY. Semantics currently says:
The issue here is that H/2 actually extends that ABNF (without doing that as ABNF), which makes the link somewhat weak. Maybe formally (==ABNF) defining "h2-fieldname" as "lowercase-http-fieldname / h2-pseudofield" would be helpful.
So the proposal would be:
h2-fieldname = 1*fchar / ":" 1*fchar
fchar = "!" / "#" / "$" / "%" / "&" / "'" / "*"
/ "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
/ DIGIT / %x61-7A
; like "token" in [HTTP], except for uppercase ALPHA
Julian,
YES PLEASE for ABNF (yeah I know that is editorial, but it makes it so much more concise and exact).
I think everybody agrees that an h2 impl MUST NOT generate invalid HTTP fields. The differences appear on the receiving side, as the current document allows that h2 impls MAY accept some invalid HTTP fields. I've never understood the use-case for this?
cheers
On Mon, 23 Aug 2021 at 18:51, Julian Reschke wrote:
So the proposal would be:
h2-fieldname = 1*fchar / ":" 1*fchar
fchar = "!" / "#" / "$" / "%" / "&" / "'" / "*"
/ "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
/ DIGIT / %x61-7A
; like "token" in [HTTP], except for uppercase ALPHA
--
Greg Wilkins, CTO, http://webtide.com
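For illustration only, the h2-fieldname ABNF proposed above could be transcribed into a quick check like this (a sketch; as Martin notes below, matching such a pattern would not by itself make a field valid):

```python
import re

# Rough transcription of the proposed h2-fieldname ABNF; fchar is HTTP's
# "token" minus uppercase ALPHA, optionally preceded by ":".
H2_FIELDNAME = re.compile(r"^:?[!#$%&'*+\-.^_`|~0-9a-z]+$")

assert H2_FIELDNAME.match("content-length")
assert H2_FIELDNAME.match(":path")
assert not H2_FIELDNAME.match("Content-Length")  # uppercase excluded
assert not H2_FIELDNAME.match('x"y')             # DQUOTE is not an fchar
```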
Well, AFAIU, we do not have a hard requirement on recipients in the base spec (and HTTP/1 doesn't either):
So this seems consistent with what the base spec says.
I have no objection to adding ABNF to define H2’s fields. I would say that we should keep the prose, as the prose does a good job of expressing the why of our choices: for example, explaining the expectations around lowercase field names. However, I agree with @reschke and my past self that we do not need to add a “MUST validate” section.
This is a) about framing this as a minimal set of checks, b) establishing the stakes (request smuggling), and then c) not getting wrapped up in restating core semantics requirements.
Thanks Martin. I'm having a bit of difficulty parsing this sentence: "These checks are the minimum necessary to avoid accepting messages that might avoid security checks". Maybe "These checks are the minimum necessary to forward or process the message as an HTTP message"? Also, "A recipient can treat a message ..." - why not "s/can/SHOULD"? I think that should be enough to solve it.
My objection to ABNF is the same as before: we risk that being mistaken as defining what is valid. Something that meets that ABNF is not necessarily valid. The point of this section is to define what the absolute minimum is with respect to any form of message handling. Our attempt to require more complete validation roundly failed, so this is where we end up.
Willy, I've tried again with the wording. But I want to avoid normative statements regarding the core semantics documents. Those already define what a receiver might do. Those requirements aren't strict enough, at least in the "acting as a tunnel" case, for HTTP/2. The point of this text is to point to HTTP and note that these checks won't guarantee you a valid message, but to allow those implementations that do apply those checks the option of using HTTP/2 signaling (reset streams) to indicate the error condition. I don't think that needs a SHOULD, but I'm OK with adding it.
@martinthomson I think this is a flawed approach. By only defining a subset of invalid fields we create a set of fields that are not valid HTTP but may be valid H2 fields (by merit of not being defined as minimally invalid - just describing it is convoluted). I have yet to see any use-case for a field to be valid in H2 but invalid in HTTP? #902 has identified that the example we have frequently used for such a field (double quote in name) will be a concern if passed on to a CGI scripting language. So we have no known uses for such fields and security concerns about at least some of them. So why not just define valid H2 fields as a proper subset of valid HTTP fields plus pseudo fields?
We can avoid defining what is valid by referencing other specifications (which do use ABNF). As #902 suggests, a field should be valid in H2 if and only if it is:
- a valid HTTP field (with a lowercase name); or
- a pseudo field.
I.e. the set of valid H2 fields is a proper subset of valid HTTP fields plus pseudo fields.
That is what RFC 7540 did. That failed. We can revert all of these changes (aside from editorial tweaks) and go back to what the previous version said, but would that be any more successful?
I take it that you consider RFC 7540 failed because of #815, which ultimately references httpwg/http-core#683. These indicate that there are cases of CTL characters in field values observed in the wild. The response has been to relax the validation on fields so much that now security problems have been created.
I think this is because the minimal validation approach is essentially trying to be secure with a blacklist (in this case a list of known bad characters). It is far more secure to use a whitelist approach (i.e. what is the minimal character set required to carry the HTTP semantics); see the sketch below.
I don't really understand why no attempt was made to enforce the previous specification - I really hope it was not because of who was violating it. But if enforcing the spec is seen as impossible, then rather than just leave the validity of fields somewhat undefined, surely a better approach would have been to make some explicit exclusions to allow just the needed control characters in field values. Or perhaps field names must be valid HTTP field names (or pseudo and lowercase), but h2 is more relaxed on field values?
Deserting the field and not actually precisely defining what is valid just seems like an invitation to create more special cases and more unforeseen security problems.
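As an illustration of the blacklist/whitelist contrast drawn above (a sketch; the whitelist only roughly approximates the core field-value ABNF):

```python
# Blacklist: reject only octets known to be dangerous; anything not yet
# on the list sails through.
def blacklist_ok(value: bytes) -> bool:
    return not any(c in value for c in (b"\x00", b"\r", b"\n"))

# Whitelist: accept only what HTTP field values roughly allow
# (VCHAR, SP, HTAB and obs-text in the core semantics ABNF).
def whitelist_ok(value: bytes) -> bool:
    return all(0x21 <= b <= 0x7E or b in (0x09, 0x20) or b >= 0x80
               for b in value)
```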
The real reason for this situation is that we all know that what is extracted from H2 then HPACK will need further processing, and that this processing may already be fooled by the contents extracted from there. In addition, H2, being a standard multiplexed protocol on top of TCP, is abused by some who are not really interested in the HTTP aspect of it (e.g. https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md), putting intermediaries at risk of breakage and giving them even more motivation for dropping essential controls and becoming insecure.
While I would prefer a wording directly referencing the HTTP spec, like Martin I'm convinced that some will simply not even read it. So I think the proposed addition here can be welcome, provided that it does not use the term "valid" or anything else that might make implementations think "OK, that's sufficient", and that presenting the root cause and the risk remains useful to convince those who are hesitating (or at least give them good arguments for not disabling the tests). Let's try again:
It's not far from what Martin wrote in the last update, it just tries to better present the context and mention that this has to be done before HTTP parsing. And I'd like to use ...
No attempt was made to enforce the previous specification because the IETF and httpwg have no enforcement mechanism. What could have been done to enforce it?
W.r.t. this patch, I agree with @martinthomson that it seems that the conversation has come full circle to where we were with RFC 7540. In that circle I think the wording here is closest to what I’d want: a clear list of specific rules with a note that these rules are a lower bound on processing, not an upper bound.
I agree that misunderstanding the safety of the current spec is a real problem, but I don't think avoiding the word "valid" will help. Many developers will naturally think that an HTTP/2 specification would only support valid HTTP fields. You can't fix this in the text because that assumption has already been made in existing software and will be made again without reading the text. Even those that do read the text might get lost in the fine distinctions we are trying to make here between a field that is valid and one that may not be invalid.
What is a safe field? If the safety checks are not sufficient for consuming HTTP semantics, what are H2 fields safe for? What layers can consume them without extra validation? Why are consumers protected against CR, LF and ':' attacks but not safe from DQUOTE ones? Fields passing these safety checks cannot safely be passed to any existing layers that are written with the assumption that they will only receive valid HTTP fields, and new users will wrongly assume that H2 safety checks are at least as good as those in HTTP/1.
RFC 7540 said that an impl MUST treat invalid HTTP fields as malformed, but some impls ignored that. So how about we say that impls SHOULD treat invalid HTTP fields as malformed, and then define the specific circumstances in which consenting peers can exchange invalid HTTP fields, perhaps indicating as much with a SETTING or a new pseudo header that is needed to specifically allow h2 framing to be used for non-HTTP semantics? Yeah, I know some will say "you can't force those users of invalid HTTP fields to do something special". So instead we make all users of valid HTTP fields need to do extra validation or else risk being insecure?
Anyway, sorry for re-re-re-re-litigating. My live-with criteria is that an implementation MAY treat invalid HTTP fields as malformed, so in that sense this PR is fine.... it just doesn't fix #902 (and Martin says as much), so this is really just an editorial PR unrelated to #902.
Greg, I'm not suggesting that an H2 field is safe, but "safe to be parsed as HTTP". The main reason is that some basic HTTP components have been broken into pieces in H2 and need to be reassembled before being processed by HTTP. :scheme, :authority and :path are examples of this; in HTTP together they're called a URI. Cookie is another one. Some H2-specific issues may happen at the H2 layer that are impossible by definition in HTTP since HTTP/1. In my opinion this is the trouble that is being attempted to be addressed here. And yes, it turns out that it diverges a bit from #902, though it can address it if it mentions that consumed fields are subject to these checks.
For sure, if we enforce strict validation of all H2 fields against HTTP ones it will address everything, except that we've already seen that implementors are not keen on switching between many specs at once and need some guidance. It's a delicate balance. Julian proposed to duplicate the ABNF there; I thought it was a good idea. And if we don't want to make it look normative we could write the minimum elements as a reminder.
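For instance, the cookie reassembly step looks roughly like this (a sketch; RFC 7540 specifies the two-octet "; " delimiter for joining crumbs):

```python
def reassemble_cookie(crumbs: list[bytes]) -> bytes:
    # RFC 7540 Section 8.1.2.5: cookie crumbs are concatenated with the
    # two-octet delimiter "; " before HTTP-semantics code sees them.
    return b"; ".join(crumbs)

# reassemble_cookie([b"a=b", b"c=d"]) == b"a=b; c=d"
```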
But the spec as currently written, and after this PR, is not "safe to be parsed as HTTP". If that parsing is done by a scripting language, then DQUOTE and other control characters currently permitted by the spec may well not be safe. To take your example of a cookie header, how can you be sure that code which is currently safely parsing the cookie header value (or set-cookie value) will be robust in the face of control characters within that value?
Isn't the set of h2 fields that are "safe to be parsed as HTTP" a true subset of valid HTTP fields? What extra characters can we allow into fields that are not valid HTTP that we are sure will be parsed safely as valid HTTP by all existing layers currently running on top of H2 implementations?
The intent of these changes has been to normalize the practice in the wild of h2 framing being used to carry control characters in fields for non-HTTP purposes. But if we normalize that, we are putting in jeopardy all existing code written to HTTP semantics that may not even be aware of the version of HTTP carrying the fields that they are interpreting. Surely there is some other way to allow h2 framing to be used to carry non-HTTP streams without putting all existing HTTP code at risk? If we don't want developers abusing the current protocol, then give them a proper solution for their use-case rather than corrupt the existing primary use-case.
They should represent the exact same level of risk as having the same code parse HTTP/1.
Same as above.
This is always difficult to say if we'd try to extend what HTTP permits. However we're certain that some will almost always be dangerous along a chain or cause parsing issues even inside implementations.
Not exactly. This is actually the current situation, probably because some consider that the effort needed to comply with the rules in semantics is too high for little value at this level. This is something I can understand. For example in haproxy we have a semantics layer which performs a lot of controls, deduplicates content-length, checks Host against :authority, etc.
Thus maybe we should replace all this with an approach centered around this tricky HTTP/1 translation, which also does not prescribe how implementations should work internally nor suggest to put anything non-HTTP inside HEADERS frames:
Just to be clear, I really do not want to see HEADERS frames being abused to carry non-HTTP content, because I know pretty well what it's like to be an intermediary developer whose product gets the finger pointed at it for breaking stuff when inserted into a working but non-compliant chain. This is also why I would like to see the SHOULD reject as malformed.
I like this paragraph and I'm totally OK with a spec that has conditions like "implementations which reassemble elements as an HTTP/1-like message before ....". However, it is not just HTTP/1, as there are many HTTP semantic layers that are unaware of the protocol version. So it needs to say that any impl that represents streams as HTTP messages must do specific validation, whilst the h2 "abusers" are free to use streams to the limits of HPACK capabilities, so long as they don't represent the results as HTTP messages - either on the wire or to layers above.
Furthermore, an impl that is representing HTTP messages cannot just do the minimal safety checks you listed. It MUST validate the fields against the general syntax of HTTP fields. For example, if I provide an h2 implementation of the Java HttpExchange class, then my impl MUST validate the fields against HTTP syntax. But if instead I use my h2 impl to implement a gRPC API, then I'm under no such obligation as I'm not representing HTTP messages. Moreover, it is only the protocol impl that can do this validation.
Another example is that my server is used on PaaS platforms where the application was deployed decades ago, the source code is probably lost and the dev team all retired. The app cannot be redeployed and it certainly cannot be reviewed for how it parses the Cookie header. It's not acceptable for me to switch them from HTTP/1 to H2 and in the process expose those applications to invalid HTTP headers in HTTP messages I deliver to them. If the h2 impl is not going to validate that the fields are valid HTTP, then who is?
So how about something like:
The thing is, stacks that support multiple versions will certainly not implement the full header syntax in each version, because that is exactly what ought to be done at the semantics layer, and why the HTTP spec was split like this as well. We really have messaging (H1, H2, H3) and semantics. The transport explains how to extract elements from a byte stream and how to encode them into a byte stream. The semantics details their syntax and consistency.
NUL/CR/LF purely come from the H1 world, but H1 was HTTP before H2 existed, so it could be argued that they are legacy limitations that probably affect most stacks. The cookie header splitting is purely H2, and once recombined it must parse correctly according to the semantics definition. But whether your cookie header field comes from H1/H2/H3, it will have to pass through the exact same validity checks.
What is certain is that these HTTP syntax checks must be performed somewhere. But the messaging layer is not always exactly the best place for this. Typically an H1 implementation will not have code to check for embedded LF characters because LF is a delimiter there, while H2 needs an explicit check for this. The rules on the resulting cookie header field format apply after Cookie header reassembly, not before. And this reassembly is specific to the messaging layer, which defines how to serialize fields. Last, the generic rules are not sufficient to safely reassemble the pseudo-header fields.
Maybe in the end we could combine our two parts, starting with yours to indicate what every HTTP implementation must do, and putting the focus on the extreme care that is required when translating to HTTP/1-like messages, since we know that it's one of the most natural approaches (and even used in the examples in the spec). This could give roughly:
With this I think that the intent is clear, the guidance as well; it is exhaustive and doesn't leave any doubt about what needs to be done.
Oh, and of course:
:-)
I think extreme care is needed anytime an HTTP message is represented, not just when represented as HTTP/1. If an H2 layer is delivering messages to a semantic layer that was developed against h1, then that layer will be assuming an HTTP-valid charset, no matter if the message is passed as byte streams or lines or XML or JSON or Strings in classes. It seems wrong for one particular messaging layer (h2) to require changes in transport-impartial HTTP semantic layers so that the messaging layer can be "abused" for non-HTTP purposes.
As you say, the actual syntax validations needed depend on the messaging layer. So they are best done in the messaging layer, as it knows what it is and can sometimes do things efficiently, like checking charsets during parsing. If the syntax is not validated by the messaging layer, then the semantic layer will have to do full validation, which will waste CPU duplicating any checks that are done by the messaging layer (or are intrinsic to the transport).
I think I'm OK with this. There's just one minor thing:
It's actually "LF and COLON", but to be exhaustive I'd write "CR, LF, and COLON", since we've always kept extreme care on CR due to older H1 implementations.
I think we really want to say something about the pseudo headers. They do not exist in HTTP/1, and the recent PortSwigger report clearly shows that many of us were relying on the tests performed on the reassembled start line, but that was too late (in my case everything was already NUL/CR/LF clean but LWS were not imagined there). I think it would be appropriate to place the small enumeration I mentioned above somewhere (probably after the "minimal validation" part). But it's possible that it doesn't have its place in the headers section and that it would be better discussed in the security considerations. And maybe then the whole "minimal validation" part can be moved there with it, since after all, that's what all this is about: covering risks that are known for having already been abused.
If you're interested, I can propose a whole paragraph about all this, as being an implementer makes it easy enough for me to describe the traps. I'm just still having difficulties with all the toolchain and the XML docs, which is why I don't send PRs.
(I'll add LF.) As for pseudo-headers, welcome back scope creep. Would a simple prohibition on SP (and maybe HTAB) for pseudo-headers work? I'm fairly confident that the ...
I've initially thought about it when fixing my bugs ("if I had filtered LWS as well..."). But that's not enough for :scheme nor :path. If :path starts with any character that is valid in a domain name, or a colon, you can change the authority when concatenating values. E.g.
:authority = example.org with :path = .example.com/index.html
or
:authority = example.org with :path = :4443/index.html
would result in "https://example.org.example.com/index.html" or "https://example.org:4443/index.html". Similarly, if :scheme contains a colon, we're lost. In haproxy we've seriously hardened this part so that we don't have to deal with a variant of it any time soon. But I think that the absolute bare minimum is:
I don't see how to fool a start-line parser when reassembling something based on such minimum controls, but I do see how to fool them by relaxing any single one of the rules above.
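A sketch of the reassembly being described (Python, with illustrative values), showing how an unchecked :path shifts the apparent authority, plus one guard of the kind argued for here:

```python
def reassemble_uri(scheme: bytes, authority: bytes, path: bytes) -> bytes:
    # Naive translation: scheme "://" authority path.
    return scheme + b"://" + authority + path

reassemble_uri(b"https", b"example.org", b"/index.html")
# -> b"https://example.org/index.html"

# A :path that does not start with "/" shifts the apparent authority:
reassemble_uri(b"https", b"example.org", b".example.com/index.html")
# -> b"https://example.org.example.com/index.html"
reassemble_uri(b"https", b"example.org", b":4443/index.html")
# -> b"https://example.org:4443/index.html"

def path_ok(path: bytes) -> bool:
    # One minimal control: :path must be "*" or start with "/".
    return path == b"*" or path.startswith(b"/")
```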
This is really good, other than a couple of niggles.
Expressing it this way makes very clear the difference between the recommended full HTTP validation and the minimal validation that must be imposed if full validation is not done.
A field name MUST NOT contain characters in the ranges 0x00-0x20, 0x41-0x5A, or
0x7F-0xFF (all ranges inclusive). This specifically excludes all non-visible ASCII
characters, ASCII SP (0x20), and uppercase characters ('A' to 'Z', ASCII 0x41 to
0x5A).
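Transcribed literally (an illustrative sketch, not normative), the quoted range check reads:

```python
def field_name_octets_ok(name: bytes) -> bool:
    # Reject 0x00-0x20, 0x41-0x5A ('A'-'Z') and 0x7F-0xFF, i.e. accept
    # only 0x21-0x40 and 0x5B-0x7E, per the draft text quoted above.
    return all(0x21 <= b <= 0x40 or 0x5B <= b <= 0x7E for b in name)
```

Note that this check alone still admits DQUOTE (0x22) and the single quote (0x27), which is what the next comment raises.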
Should DQUOTE and single quote also be listed here to protect scripting languages?
The references to HTTP were busted. I fixed that. More importantly, these extra checks are not optional, as we require that uppercase names be rejected. I've made them directly mandatory, but noted that only the uppercase name check is needed if HTTP validation is performed.
Editors, it looks like this is ready to ship; anything stopping that?
Only that discussion was continuing. But it seems to have settled now.
Thanks Martin. What about adding the point I made above in #936 (comment) for :scheme, :path and :method, which need further protection (that's not addressed in core since it's purely H2)? I could propose this text (yeah, I know my formulations are not always great):