-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
All percent encoding #2594
All percent encoding #2594
Conversation
Fixes #2575.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Let byte_array be the result of applying UTF-8 encoding {{UTF8}} to input_string. If there is an error in doing so, fail parsing.
How can that fail?
draft-ietf-httpbis-sfbis.md
Outdated
@@ -695,8 +695,10 @@ Given a string of Unicode characters as input_string, return an ASCII string sui | |||
1. If byte is %x25 ("%"), append "%25" to encoded_string. | |||
2. If byte is in the ranges %x00-1f or %x7f-ff, apply the percent-encoding defined in {{Section 2.1 of URI}} to byte and append the result to encoded_string. | |||
3. Otherwise, decode byte as an ASCII character and append the result to encoded_string. | |||
3. Let formatted_string be the result of running Serialising a String ({{ser-string}}) with encoded_string. | |||
4. Return the character "%" followed by formatted_string. | |||
3. Let output be a string containing %x25 ("%") followed by DQUOTE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use the definition from the URI spec, shouldn't we mandate either upper or lowercase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should parsing fail if that case isn't seen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whichever we specify...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe talk about different things? Parsing percent-decoded as per URI spec can fail, and we need to say something about that, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought you were asking if we could mandate a particular casing here (in the serialisation algorithm); I was asking if that casing should be enforced by the parsing algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Yes, it should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned that some producers might not be able to control the casing of their output. While they could run it through an additional step of processing, that would increase overhead for them.
Do we expect anyone to be doing something (eg security scanning) that would be expecting a particular casing, and wouldn't be able to adapt to two possible casings? My initial reaction is that I'd rather not increasing efficiency for those parties by decreasing efficiency for others...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mostly a matter of consistency with the remainder of the spec. For values, there's exactly one value to serialize them (or maybe that's not the case because of Decimal???).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't require sytems to error when parsing eg 0001
. Byte sequences don't require parsers to fail on padding irregularities. I appreciate that we specify error handling very carefully in this document, but we do so when there's a point to it, not merely for consistency.
Looks good except for minor details and potentially a bug (encoding vs decoding). I'd also like to see the spec enforce a consistent form of percent encoding (lower vs upper) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WFM, though I think I prefer to go all uppercase (because then we can use RFC 4648 instead) rather than be case-variation-tolerant.
Co-authored-by: Martin Thomson <mt@lowentropy.net>
I am sceptical wrt "use the URI percent encoder". The reason being:
|
I've removed the reliance on URI for percent encoding; was clean as @martinthomson said it would be. Regarding case -- @martinthomson RFC4648 Section 8 is explicitly case-insensitive. Given that and the discussion in the group yesterday, I've left this case-insensitive. Say if you still disagree. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe that relying on a different RFC for the hex encoding makes this much better (so I sort of disagree with Martin's proposal).
Just define the escaping exactly inline, and then we can choose upper/lower.
draft-ietf-httpbis-sfbis.md
Outdated
@@ -61,7 +61,7 @@ informative: | |||
RFC9113: | |||
display: HTTP/2 | |||
HPACK: RFC7541 | |||
URI: RFC3986 | |||
ENCODING: RFC4648 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using "ENCODING" is a bit misleading. If we don't have a catchy name for the spec, I'd suggest just saying RFC4648.
@reschke , I was suggesting an inline definition. The fact that uppercase is base16 would just be an observation (and not a particularly useful one). Lowercase tends to be the default mode in several places. |
@martinthomson it wasn't at all clear that's what you were suggesting. I'll make an attempt. |
Looking at it, I think I prefer referring to 4648. This this addresses the original issue, I'll merge; if someone wants to propose an alternate way to specify this, please feel free to open a PR. |
Fixes #2575.