Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All percent encoding #2594

Merged
merged 8 commits into from
Jul 31, 2023
Merged

All percent encoding #2594

merged 8 commits into from
Jul 31, 2023

Conversation

mnot
Copy link
Member

@mnot mnot commented Jul 11, 2023

Fixes #2575.

@mnot mnot requested a review from reschke July 11, 2023 03:37
@mnot mnot requested a review from bsdphk as a code owner July 11, 2023 03:37
Copy link
Contributor

@reschke reschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Let byte_array be the result of applying UTF-8 encoding {{UTF8}} to input_string. If there is an error in doing so, fail parsing.

How can that fail?

@@ -695,8 +695,10 @@ Given a string of Unicode characters as input_string, return an ASCII string sui
1. If byte is %x25 ("%"), append "%25" to encoded_string.
2. If byte is in the ranges %x00-1f or %x7f-ff, apply the percent-encoding defined in {{Section 2.1 of URI}} to byte and append the result to encoded_string.
3. Otherwise, decode byte as an ASCII character and append the result to encoded_string.
3. Let formatted_string be the result of running Serialising a String ({{ser-string}}) with encoded_string.
4. Return the character "%" followed by formatted_string.
3. Let output be a string containing %x25 ("%") followed by DQUOTE.
Copy link
Contributor

@reschke reschke Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use the definition from the URI spec, shouldn't we mandate either upper or lowercase?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should parsing fail if that case isn't seen?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whichever we specify...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe talk about different things? Parsing percent-decoded as per URI spec can fail, and we need to say something about that, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you were asking if we could mandate a particular casing here (in the serialisation algorithm); I was asking if that casing should be enforced by the parsing algorithm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Yes, it should.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned that some producers might not be able to control the casing of their output. While they could run it through an additional step of processing, that would increase overhead for them.

Do we expect anyone to be doing something (eg security scanning) that would be expecting a particular casing, and wouldn't be able to adapt to two possible casings? My initial reaction is that I'd rather not increasing efficiency for those parties by decreasing efficiency for others...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mostly a matter of consistency with the remainder of the spec. For values, there's exactly one value to serialize them (or maybe that's not the case because of Decimal???).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't require sytems to error when parsing eg 0001. Byte sequences don't require parsers to fail on padding irregularities. I appreciate that we specify error handling very carefully in this document, but we do so when there's a point to it, not merely for consistency.

@reschke
Copy link
Contributor

reschke commented Jul 11, 2023

Looks good except for minor details and potentially a bug (encoding vs decoding).

I'd also like to see the spec enforce a consistent form of percent encoding (lower vs upper)

Copy link
Contributor

@martinthomson martinthomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WFM, though I think I prefer to go all uppercase (because then we can use RFC 4648 instead) rather than be case-variation-tolerant.

draft-ietf-httpbis-sfbis.md Outdated Show resolved Hide resolved
draft-ietf-httpbis-sfbis.md Outdated Show resolved Hide resolved
draft-ietf-httpbis-sfbis.md Outdated Show resolved Hide resolved
Co-authored-by: Martin Thomson <mt@lowentropy.net>
@reschke
Copy link
Contributor

reschke commented Jul 27, 2023

I am sceptical wrt "use the URI percent encoder". The reason being:

  • percent encoding rules depend on what part of the URI needs encoding (and many APIs do not get that)
  • calling the API for a single character multiple times usually requires converting a char to a string, and that might not be good for perf if we really care about that

@mnot
Copy link
Member Author

mnot commented Jul 28, 2023

I've removed the reliance on URI for percent encoding; was clean as @martinthomson said it would be.

Regarding case -- @martinthomson RFC4648 Section 8 is explicitly case-insensitive. Given that and the discussion in the group yesterday, I've left this case-insensitive. Say if you still disagree.

Copy link
Contributor

@reschke reschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe that relying on a different RFC for the hex encoding makes this much better (so I sort of disagree with Martin's proposal).

Just define the escaping exactly inline, and then we can choose upper/lower.

@@ -61,7 +61,7 @@ informative:
RFC9113:
display: HTTP/2
HPACK: RFC7541
URI: RFC3986
ENCODING: RFC4648
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using "ENCODING" is a bit misleading. If we don't have a catchy name for the spec, I'd suggest just saying RFC4648.

@martinthomson
Copy link
Contributor

@reschke , I was suggesting an inline definition. The fact that uppercase is base16 would just be an observation (and not a particularly useful one). Lowercase tends to be the default mode in several places.

@mnot
Copy link
Member Author

mnot commented Jul 31, 2023

@martinthomson it wasn't at all clear that's what you were suggesting. I'll make an attempt.

@mnot
Copy link
Member Author

mnot commented Jul 31, 2023

Looking at it, I think I prefer referring to 4648. This this addresses the original issue, I'll merge; if someone wants to propose an alternate way to specify this, please feel free to open a PR.

@mnot mnot merged commit 93bc533 into main Jul 31, 2023
2 checks passed
@mnot mnot deleted the mnot/2575 branch July 31, 2023 05:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

DisplayString pick-your-own-escaping is an invitation for smuggling attacks
3 participants