All percent encoding #2594

mnot · 2023-07-11T03:37:11Z

reschke

Let byte_array be the result of applying UTF-8 encoding {{UTF8}} to input_string. If there is an error in doing so, fail parsing.

How can that fail?
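In a language such as Python, a `str` can hold an unpaired surrogate code point (U+D800 to U+DFFF). Encoding such a string fails, because surrogates are not Unicode scalar values and have no UTF-8 form. Whether a spec-level "string of Unicode characters" can contain them at all is a separate question. The hypothetical `utf8_or_fail` helper below only demonstrates that failure mode:

```python
def utf8_or_fail(input_string: str) -> bytes:
    """Encode input_string as UTF-8, raising ValueError if that is impossible."""
    try:
        # Python's default "strict" error handler rejects lone surrogates.
        return input_string.encode("utf-8")
    except UnicodeEncodeError as exc:
        raise ValueError(f"cannot UTF-8 encode input: {exc.reason}") from exc


print(utf8_or_fail("caf\u00e9"))  # b'caf\xc3\xa9'

try:
    utf8_or_fail("\ud800")  # an unpaired high surrogate
except ValueError as err:
    print("fail:", err)
```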

reschke · 2023-07-11T11:45:57Z

draft-ietf-httpbis-sfbis.md

@@ -695,8 +695,10 @@ Given a string of Unicode characters as input_string, return an ASCII string sui
 1. If byte is %x25 ("%"), append "%25" to encoded_string.
 2. If byte is in the ranges %x00-1f or %x7f-ff, apply the percent-encoding defined in {{Section 2.1 of URI}} to byte and append the result to encoded_string.
 3. Otherwise, decode byte as an ASCII character and append the result to encoded_string.
-3. Let formatted_string be the result of running Serialising a String ({{ser-string}}) with encoded_string.
-4. Return the character "%" followed by formatted_string.
+3. Let output be a string containing %x25 ("%") followed by DQUOTE.
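The serialization steps in this diff can be sketched in Python as follows. This is an illustration only, not the draft's normative text, and it rests on several assumptions:

- The function name `serialize_display_string` is hypothetical.
- Lowercase hex is an arbitrary choice, because the case question is still open at this point in the thread.
- Escaping DQUOTE (%x22) and closing the output with DQUOTE are assumptions. They are based on the later "encode quotes" commit and on the opening `%"` in the new step 3.

```python
def serialize_display_string(input_string: str) -> str:
    """Sketch of the percent-encoding serializer discussed in this PR (hypothetical name)."""
    if not isinstance(input_string, str):
        raise TypeError("input must be a Unicode string")
    # UTF-8 encode; a lone surrogate raises UnicodeEncodeError here.
    byte_array = input_string.encode("utf-8")
    # New step 3: output starts with "%" followed by DQUOTE.
    output = ['%"']
    for byte in byte_array:
        if byte == 0x25 or byte == 0x22 or byte <= 0x1F or byte >= 0x7F:
            # "%", DQUOTE (assumption), and bytes in %x00-1f / %x7f-ff are escaped.
            # Lowercase hex is an assumption; the thread has not settled the case.
            output.append(f"%{byte:02x}")
        else:
            # Remaining printable ASCII bytes pass through unchanged.
            output.append(chr(byte))
    output.append('"')  # assumption: the string is closed with DQUOTE
    return "".join(output)


print(serialize_display_string("füü 50%"))  # %"f%c3%bc%c3%bc 50%25"
```

Note that the first sub-step ("%" becomes "%25") falls out of the general escaping rule here. The draft spells it out separately.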


If we use the definition from the URI spec, shouldn't we mandate either upper or lowercase?

Should parsing fail if that case isn't seen?

which case?

whichever we specify...

Maybe we're talking about different things? Percent-decoding as per the URI spec can fail, and we need to say something about that, no?

I thought you were asking if we could mandate a particular casing here (in the serialisation algorithm); I was asking if that casing should be enforced by the parsing algorithm.

Ah. Yes, it should.

I'm a bit concerned that some producers might not be able to control the casing of their output. While they could run it through an additional step of processing, that would increase overhead for them.

Do we expect anyone to be doing something (eg security scanning) that would be expecting a particular casing, and wouldn't be able to adapt to two possible casings? My initial reaction is that I'd rather not increase efficiency for those parties by decreasing efficiency for others...

It's mostly a matter of consistency with the remainder of the spec. For values, there's exactly one way to serialize them (or maybe that's not the case because of Decimal???).

We don't require systems to error when parsing eg 0001. Byte sequences don't require parsers to fail on padding irregularities. I appreciate that we specify error handling very carefully in this document, but we do so when there's a point to it, not merely for consistency.
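The sketch below makes the strict vs. tolerant trade-off concrete on the decoding side. Both `decode_percent` and its `require_lowercase` flag are hypothetical and do not come from the draft. With the flag off, either hex case is accepted. With it on, uppercase escapes fail, which is the enforcement being debated. In both modes a malformed escape or invalid UTF-8 fails parsing. The sketch also assumes the surrounding `%"` and `"` have already been stripped.

```python
import string

HEX_LOWER = set("0123456789abcdef")
HEX_ANY = set(string.hexdigits)  # 0-9, a-f, A-F


def decode_percent(encoded: str, require_lowercase: bool = False) -> str:
    """Decode %xx escapes (hypothetical helper); raise ValueError on any error."""
    allowed = HEX_LOWER if require_lowercase else HEX_ANY
    out = bytearray()
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch == "%":
            hex_pair = encoded[i + 1:i + 3]
            # Need exactly two hex digits of an acceptable case.
            if len(hex_pair) != 2 or not set(hex_pair) <= allowed:
                raise ValueError(f"bad escape at offset {i}: {encoded[i:i + 3]!r}")
            out.append(int(hex_pair, 16))
            i += 3
        else:
            # Unescaped characters must be printable ASCII.
            if ord(ch) < 0x20 or ord(ch) > 0x7E:
                raise ValueError(f"unexpected character at offset {i}")
            out.append(ord(ch))
            i += 1
    # The decoded bytes must be valid UTF-8; UnicodeDecodeError is a ValueError.
    return out.decode("utf-8")


print(decode_percent("f%C3%BC"))  # fü
try:
    decode_percent("f%C3%BC", require_lowercase=True)
except ValueError as err:
    print("fail:", err)
```

Tolerant decoding costs the parser one extra set of allowed characters. Strict decoding moves that cost to producers whose hex routines emit the "wrong" case.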

draft-ietf-httpbis-sfbis.md

reschke · 2023-07-11T11:52:25Z

Looks good except for minor details and potentially a bug (encoding vs decoding).

I'd also like to see the spec enforce a consistent form of percent encoding (lower vs upper)

martinthomson

WFM, though I think I prefer to go all uppercase (because then we can use RFC 4648 instead) rather than be case-variation-tolerant.

draft-ietf-httpbis-sfbis.md

Co-authored-by: Martin Thomson <mt@lowentropy.net>

reschke · 2023-07-27T08:26:26Z

I am sceptical wrt "use the URI percent encoder". The reasons being:

- percent encoding rules depend on what part of the URI needs encoding (and many APIs do not get that)
- calling the API for a single character multiple times usually requires converting a char to a string, and that might not be good for perf if we really care about that
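Python's standard `urllib.parse` module illustrates the first point. What `quote` escapes depends on the `safe` characters the caller picks for the URI component at hand. `quote_plus` applies form-encoding rules instead. This illustrates the general concern only and is not a claim about any particular implementation of the draft.

```python
from urllib.parse import quote, quote_plus

value = "a/b c%"

# Default safe="/": "/" stays literal, as in a path.
print(quote(value))           # a/b%20c%25
# safe="": "/" is escaped too, as a single segment or query value might need.
print(quote(value, safe=""))  # a%2Fb%20c%25
# Form encoding turns spaces into "+" rather than "%20".
print(quote_plus(value))      # a%2Fb+c%25
```

All three calls emit uppercase hex, which is one more default a spec would inherit by delegating to a URI encoder.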

mnot · 2023-07-28T21:16:00Z

I've removed the reliance on URI for percent encoding; was clean as @martinthomson said it would be.

Regarding case -- @martinthomson RFC4648 Section 8 is explicitly case-insensitive. Given that and the discussion in the group yesterday, I've left this case-insensitive. Say if you still disagree.
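For comparison, Python's `base64` module implements the RFC 4648 encodings. `b16encode` emits the uppercase alphabet from RFC 4648 Section 8. `b16decode` rejects lowercase input unless `casefold=True` is passed. The snippet below applies it to a single byte, purely to show how the case choice surfaces in a real API:

```python
import base64
import binascii

byte = 0xC3

encoded = base64.b16encode(bytes([byte])).decode("ascii")
print(encoded)  # C3 (uppercase, matching the RFC 4648 alphabet table)

# Strict by default: lowercase hex digits are rejected...
try:
    base64.b16decode("c3")
except binascii.Error as err:
    print("rejected:", err)

# ...but accepted when the caller opts into case-insensitive decoding.
print(base64.b16decode("c3", casefold=True))  # b'\xc3'
```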

reschke

I don't believe that relying on a different RFC for the hex encoding makes this much better (so I sort of disagree with Martin's proposal).

Just define the escaping exactly inline, and then we can choose upper/lower.

reschke · 2023-07-29T08:23:34Z

draft-ietf-httpbis-sfbis.md

@@ -61,7 +61,7 @@ informative:
 RFC9113:
 display: HTTP/2
 HPACK: RFC7541
- URI: RFC3986
+ ENCODING: RFC4648


Using "ENCODING" is a bit misleading. If we don't have a catchy name for the spec, I'd suggest just saying RFC4648.

martinthomson · 2023-07-30T00:51:48Z

@reschke, I was suggesting an inline definition. The fact that uppercase is base16 would just be an observation (and not a particularly useful one). Lowercase tends to be the default mode in several places.
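Python gives a rough illustration of how defaults differ. `bytes.hex()` produces lowercase, while `base64.b16encode` and `urllib.parse.quote` produce uppercase. An inline definition avoids inheriting either default by naming the case explicitly. The `HEX_DIGITS` table and `pct_escape` helper below are a hypothetical example of such a definition, using lowercase.

```python
import base64
from urllib.parse import quote

data = "ü".encode("utf-8")  # b'\xc3\xbc'

print(data.hex())                              # c3bc
print(base64.b16encode(data).decode("ascii"))  # C3BC
print(quote("ü"))                              # %C3%BC

# An inline, case-explicit definition: one table lookup per nibble.
HEX_DIGITS = "0123456789abcdef"  # hypothetical: lowercase chosen explicitly


def pct_escape(byte: int) -> str:
    """Return "%" plus two lowercase hex digits for a single byte."""
    return "%" + HEX_DIGITS[byte >> 4] + HEX_DIGITS[byte & 0x0F]


print("".join(pct_escape(b) for b in data))    # %c3%bc
```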

mnot · 2023-07-31T02:28:41Z

@martinthomson it wasn't at all clear that's what you were suggesting. I'll make an attempt.

mnot · 2023-07-31T05:57:07Z

Looking at it, I think I prefer referring to 4648. Since this addresses the original issue, I'll merge; if someone wants to propose an alternate way to specify this, please feel free to open a PR.

All percent encoding

1b74c4e

Fixes #2575.

mnot requested a review from reschke July 11, 2023 03:37

mnot requested a review from bsdphk as a code owner July 11, 2023 03:37

reschke reviewed Jul 11, 2023

draft-ietf-httpbis-sfbis.md (outdated)

reschke reviewed Jul 11, 2023

draft-ietf-httpbis-sfbis.md (outdated)

reschke reviewed Jul 11, 2023

draft-ietf-httpbis-sfbis.md

mnot added 4 commits July 12, 2023 10:44

check for unicode string before encoding.

e5df09f

don't need to used escaped sequence here.

6680236

Add failure condition.

a6e38f6

encode quotes

5ffb6d0

martinthomson approved these changes Jul 17, 2023

draft-ietf-httpbis-sfbis.md (three outdated review threads)

Update draft-ietf-httpbis-sfbis.md

459f6e2

Co-authored-by: Martin Thomson <mt@lowentropy.net>

Don't rely on URI for hex encoding

fcdc58a

mnot requested review from reschke and martinthomson July 28, 2023 21:16

reschke reviewed Jul 29, 2023

ENCODING -> RFC4648

3e2b481

mnot merged commit 93bc533 into main Jul 31, 2023
2 checks passed

mnot deleted the mnot/2575 branch July 31, 2023 05:57


mnot commented Jul 11, 2023

reschke left a comment

reschke Jul 11, 2023 (edited)

mnot Jul 12, 2023

reschke Jul 12, 2023

mnot Jul 13, 2023

reschke Jul 13, 2023

mnot Jul 13, 2023

reschke Jul 13, 2023

mnot Jul 13, 2023

reschke Jul 13, 2023

mnot Jul 14, 2023

reschke commented Jul 11, 2023

martinthomson left a comment

reschke commented Jul 27, 2023

mnot commented Jul 28, 2023

reschke left a comment

reschke Jul 29, 2023

martinthomson commented Jul 30, 2023

mnot commented Jul 31, 2023

mnot commented Jul 31, 2023
