From 18dcc3ccad482e4d82ff51c11562dfcb8c085213 Mon Sep 17 00:00:00 2001
From: Andreu Botella
Date: Wed, 13 Jan 2021 21:10:10 +0100
Subject: [PATCH 1/6] Add a final newline normalization for form payloads

When entries are added to a form's entry list through the "append an entry" algorithm, their newlines are normalized, but entries can be added to an entry list through other means. This change adds a final newline normalization before serializing the form payload, since "append an entry" cannot be changed because its results are observable through the `FormData` object or through the `formdata` event.

This change additionally changes the input passed to the `application/x-www-form-urlencoded` and `text/plain` serializers to be a list of name-value pairs, where the values are strings rather than `File` objects. This simplifies the serializer algorithms.

Closes #6247. Closes whatwg/url#562.
---
 source | 113 ++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 95 insertions(+), 18 deletions(-)

diff --git a/source b/source
index accbd2780ff..05d55b0dcd2 100644
--- a/source
+++ b/source
@@ -56069,9 +56069,12 @@ fur
Mutate action URL
+

Let pairs be the result of converting to a list of name-value pairs with entry list.

+

Let query be the result of running the - application/x-www-form-urlencoded serializer with entry - list and encoding.

+ application/x-www-form-urlencoded serializer with pairs + and encoding.

Set parsed action's query component to query.

@@ -56087,9 +56090,12 @@ fur
application/x-www-form-urlencoded
+

Let pairs be the result of converting to a list of name-value pairs with entry list.

+

Let body be the result of running the - application/x-www-form-urlencoded serializer with entry - list and encoding.

+ application/x-www-form-urlencoded serializer with pairs + and encoding.

Set body to the result of encoding body.

@@ -56100,6 +56106,24 @@ fur
multipart/form-data
+

For each entry in entry list:

+ +
    +
  1. Replace every occurrence of U+000D (CR) not followed by U+000A (LF), and every + occurrence of U+000A (LF) not preceded by U+000D (CR), in the entry's name, by a string + consisting of a U+000D (CR) and U+000A (LF).

  2. +
  3. If the entry's value is not a File object, replace every occurrence of + U+000D (CR) not followed by U+000A (LF), and every occurrence of U+000A (LF) not preceded + by U+000D (CR), in the entry's value, by a string consisting of a U+000D (CR) and U+000A + (LF).

  4. +
+ +

These newline conversions in this algorithm are necessary because not all + names and string values in entry lists reaching this point need have been previously + normalized when appending the entry. That + normalization is idempotent, so implementations are allowed to keep track of which names and + values have been previously normalized in order to skip them in this algorithm.

+

Let body be the result of running the multipart/form-data encoding algorithm with entry list and encoding.

@@ -56114,8 +56138,11 @@ fur
text/plain
+

Let pairs be the result of converting to a list of name-value pairs with entry list.

+

Let body be the result of running the text/plain - encoding algorithm with entry list.

+ encoding algorithm with pairs.

Set body to the result of encoding body using encoding.

@@ -56141,9 +56168,12 @@ fur
Mail with headers
+

Let pairs be the result of converting to a list of name-value pairs with entry list.

+

Let headers be the result of running the - application/x-www-form-urlencoded serializer with entry - list and encoding.

+ application/x-www-form-urlencoded serializer with pairs + and encoding.

Replace occurrences of U+002B PLUS SIGN characters (+) in headers with the string "%20".

@@ -56156,6 +56186,9 @@ fur
Mail as body
+

Let pairs be the result of converting to a list of name-value pairs with entry list.

+

Switch on enctype:

@@ -56163,7 +56196,7 @@ fur

Let body be the result of running the text/plain - encoding algorithm with entry list.

+ encoding algorithm with pairs.

Set body to the result of running UTF-8 percent-encode on body using the default encode set.

@@ -56172,8 +56205,8 @@ fur
Otherwise

Let body be the result of running the - application/x-www-form-urlencoded serializer with entry - list and encoding.

+ application/x-www-form-urlencoded serializer with pairs + and encoding.

If parsed action's query is null, then @@ -56514,6 +56547,53 @@ fur +

+ +
Converting an entry list to a list of name-value pairs
+ +

The application/x-www-form-urlencoded and text/plain encoding algorithms take a list of name-value pairs, where the values + must be strings, rather than an entry list where the value can be a File. The + following algorithm performs the conversion.

+ +

To convert to a list of name-value pairs an entry list entry list, run + these steps:

+ +
    +
  1. Let list be an empty list of name-value pairs.

  2. + +
  3. +

    For each entry in entry list:

    + +
      +
    1. Let name be the entry's name, with every occurrence of U+000D (CR) not + followed by U+000A (LF), and every occurrence of U+000A (LF) not preceded by U+000D (CR), + replaced by a string consisting of U+000D (CR) and U+000A (LF).

    2. + +
    3. If the entry's value is a File, then let value be the entry's + value's name. Otherwise, let value be the + entry's value, with every occurrence of U+000D (CR) not followed by U+000A (LF), and every + occurrence of U+000A (LF) not preceded by U+000D (CR), replaced by a string consisting of + U+000D (CR) and U+000A (LF).

    4. + +
    5. Append to list a new name-value pair whose + name is name and whose value is value.

    6. +
    +
  4. + +
  5. Return list.

  6. +
+ +

The newline conversions in this algorithm are necessary because not all names and + string values reaching the application/x-www-form-urlencoded or text/plain serializers need have been previously + normalized when appending the entry, and in fact no + filenames have. That normalization is idempotent, so implementations are allowed to keep track of + which names and string values have been previously normalized in order to skip them in this + algorithm.

+ +
+
URL-encoded form data
@@ -56576,24 +56656,21 @@ fur
-

The text/plain encoding algorithm, given an entry - list, is as follows:

+

The text/plain encoding algorithm, given a list of name-value + pairs pairs, is as follows:

  1. Let result be the empty string.

  2. -

    For each entry in entry list:

    +

    For each pair in pairs:

      -
    1. If the entry's value is a File object, then set its value to the - File object's name.

    2. - -
    3. Append the entry's name to result.

    4. +
    5. Append pair's name to result.

    6. Append a single U+003D EQUALS SIGN character (=) to result.

    7. -
    8. Append the entry's value to result.

    9. +
    10. Append pair's value to result.

    11. Append a U+000D CARRIAGE RETURN (CR) U+000A LINE FEED (LF) character pair to result.

From 98abf83ea563e1e9299957ea32b1bd07b9114ec2 Mon Sep 17 00:00:00 2001
From: Andreu Botella
Date: Mon, 18 Jan 2021 13:16:54 +0100
Subject: [PATCH 2/6] Move the multipart/form-data normalizations to the multipart/form-data algorithm section.
---
 source | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/source b/source
index 71905cf6f72..083ba50eaec 100644
--- a/source
+++ b/source
@@ -56098,24 +56098,6 @@ fur
    multipart/form-data
    -

    For each entry in entry list:

    - -
      -
    1. Replace every occurrence of U+000D (CR) not followed by U+000A (LF), and every - occurrence of U+000A (LF) not preceded by U+000D (CR), in the entry's name, by a string - consisting of a U+000D (CR) and U+000A (LF).

    2. -
    3. If the entry's value is not a File object, replace every occurrence of - U+000D (CR) not followed by U+000A (LF), and every occurrence of U+000A (LF) not preceded - by U+000D (CR), in the entry's value, by a string consisting of a U+000D (CR) and U+000A - (LF).

    4. -
    - -

    These newline conversions in this algorithm are necessary because not all - names and string values in entry lists reaching this point need have been previously - normalized when appending the entry. That - normalization is idempotent, so implementations are allowed to keep track of which names and - values have been previously normalized in order to skip them in this algorithm.

    -

    Let body be the result of running the multipart/form-data encoding algorithm with entry list and encoding.

    @@ -56598,6 +56580,26 @@ fur list and encoding, is as follows:

      +
    1. +

      For each entry in entry list:

      + +
        +
      1. Replace every occurrence of U+000D (CR) not followed by U+000A (LF), and every + occurrence of U+000A (LF) not preceded by U+000D (CR), in the entry's name, by a string + consisting of a U+000D (CR) and U+000A (LF).

      2. +
      3. If the entry's value is not a File object, replace every occurrence of + U+000D (CR) not followed by U+000A (LF), and every occurrence of U+000A (LF) not preceded + by U+000D (CR), in the entry's value, by a string consisting of a U+000D (CR) and U+000A + (LF).

      4. +
      + +

      These newline conversions in this algorithm are necessary because not all + names and string values in entry lists reaching this point need have been previously + normalized when appending the entry. That + normalization is idempotent, so implementations are allowed to keep track of which names and + values have been previously normalized in order to skip them in this algorithm.

      +
    2. +
    3. Return the byte sequence resulting from encoding the entry list using the rules described by RFC 7578, Returning Values from Forms:
Date: Mon, 18 Jan 2021 13:21:45 +0100
Subject: [PATCH 3/6] Reword the notes about double normalization.
---
 source | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/source b/source
index 083ba50eaec..19c97b69ea2 100644
--- a/source
+++ b/source
@@ -56549,12 +56549,12 @@ fur

    4. Return list.

    -

    The newline conversions in this algorithm are necessary because not all names and - string values reaching the application/x-www-form-urlencoded or text/plain serializers need have been previously - normalized when appending the entry, and in fact no - filenames have. That normalization is idempotent, so implementations are allowed to keep track of - which names and string values have been previously normalized in order to skip them in this +

    The newline conversions in this algorithm are necessary because some of the names + and string values reaching the application/x-www-form-urlencoded or text/plain serializers might not have been + previously normalized in the appending the entry algorithm, + and in fact no filenames have. That normalization is idempotent, so implementations are allowed to + keep track of which entries have been previously normalized in order to skip them in this algorithm.

@@ -56593,11 +56593,11 @@ fur (LF).

-

These newline conversions in this algorithm are necessary because not all - names and string values in entry lists reaching this point need have been previously - normalized when appending the entry. That - normalization is idempotent, so implementations are allowed to keep track of which names and - values have been previously normalized in order to skip them in this algorithm.

+

These newline conversions in this algorithm are necessary because some of the + names and string values in entry lists reaching this encoding algorithm might not have been + previously normalized in the appending the entry + algorithm. That normalization is idempotent, so implementations are allowed to keep track of + which entries have been previously normalized in order to skip them in this algorithm.

From 99a4be8b02612e5d8497504331b50daf4badf560 Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Mon, 26 Apr 2021 17:31:31 +0200
Subject: [PATCH 4/6] nits
---
 source | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/source b/source
index 19c97b69ea2..a0aec6a7945 100644
--- a/source
+++ b/source
@@ -56528,18 +56528,18 @@ fur
  • Let list be an empty list of name-value pairs.

  • -

    For each entry in entry list:

    +

    For each entry of entry list:

      -
    1. Let name be the entry's name, with every occurrence of U+000D (CR) not +

    2. Let name be entry's name, with every occurrence of U+000D (CR) not followed by U+000A (LF), and every occurrence of U+000A (LF) not preceded by U+000D (CR), replaced by a string consisting of U+000D (CR) and U+000A (LF).

    3. -
    4. If the entry's value is a File, then let value be the entry's - value's name. Otherwise, let value be the - entry's value, with every occurrence of U+000D (CR) not followed by U+000A (LF), and every - occurrence of U+000A (LF) not preceded by U+000D (CR), replaced by a string consisting of - U+000D (CR) and U+000A (LF).

    5. +
    6. If entry's value is a File object, then let value be + entry's value's name. Otherwise, let + value be entry's value, with every occurrence of U+000D (CR) not followed + by U+000A (LF), and every occurrence of U+000A (LF) not preceded by U+000D (CR), replaced by a + string consisting of U+000D (CR) and U+000A (LF).

    7. Append to list a new name-value pair whose name is name and whose value is value.

    8. @@ -56581,16 +56581,17 @@ fur
      1. -

        For each entry in entry list:

        +

        For each entry of entry list:

        1. Replace every occurrence of U+000D (CR) not followed by U+000A (LF), and every - occurrence of U+000A (LF) not preceded by U+000D (CR), in the entry's name, by a string + occurrence of U+000A (LF) not preceded by U+000D (CR), in entry's name, by a string consisting of a U+000D (CR) and U+000A (LF).

        2. -
        3. If the entry's value is not a File object, replace every occurrence of - U+000D (CR) not followed by U+000A (LF), and every occurrence of U+000A (LF) not preceded - by U+000D (CR), in the entry's value, by a string consisting of a U+000D (CR) and U+000A - (LF).

        4. + +
        5. If entry's value is not a File object, then replace every + occurrence of U+000D (CR) not followed by U+000A (LF), and every occurrence of U+000A (LF) not + preceded by U+000D (CR), in entry's value, by a string consisting of a U+000D (CR) + and U+000A (LF).

        These newline conversions in this algorithm are necessary because some of the
From b96d053957e97957e504d04eeb40719863799710 Mon Sep 17 00:00:00 2001
From: Andreu Botella
Date: Mon, 3 May 2021 10:26:58 +0200
Subject: [PATCH 5/6] Delete notes about double normalization.

See https://github.com/whatwg/html/pull/6624#pullrequestreview-648240331
---
 source | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/source b/source
index a0aec6a7945..5e4a988e057 100644
--- a/source
+++ b/source
@@ -56549,14 +56549,6 @@ fur

      2. Return list.

      -

      The newline conversions in this algorithm are necessary because some of the names - and string values reaching the application/x-www-form-urlencoded or text/plain serializers might not have been - previously normalized in the appending the entry algorithm, - and in fact no filenames have. That normalization is idempotent, so implementations are allowed to - keep track of which entries have been previously normalized in order to skip them in this - algorithm.

      - @@ -56593,12 +56585,6 @@ fur preceded by U+000D (CR), in entry's value, by a string consisting of a U+000D (CR) and U+000A (LF).

    - -

    These newline conversions in this algorithm are necessary because some of the - names and string values in entry lists reaching this encoding algorithm might not have been - previously normalized in the appending the entry - algorithm. That normalization is idempotent, so implementations are allowed to keep track of - which entries have been previously normalized in order to skip them in this algorithm.

From 33c99316dd5cd6bba91ee4102ae1b0df01ccebd9 Mon Sep 17 00:00:00 2001
From: Andreu Botella
Date: Fri, 7 May 2021 09:29:38 +0200
Subject: [PATCH 6/6] Don't forget to normalize filenames in urlencoded and text/plain.
---
 source | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/source b/source
index 5e4a988e057..08d96b29a74 100644
--- a/source
+++ b/source
@@ -56537,9 +56537,11 @@ fur
  • If entry's value is a File object, then let value be entry's value's name. Otherwise, let - value be entry's value, with every occurrence of U+000D (CR) not followed - by U+000A (LF), and every occurrence of U+000A (LF) not preceded by U+000D (CR), replaced by a - string consisting of U+000D (CR) and U+000A (LF).

  • + value be entry's value.

    + +
  • Replace every occurrence of U+000D (CR) not followed by U+000A (LF), and every occurrence + of U+000A (LF) not preceded by U+000D (CR), in value, by a string consisting of + U+000D (CR) and U+000A (LF).

  • Append to list a new name-value pair whose name is name and whose value is value.
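
As a reading aid, the conversion to a list of name-value pairs in its final form (after PATCH 6/6, where filenames are normalized too) could be sketched as follows. This is an illustration under stated assumptions, not the spec's or any browser's implementation: `FileStub`, `normalize_newlines`, and `convert_to_pairs` are hypothetical names, and an entry is modelled as a `(name, value)` tuple.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union

# Matches a CR not followed by LF, or an LF not preceded by CR.
_LONE_NEWLINE = re.compile(r"\r(?!\n)|(?<!\r)\n")


@dataclass
class FileStub:
    """Hypothetical stand-in for a File object; only its name is used here."""
    name: str


Entry = Tuple[str, Union[str, FileStub]]


def normalize_newlines(text: str) -> str:
    """Replace every lone CR or lone LF with a CR LF pair."""
    return _LONE_NEWLINE.sub("\r\n", text)


def convert_to_pairs(entry_list: List[Entry]) -> List[Tuple[str, str]]:
    """Sketch of converting an entry list to a list of name-value pairs."""
    pairs: List[Tuple[str, str]] = []
    for name, value in entry_list:
        name = normalize_newlines(name)
        # A File contributes its name as the value; strings are used as-is.
        if isinstance(value, FileStub):
            value = value.name
        # Normalization runs after the File check, so filenames are covered.
        value = normalize_newlines(value)
        pairs.append((name, value))
    return pairs
```

Existing CR LF pairs are left untouched by the pattern, so applying `normalize_newlines` to an already normalized string returns it unchanged.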