Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a final newline normalization for form payloads #6287

Merged
merged 7 commits into from
May 20, 2021

Conversation

andreubotella
Copy link
Member

@andreubotella andreubotella commented Jan 13, 2021

When entries are added to a form's entry list through the "append an entry" algorithm, their newlines are normalized, but entries can be added to an entry list through other means. This change adds a final newline normalization before serializing the form payload, since "append an entry" cannot be changed because its results are observable through the FormData object or through the formdata event.

This change additionally changes the input passed to the application/x-www-form-urlencoded and text/plain serializers to be a list of name-value pairs, where the values are strings rather than File objects. This simplifies the serializer algorithms.

Closes #6247. Closes whatwg/url#562.

(See WHATWG Working Mode: Changes for more details.)


/form-control-infrastructure.html ( diff )
/index.html ( diff )

When entries are added to a form's entry list through the "append an
entry" algorithm, their newlines are normalized, but entries can be
added to an entry list through other means. This change adds a final
newline normalization before serializing the form payload, since "append
an entry" cannot be changed because its results are observable through
the `FormData` object or through the `formdata` event.

This change additionally changes the input passed to the
`application/x-www-form-urlencoded` and `text/plain` serializers to be a
list of name-value pairs, where the values are strings rather than
`File` objects. This simplifies the serializer algorithms.

Closes whatwg#6247. Closes whatwg/url#562.
@domenic
Copy link
Member

domenic commented Jan 13, 2021

This looks solid editorially, but I haven't been following a lot of the discussion. Tests, which help generate a list of ways browsers deviate from the tests, seem like a good next step.

I'll also note that we would probably benefit from factoring out a "normalize newlines to CRLF" operation, either in HTML (if we consider that sort of thing kinda legacy) or Infra (if we anticipate it being used more widely). I'm pretty sure that beyond just this section, there are other parts of HTML that would use it. But that could be done as a followup.

@andreubotella
Copy link
Member Author

andreubotella commented Jan 14, 2021

This looks solid editorially, but I haven't been following a lot of the discussion. Tests, which help generate a list of ways browsers deviate from the tests, seem like a good next step.

I've now linked to web-platform-tests/wpt#26740, which used to contain the tests for whatwg/url#562 before it grew in scope. Now I've added a lot of tests that should cover this change.

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I was thinking that we should check for the submission type when constructing an entry list and flatten things there inline. I guess that is not possible because of the formdata event? It would be a lot nicer if we can find an algorithm that does not have as much duplication.

cc @whatwg/forms

source Show resolved Hide resolved
@andreubotella
Copy link
Member Author

andreubotella commented Jan 14, 2021

So I was thinking that we should check for the submission type when constructing an entry list and flatten things there inline. I guess that is not possible because of the formdata event? It would be a lot nicer if we can find an algorithm that does not have as much duplication.

You could do the normalization right after step 7 in constructing the entry list so it wouldn't affect the formdata event, but it would still affect the FormData constructor:

form.addEventListener("formdata", evt => {
    evt.formData.append("a", "b\nc");
});
const formData = new FormData(form);
assert_equals(formData.get("a"), "b\nc");  // in all browsers (except for Webkit, that doesn't support the formdata event)

You could do the normalization at some point after constructing the entry list in the form submission algorithm, but I chose to do it in the different behaviors in step 23 because enctype isn't the only piece of data that determines the serialization type. But we could test for enctype == "multipart/form-data" and an http(s) URL.

Base automatically changed from master to main January 15, 2021 07:58
@andreubotella
Copy link
Member Author

I just realized that Request and Response performing newline normalization on a FormData body (as Safari does and as tested by /FileAPI/files/send-file-formdata-*) means that the normalization must happen in the multipart/form-data encoding algorithm, not in the form submission algorithm.

@annevk
Copy link
Member

annevk commented Jan 19, 2021

If only Safari does it that's not necessarily what we want to do though, right? It makes some sense to do newline normalization for user input (i.e., through <form>), but I'm less convinced we should do it for programmatic input. The more we can just leave that as-is the better I think. Having said that, I don't feel too strongly. Would be great to get some input from implementers.

@andreubotella
Copy link
Member Author

andreubotella commented Jan 19, 2021

So, for implementer feedback, this is the current state of things across browsers, and the way this PR changes it. See #6247 for an explanation of the terms "String/File-sourced entries" and "FormData-sourced entries", but in short, the former are user inputs (and form-associated custom elements whose submission value isn't a list of entries), and the latter are programmatically generated.

  • application/x-www-form-urlencoded and text/plain serializations:
    • String/File-sourced entries:
      • Current spec: names and values which derive from strings are normalized
      • This PR, Chrome, Firefox, Safari: all names and values are normalized
    • FormData-sourced entries:
      • Current spec: unchanged
      • This PR, Chrome (on application/x-www-form-urlencoded), Firefox: all names and values are normalized
      • Chrome on text/plain: unchanged except for values which derive from an original filename, which are normalized
      • Safari: N/A
  • multipart/form-data serialization:
    • String/File-sourced entries:
      • Current spec, this PR, Chrome, Safari: names and string values are normalized, filenames are unchanged
      • Firefox: string values are normalized (names and filenames N/A)*
    • FormData-sourced entries:
      • Current spec, Chrome: unchanged
      • Firefox: string values are normalized (names and filenames N/A)*
      • This PR, Safari: names and string values are normalized, filenames are unchanged

*. Firefox turns newlines in multipart/form-data names and filenames into spaces. Per #6282 that's now non-conforming.

@annevk
Copy link
Member

annevk commented Jan 19, 2021

@mfreed7 @whatwg/forms @cdumez @tkent-google thoughts on the summary above? I still like the idea of trying not to touch FormData-sourced entries at all so developers have full control over the output there.

@andreubotella
Copy link
Member Author

andreubotella commented Jan 19, 2021

I still like the idea of trying not to touch FormData-sourced entries at all so developers have full control over the output there.

I just realized that this might not be simpler spec-wise for the cases of urlencoded and text/plain.

The distinction between String/File-sourced entries and FormData-sourced entries in the spec is that the former are added to the entry list through the "append an entry" algorithm, which normalizes the entry's name and string value (not filename). By the spec and Chrome (not Firefox or Safari), this normalization is visible when creating a FormData object from a form, or when observing the formdata event, and this PR doesn't change that. What it does is adding a new normalization on top before serializing the payloads, which is fine because newline normalization is idempotent.

But I opened whatwg/url#562 because (for String/File-sourced entries), the spec as it is now requires that filenames don't end up normalized in urlencoded and text/plain payloads, which goes against all implementations. We can't patch "append an entry" to either normalize the filename or turn the File value into a string value with the normalized filename, because that's observable from the formdata event*. The only way to spec this without changing the formdata behavior is to do the conversion after the "constructing the entry list" algorithm, by which point we've lost track of which entries are String/File-sourced. We could keep track of that with an additional boolean per entry, but that seems to add more complexity than the alternative.

I suspect implementing this in Chromium would have the same issue about keeping track of which entries were normalized.

*. Although we might also want to change what's observable from the formdata event, since only Chrome implements it by the spec.

@mfreed7
Copy link
Contributor

mfreed7 commented Feb 26, 2021

Thanks, as in #6247, for the great summary of the state of things.

I'm in favor of the proposal as written, which is effectively to normalize everything except for filenames on multipart/form-data. Even filenames seem like they should be normalized, given that newlines of any kind in filenames are almost surely a mistake. But for compat reasons, this likely needs to stay as-is.

@annevk, about your suggestion to leave FormData sourced un-normalized, do you have any intuition about the use cases? My (uninformed by data) guess is that mismatched line endings probably end up causing more bugs and confusion than the corresponding number of "real" use cases that require preservation of line endings. It sounds like you disagree - if so, is there any data/evidence to point to there?

@annevk

This comment has been minimized.

@andreubotella

This comment has been minimized.

andreubotella pushed a commit to andreubotella/url that referenced this pull request Feb 27, 2021
After whatwg/html#6287, no callers are left which invoke the
`application/x-www-form-urlencoded` serializer with file values.
@annevk
Copy link
Member

annevk commented Feb 27, 2021

(Apologies for the earlier reply, I had not properly paged this back in.)

I am mainly bothered by the inelegance of normalizing twice. Also, if you use FormData with fetch() today there is no normalization (this changes that) so it seemed somewhat reasonable that if you use FormData in this context there is none either. I could see always normalizing when serializing too, but then ideally form submission itself would not do it (as it would happen later). In the end though, if we want to normalize during form submission and serialization, that is fine with me as well.

@andreubotella
Copy link
Member Author

andreubotella commented Mar 9, 2021

I am mainly bothered by the inelegance of normalizing twice. Also, if you use FormData with fetch() today there is no normalization (this changes that) so it seemed somewhat reasonable that if you use FormData in this context there is none either. I could see always normalizing when serializing too, but then ideally form submission itself would not do it (as it would happen later). In the end though, if we want to normalize during form submission and serialization, that is fine with me as well.

You can observe the results of constructing the entry list with new FormData(form) – at least according to the spec and Chromium. Gecko and Webkit defer the normalization to the serialization, so if the double normalization is an issue, it might be worth considering whether normalizing during the form submission is needed. See also web-platform-tests/wpt#27922.

In any case, this PR can be merged now until we decide whether to change the behavior of new FormData(form) – the second normalization doesn't depend on the first one having been applied.

Edit: Marking Firefox as interested in this PR, since I'll be implementing it.

@andreubotella
Copy link
Member Author

The WPT PR (web-platform-tests/wpt#26740) is now approved, and I asked @annevk to merge it, believing this PR to have some level of consensus. I now realize that's not as clear as I thought. Let me know if I should revert it in the meantime.

In any case, I think this change is ready for another look.

annevk added a commit that referenced this pull request Apr 26, 2021
Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed some nits that I hope are acceptable. There are further things that could be cleaned up, but they affect a large part of the existing text as well and are probably best tackled separately (e.g., referencing the members of an entry explicitly).

Let's figure out how to land this with #6624 and then land this next week now there's implementer buy-in.

@andreubotella
Copy link
Member Author

I've removed the various notes about double normalization so it's easier to land this PR and #6624 together. See #6624 (review)

@andreubotella
Copy link
Member Author

andreubotella commented May 7, 2021

I hadn't realized that the "convert to a list of name-value pairs" algorithm as I'd written it leaves filenames unnormalized in application/x-www-form-urlencoded and text/plain, which wasn't my intention, goes against the WPT tests, and doesn't match Safari. This commit should fix it.

@andreubotella andreubotella requested a review from annevk May 7, 2021 07:31
andreubotella pushed a commit to andreubotella/html that referenced this pull request May 19, 2021
Complements whatwg#6287 and whatwg#6624.

Fixes whatwg#6647.

See also whatwg#6662 for further cleanup on the textarea data model.
@annevk annevk merged commit 07d087d into whatwg:main May 20, 2021
annevk added a commit that referenced this pull request May 20, 2021
As of #6287 newlines are normalized when form data is serialized. This removes the (mostly redundant) normalization in constructing the entry list.

Tests: web-platform-tests/wpt#28798.

Follow-up: #6697.

Fixes #6469.
annevk pushed a commit that referenced this pull request May 20, 2021
Complements #6287 and #6624. See also #6662 for further cleanup on the textarea data model.

Fixes #6647.
annevk pushed a commit to whatwg/url that referenced this pull request May 20, 2021
After whatwg/html#6287 no callers are left which invoke the application/x-www-form-urlencoded serializer with file values.

Closes #562.
@andreubotella andreubotella deleted the newlines-in-form-payloads branch May 20, 2021 12:17
FrankEnderman pushed a commit to FrankEnderman/cursemium that referenced this pull request Oct 29, 2021
Chrome used to perform newline normalization in form payloads early into
the form submission algorithm –in particular, during the entry list
construction–, along with an additional normalization at the point of
serializing the form data into `application/x-www-form-urlencoded` and
`text/plain` form payloads. The early normalization was spec compliant –
the late one wasn't, but was required because servers expected it.

This change implements some recent changes in the HTML spec that have
standardized this newline normalization behavior across browsers,
removing the early normalization entirely and replacing it with a late
normalization at the point of serializing, even for the
`multipart/form-data` enctype.

This change does not affect the submitted form payload for normal
user-triggered form submissions. It only changes how form entry lists
are inspected from Javascript using the `FormData` API and the
`formdata` event, and how form entries added from Javascript are
serialized (by fetching with a `FormData` body, or by modifying a form
submission through form-associated custom elements or through the
`formdata` event). As such, this change is implemented behind a
default-on `LateFormNewlineNormalization` feature flag.

This change implements the following HTML spec PRs:
whatwg/html#6287
whatwg/html#6624
whatwg/html#6697

See also:
https://blog.whatwg.org/newline-normalizations-in-form-submission

Intent to Ship:
https://groups.google.com/a/chromium.org/g/blink-dev/c/XULXQrbFznw

Fixed: 1167095
Change-Id: I0a28369aefe052c413a427733ef33d70988a13c1
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3226394
Reviewed-by: Kent Tamura <tkent@chromium.org>
Reviewed-by: Mason Freed <masonf@chromium.org>
Commit-Queue: Kent Tamura <tkent@chromium.org>
Cr-Commit-Position: refs/heads/main@{#936197}
mjfroman pushed a commit to mjfroman/moz-libwebrtc-third-party that referenced this pull request Oct 14, 2022
Chrome used to perform newline normalization in form payloads early into
the form submission algorithm –in particular, during the entry list
construction–, along with an additional normalization at the point of
serializing the form data into `application/x-www-form-urlencoded` and
`text/plain` form payloads. The early normalization was spec compliant –
the late one wasn't, but was required because servers expected it.

This change implements some recent changes in the HTML spec that have
standardized this newline normalization behavior across browsers,
removing the early normalization entirely and replacing it with a late
normalization at the point of serializing, even for the
`multipart/form-data` enctype.

This change does not affect the submitted form payload for normal
user-triggered form submissions. It only changes how form entry lists
are inspected from Javascript using the `FormData` API and the
`formdata` event, and how form entries added from Javascript are
serialized (by fetching with a `FormData` body, or by modifying a form
submission through form-associated custom elements or through the
`formdata` event). As such, this change is implemented behind a
default-on `LateFormNewlineNormalization` feature flag.

This change implements the following HTML spec PRs:
whatwg/html#6287
whatwg/html#6624
whatwg/html#6697

See also:
https://blog.whatwg.org/newline-normalizations-in-form-submission

Intent to Ship:
https://groups.google.com/a/chromium.org/g/blink-dev/c/XULXQrbFznw

Fixed: 1167095
Change-Id: I0a28369aefe052c413a427733ef33d70988a13c1
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3226394
Reviewed-by: Kent Tamura <tkent@chromium.org>
Reviewed-by: Mason Freed <masonf@chromium.org>
Commit-Queue: Kent Tamura <tkent@chromium.org>
Cr-Commit-Position: refs/heads/main@{#936197}
NOKEYCHECK=True
GitOrigin-RevId: 3a41107f23310a86c7fa4557be14e90b867baa01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
4 participants