Web reality: Filenames should be newline-normalized in urlencoded #562

Closed
andreubotella opened this issue Dec 3, 2020 · 9 comments · Fixed by whatwg/html#6287

@andreubotella
Member

Tests: web-platform-tests/wpt#26740

All browser engines seem to be newline-normalizing filenames as if they were string values going through the same normalization as in HTML's append an entry. This is arguably a result of implementations doing the normalizing at the end (see web-platform-tests/wpt#26556, https://bugzilla.mozilla.org/show_bug.cgi?id=1657844, https://bugs.webkit.org/show_bug.cgi?id=219086), but since all browsers share it, maybe it should be fixed in spec-land. And since browsers don't newline-normalize filenames with multipart/form-data, it'd have to be in the urlencoded serializer.

@andreubotella
Member Author

andreubotella commented Dec 3, 2020

To clarify a bit the reasoning for why this normalization must happen in the urlencoded serializer:

When constructing an entry list from a <form> element, the HTML spec requires (in "append an entry") that the names and values (other than File values) be newline-normalized, but not filenames. This is directly web-observable by constructing a FormData object from the <form>, and only Chrome implements that normalization at this stage. (Tests: https://wpt.fyi/results/html/semantics/forms/form-submission-0/newline-normalization.html?label=pr_head&max-count=1&pr=26747)

When encoding an entry list as multipart/form-data, the spec doesn't mandate any further newline normalization (though there's the percent-encoding we're trying to incorporate in whatwg/html#3276 and whatwg/html#6282). Note that not all entry lists that reach this step are guaranteed by the spec to be newline-normalized, since you can construct a FormData object from scratch. Gecko and WebKit do the newline normalization at this stage, which is wrong for FormData objects. (Tests: https://wpt.fyi/results/FileAPI/file/send-file-form-controls.tentative.html?label=experimental&label=master&aligned for form submission, https://wpt.fyi/results/FileAPI/file/send-file-formdata-controls.tentative.html?label=experimental&label=master&aligned for FormData).

When encoding an entry list as urlencoded there shouldn't be any further newline normalization applied. But since filenames now become values, you might expect Gecko and WebKit to normalize newlines in filenames. What's strange is that Chrome is also doing it for some reason. (Tests: https://wpt.fyi/results/url/urlencoded-filenames.window.html?label=pr_head&max-count=1&pr=26740)

Since all browsers agree that newlines in filenames get normalized, this should be incorporated into the specs, and since in the Chrome/spec behavior filenames don't get normalized in multipart/form-data, the remaining place to do the normalization in spec-land is in the urlencoded serializer.
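
As a rough illustration of the behavior described above, here is a minimal Python sketch of a urlencoded serializer that newline-normalizes filenames. It is not spec text: the `(filename, body)` tuple standing in for a `File`, and the function names, are assumptions made for the example, and Python's `urlencode` only approximates the URL Standard's serializer (it differs on a few edge characters).

```python
import re
from urllib.parse import urlencode


def normalize_newlines(s: str) -> str:
    """Replace every CRLF pair, lone CR, and lone LF with CRLF."""
    return re.sub(r"\r\n|\r|\n", "\r\n", s)


def serialize_urlencoded(entries):
    """Serialize (name, value) entries as application/x-www-form-urlencoded.

    A value is either a str or a hypothetical (filename, body) tuple
    standing in for a File. File values are flattened to their filename,
    and that filename is newline-normalized here in the serializer.
    """
    pairs = []
    for name, value in entries:
        if isinstance(value, tuple):
            filename, _body = value
            value = normalize_newlines(filename)
        pairs.append((name, value))
    return urlencode(pairs)
```

With this sketch, an entry whose filename contains a bare LF ends up with `%0D%0A` in the serialized output, which matches what the tests observe in all three engines.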

@annevk
Member

annevk commented Dec 4, 2020

It seems the flattening from File to string would have to happen in HTML then, so we keep all the normalization there. (If that is actually how it works we can also simplify the serialization defined here in URL, as value would always be a string.) To verify, testing text/plain would be good, as that should have similar results.

@andreubotella
Member Author

So under this new behavior, constructing the entry list would depend on the form method: for multipart/form-data it'd stay with the current behavior, so all the tests related to that method and FormData would stay valid. But for urlencoded and text/plain, the value of the entry would be set to the normalized filename, and no changes would be needed in the actual serialization algorithm for those methods.
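
A small Python sketch of what enctype-dependent entry list construction could look like under this proposal. The `File` dataclass, the `construct_entry_list` name, and the `(name, value)` tuple layout are all hypothetical stand-ins, not anything defined by the HTML or URL standards.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class File:
    """Hypothetical stand-in for a File object."""
    name: str
    body: bytes = b""


Entry = Tuple[str, Union[str, File]]


def normalize_newlines(s: str) -> str:
    """Replace every CRLF pair, lone CR, and lone LF with CRLF."""
    return re.sub(r"\r\n|\r|\n", "\r\n", s)


def construct_entry_list(raw_entries: List[Entry], enctype: str) -> List[Entry]:
    """Build an entry list whose File handling depends on the enctype."""
    entry_list: List[Entry] = []
    for name, value in raw_entries:
        name = normalize_newlines(name)
        if isinstance(value, File):
            # multipart/form-data keeps the File (and its filename) as-is;
            # urlencoded and text/plain replace it with the normalized filename.
            if enctype != "multipart/form-data":
                value = normalize_newlines(value.name)
        else:
            value = normalize_newlines(value)
        entry_list.append((name, value))
    return entry_list
```

In this sketch the serializers for urlencoded and text/plain only ever see string values, which is why no change to the serialization algorithms themselves would be needed.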

@annevk
Member

annevk commented Dec 4, 2020

Yeah, it's a little unfortunate, but it also seems quite reasonable as there is no real reason for urlencoded and text/plain to have to deal with files.

@andreubotella
Member Author

andreubotella commented Dec 14, 2020

Around the time of the last post so far in this thread, @annevk and I reasoned on IRC that it was fine for "constructing the entry list" to depend on the form's enctype (with new FormData(formEl) counting as multipart/form-data) because there wasn't any API that could, say, serialize a FormData as urlencoded or turn a URLSearchParams into a FormData.

Unfortunately, when I was working on a PR, I noticed two APIs that make it possible for an entry list constructed as urlencoded or text/plain to reach contexts that would expect an entry list constructed as multipart/form-data:

I'll be adding tests to web-platform-tests/wpt#26740 to figure out what browsers do in each case.

Also, this issue should probably be moved to HTML.

@annevk
Member

annevk commented Dec 14, 2020

So the problem here is that the URL Standard (and urlencoded in particular) would still be able to get File objects passed in as values, so it has to handle those, unless HTML performs additional normalization around FACE and the formdata event. I kinda wish we could just toss out newline normalization completely, but it's probably decades too late for that. Perhaps it's okay to not normalize them in these newer APIs?

cc @whatwg/forms

@andreubotella
Member Author

A FormData populated from script already doesn't perform any normalization, so you can end up with a multipart/form-data payload with \n rather than \r\n in the names. (This is per the spec and Chrome's behavior; Firefox and WebKit seem to normalize when serializing rather than when constructing the entry list.)

@andreubotella
Member Author

andreubotella commented Dec 17, 2020

I've been testing the behavior of browsers in several combinations of cases.

Some of the features I was testing weren't available in all browsers. In particular:


Browsers treat "string/File-sourced entries" differently from "FormData-sourced entries" (these are ad-hoc terms, please don't use them anywhere else). "String/File-sourced entries" derive from either an <input type="hidden">, an <input type="file">, a FACE with a string submission value, or a FACE with a File submission value. "FormData-sourced entries" derive from a FACE with a FormData submission value, or are added to the entry list through the formdata event.

"Normalized" here means following the normalization in the "append an entry" algorithm. When used for serializations, it means all names and values are normalized, including string values that come from filenames in the entry list.

  • application/x-www-form-urlencoded and text/plain:
    • String/File-sourced entries:
      • When observed from the formdata event:
        • Chrome, Safari: names and string values are normalized
        • Firefox: unchanged
      • Serialized as: normalized
    • FormData-sourced entries:
      • When observed from the formdata event: unchanged
      • Serialized as: normalized*
  • multipart/form-data:
    • String/File-sourced entries:
      • When observed from the formdata event:
        • Chrome, Safari: names and string values are normalized
        • Firefox: unchanged
      • Serialized as:
        • Chrome, Safari: names and string values are normalized, filenames are unchanged
        • Firefox: values are normalized (files and filenames N/A)
    • FormData-sourced entries:
      • When observed from the formdata event: unchanged
      • Serialized as:
        • Chrome: unchanged
        • Firefox: values are normalized (files and filenames N/A)

*. Chrome treats the serialization of FormData-sourced entries differently in the text/plain enctype than it and Firefox treat them in urlencoded: names are unchanged, and so are values which derive from an original string value, but values which derive from an original filename are normalized. This seems to be almost certainly a bug.

@andreubotella
Member Author

andreubotella commented Dec 24, 2020

I opened whatwg/html#6247 to discuss the topic of newline normalization in forms more broadly than just for urlencoded. The test results I added in that issue group things differently, cover more cases, include the current spec-mandated behavior side-by-side with the browsers', and fix the fact that the results above for Safari are wrong (my tests for observing the formdata event were "succeeding" because the event wasn't being fired at all).

andreubotella pushed a commit to andreubotella/html that referenced this issue Jan 13, 2021
When entries are added to a form's entry list through the "append an
entry" algorithm, their newlines are normalized, but entries can be
added to an entry list through other means. This change adds a final
newline normalization before serializing the form payload, since "append
an entry" cannot be changed because its results are observable through
the `FormData` object or through the `formdata` event.

This change additionally changes the input passed to the
`application/x-www-form-urlencoded` and `text/plain` serializers to be a
list of name-value pairs, where the values are strings rather than
`File` objects. This simplifies the serializer algorithms.

Closes whatwg#6247. Closes whatwg/url#562.
annevk pushed a commit to whatwg/html that referenced this issue May 20, 2021
User agents normalize newlines when serializing form data to text/plain, application/x-www-form-urlencoded, and multipart/form-data. (This can be observed through FormData or the formdata event.)

This additionally changes the input passed to the application/x-www-form-urlencoded and text/plain serializers to be a list of name-value pairs, where the values are always strings rather than potentially File objects.

Tests: web-platform-tests/wpt#26740.

Follow-up: #6624 & #6697.

Closes #6247. Helps with whatwg/url#562.
annevk pushed a commit that referenced this issue May 20, 2021
After whatwg/html#6287 no callers are left which invoke the application/x-www-form-urlencoded serializer with file values.

Closes #562.
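
For readers following along, the approach described in the commits above (a final newline normalization just before serialization, with the urlencoded and text/plain serializers receiving only strings) might be sketched in Python as follows. This is an illustration under assumptions, not the spec algorithm: the dict with a `filename` key is a hypothetical stand-in for a `File`, and the function names are made up for the example.

```python
import re


def normalize_newlines(s: str) -> str:
    """Replace every CRLF pair, lone CR, and lone LF with CRLF."""
    return re.sub(r"\r\n|\r|\n", "\r\n", s)


def to_name_value_pairs(entry_list):
    """Convert entries to (str, str) pairs, normalizing one final time.

    Entries that were already normalized when appended are unchanged by
    this step; entries that reached the list some other way (for example
    through the formdata event) get normalized here.
    """
    pairs = []
    for name, value in entry_list:
        if not isinstance(value, str):
            # Hypothetical File stand-in: flatten it to its filename.
            value = value["filename"]
        pairs.append((normalize_newlines(name), normalize_newlines(value)))
    return pairs


def serialize_text_plain(pairs):
    """Serialize string pairs as text/plain: name=value followed by CRLF."""
    return "".join(f"{name}={value}\r\n" for name, value in pairs)
```

Because the conversion step always yields strings, neither serializer in this sketch needs a branch for file values.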