Should we still sniff if Content-Type is text/html? #173

cdumez · 2023-07-14T03:25:31Z

I recently investigated a web-platform-test failure (html/semantics/embedded-content/the-iframe-element/iframe_sandbox_anchor_download_block_downloads.tentative.html) in WebKit/CFNetwork and found out that it was caused by sniffing when the Content-Type is text/html:

Add X-Content-Type-Options: nosniff header to download_stash.py web-platform-tests/wpt#41037

This test is passing in Chrome and Firefox so I am wondering. Does the specification (https://mimesniff.spec.whatwg.org/#determining-the-computed-mime-type-of-a-resource) really match Chrome and Firefox here?

Should CFNetwork stop sniffing when the Content-Type is text/html?

The text was updated successfully, but these errors were encountered:

cdumez · 2023-07-14T03:25:40Z

cc @annevk

annevk · 2023-08-30T14:05:36Z

As far as I can tell from https://chromium.googlesource.com/chromium/src/net/+/refs/heads/main/base/mime_sniffer.cc Chromium does not appear to sniff text/html responses. They only invoke SniffXML (which is what they have for "feeds") for text/xml and application/xml. (Which seems to contradict step 4 of https://mimesniff.spec.whatwg.org/#determining-the-computed-mime-type-of-a-resource which does not allow sniffing of XML MIME types.)

@MattMenke2 I see you touched that Chromium code quite a bit, any thoughts?

@valenting @mozfreddyb do either of you know what Gecko does for text/html responses? Do they get sniffed for feeds as the MIME Sniffing standard suggests in step 5 of https://mimesniff.spec.whatwg.org/#determining-the-computed-mime-type-of-a-resource or are they not sniffed at all? (I got a bit lost trying to find the relevant callers of the functions in https://searchfox.org/mozilla-central/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp.)

MattMenke2 · 2023-08-30T16:38:29Z

I don't claim to be an expert on what MIME sniffing should actually do, but yes, my reading is also that Chrome does not sniff responses with a text/html content-type.

I also agree that Chrome's sniffing of files with those XML MIME types looks to violate spec, and would certainly love to see more standardization here (especially in the direction of less sniffing). Note that I'm no longer on Chrome's networking team. I'm still on Chrome, and still happy to talk about these things, just not a good decision person or driver of changes around, e.g., not sniffing files with XML types.

josepharhar · 2023-08-30T16:51:29Z

ccing some people who might be able to help with this
@ricea @horo-t @mikewest

valenting · 2023-09-01T07:40:36Z

@valenting @mozfreddyb do either of you know what Gecko does for text/html responses? Do they get sniffed for feeds as the MIME Sniffing standard suggests in step 5 of https://mimesniff.spec.whatwg.org/#determining-the-computed-mime-type-of-a-resource or are they not sniffed at all? (I got a bit lost trying to find the relevant callers of the functions in https://searchfox.org/mozilla-central/source/netwerk/streamconv/converters/nsUnknownDecoder.cpp.)

As far as I can tell we we don't create an nsUnknownDecoder if it's already present on the response.
@farre and @sefeng211 have been working with this code recently so they might know more.

farre · 2023-09-01T08:56:00Z

There's also a sniff happening here, if LOAD_CALL_CONTENT_SNIFFERS is set, which it is in nsDocShellLoadState::CalculateChannelLoadFlags, but I'm not sure if we're hitting that path, I'd have to check. And I also don't know which sniffers would get called there in that case. It might just be media sniffers.

mikewest · 2023-09-04T09:17:20Z

cc @otherdaniel, who's looking into sniffing (or not) insofar as it impacts ORB.

As no user agent today appears to identify a text/html resource starting with <rss as XML, remove those rules from the standard. At the same time, make it more clear that XML (and now HTML) are never sniffed. This is a non-normative change for clarity. Tests: TBD. Closes #173.

annevk · 2024-07-08T15:33:41Z

@otherdaniel @valenting @farre @sefeng211 @MattMenke2 anyone willing to review #192?

As no user agent today appears to identify a text/html resource starting with <rss as XML, remove those rules from the standard. At the same time, make it more clear that XML (and now HTML) are never sniffed. This part is a non-normative change for clarity. Tests: web-platform-tests/wpt#47002. Closes #173.

annevk mentioned this issue Jul 14, 2023

Add X-Content-Type-Options: nosniff header to download_stash.py web-platform-tests/wpt#41037

Merged

annevk mentioned this issue Jul 4, 2024

Remove sniffing of HTML #192

Merged

5 tasks

annevk closed this as completed in #192 Jul 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should we still sniff if Content-Type is text/html? #173

Should we still sniff if Content-Type is text/html? #173

cdumez commented Jul 14, 2023

cdumez commented Jul 14, 2023

annevk commented Aug 30, 2023

MattMenke2 commented Aug 30, 2023

josepharhar commented Aug 30, 2023

valenting commented Sep 1, 2023

farre commented Sep 1, 2023

mikewest commented Sep 4, 2023

annevk commented Jul 8, 2024

Should we still sniff if Content-Type is text/html? #173

Should we still sniff if Content-Type is text/html? #173

Comments

cdumez commented Jul 14, 2023

cdumez commented Jul 14, 2023

annevk commented Aug 30, 2023

MattMenke2 commented Aug 30, 2023

josepharhar commented Aug 30, 2023

valenting commented Sep 1, 2023

farre commented Sep 1, 2023

mikewest commented Sep 4, 2023

annevk commented Jul 8, 2024