Improve logic for media elements #16

Merged on Oct 7, 2021 (3 commits). Showing changes from 2 commits.
35 changes: 26 additions & 9 deletions README.md
@@ -6,10 +6,12 @@ To block as many opaque responses as possible while remaining web compatible.

## High-level idea

CSS, JavaScript, and media (audio, images, video) can be requested across origins without CORS. Except for CSS there is no MIME type enforcement. Ideally we still block as many responses as possible that are not one of these types to avoid leaking their contents through side channels.
CSS, JavaScript, images, and media (audio and video) can be requested across origins without CORS. Except for CSS there is no MIME type enforcement. Ideally we still block as many responses as possible that are not one of these types to avoid leaking their contents through side channels.

## Processing model

### New MIME type sets

An **opaque-safelisted MIME type** is a [JavaScript MIME type](https://mimesniff.spec.whatwg.org/#javascript-mime-type) or a MIME type whose essence is "`text/css`" or "`image/svg+xml`".

An **opaque-blocklisted MIME type** is an [HTML MIME type](https://mimesniff.spec.whatwg.org/#html-mime-type), [JSON MIME type](https://mimesniff.spec.whatwg.org/#json-mime-type), or [XML MIME type](https://mimesniff.spec.whatwg.org/#xml-mime-type).
@@ -52,11 +54,18 @@ An **opaque-blocklisted-never-sniffed MIME type** is a MIME type whose essence i
* "`text/event-stream`"
* "`text/csv`"
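
The MIME type sets defined above can be sketched as simple predicates. This is an illustrative sketch only: the function names are hypothetical, `essence` is a simplified stand-in for the full MIME type parser, and the never-sniffed set contains just the two entries visible in this diff (the rest of that list is elided here). The JavaScript, HTML, JSON, and XML definitions follow the MIME Sniffing Standard.

```python
# Essences of JavaScript MIME types, per the MIME Sniffing Standard.
JAVASCRIPT_ESSENCES = frozenset({
    "application/ecmascript", "application/javascript",
    "application/x-ecmascript", "application/x-javascript",
    "text/ecmascript", "text/javascript", "text/javascript1.0",
    "text/javascript1.1", "text/javascript1.2", "text/javascript1.3",
    "text/javascript1.4", "text/javascript1.5", "text/jscript",
    "text/livescript", "text/x-ecmascript", "text/x-javascript",
})

# Assumption: only the two entries visible in this diff hunk are listed.
NEVER_SNIFFED_VISIBLE = frozenset({"text/event-stream", "text/csv"})


def essence(mime_type: str) -> str:
    """Simplified essence: lowercased type/subtype without parameters."""
    return mime_type.split(";", 1)[0].strip().lower()


def is_opaque_safelisted(mime_type: str) -> bool:
    e = essence(mime_type)
    return e in JAVASCRIPT_ESSENCES or e in {"text/css", "image/svg+xml"}


def is_opaque_blocklisted(mime_type: str) -> bool:
    e = essence(mime_type)
    subtype = e.partition("/")[2]
    is_html = e == "text/html"
    is_json = subtype.endswith("+json") or e in {"application/json", "text/json"}
    is_xml = subtype.endswith("+xml") or e in {"text/xml", "application/xml"}
    return is_html or is_json or is_xml
```

Note that "`image/svg+xml`" satisfies both predicates; the algorithm checks the safelist first, so such a response is allowed.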

A user agent has an **opaque-safelisted requesters set**. (This should be scoped similar to other network caches.)
### Changes to requests and media elements

A request has an associated **no-cors media URL** ("N/A", "initial-request", or a URL). It is "N/A" unless explicitly stated otherwise.

We adjust the way media element fetching is done to separate the initial range fetch more clearly from any subsequent range fetches:

* For its initial range request a media element sets no-cors media URL to "initial-request" and it follows redirects. That yields (after any redirects) an initial response.
* For its subsequent range requests the URL of the initial response is used as value of no-cors media URL (and URL) and it no longer follows redirects. Note: redirects here resulted in an error in Chrome until recently. We could somewhat easily allow same-origin redirects by adjusting the check performed against this URL, but it's not clear that's desirable.

A request has an associated **opaque media identifier** (null or an opaque identifier). Null unless explicitly stated otherwise.
(These changes are not needed when CORS is used, but it might make sense to align these somewhat, to the extent they are not already.)
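
As a rough illustration of the two kinds of range requests described above, the sketch below models a request as a small record. The `MediaRequest` class, its field names, and both helper functions are hypothetical and exist only for this example; they are not part of Fetch or HTML.

```python
from dataclasses import dataclass


@dataclass
class MediaRequest:
    url: str
    # "N/A", "initial-request", or a URL; "N/A" unless stated otherwise.
    no_cors_media_url: str = "N/A"
    follow_redirects: bool = True


def initial_range_request(src: str) -> MediaRequest:
    """First range request of a media element: marked and follows redirects."""
    return MediaRequest(url=src, no_cors_media_url="initial-request",
                        follow_redirects=True)


def subsequent_range_request(initial_response_url: str) -> MediaRequest:
    """Later range requests reuse the initial response's URL, no redirects."""
    return MediaRequest(url=initial_response_url,
                        no_cors_media_url=initial_response_url,
                        follow_redirects=False)
```

For example, if the initial request for `https://a.example/v` is redirected to `https://cdn.example/v.mp4`, every subsequent request would be built from the latter URL.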

\[The idea here is that the opaque media identifier is owned by the media element (audio/video only; I'm assuming we won't do range requests for images without at least requiring MIME types at this point). As part of the element being GC'd, it would send a message to get all the relevant entries from the user agent's opaque-safelisted requesters set removed. There might be better strategies available here and it's not clear to me to what extent we need to specify this, but it's probably good to have a model that does not leak memory forever so the set needs to be keyed to something. The fetch group might also be reasonable.]
### ORB's algorithm

To determine whether to allow response _response_ to a request _request_, run these steps:

@@ -65,14 +74,17 @@ To determine whether to allow response _response_ to a request _request_, run th
1. If _mimeType_ is not failure, then:
1. If _mimeType_ is an opaque-safelisted MIME type, then return true.
1. If _mimeType_ is an opaque-blocklisted-never-sniffed MIME type, then return false.
1. If _response_'s status is `206` and _mimeType_ is an opaque-blocklisted MIME type, then return false. TODO: is this needed with the requesters set?
1. If _response_'s status is 206 and _mimeType_ is an opaque-blocklisted MIME type, then return false.
Collaborator @anforowicz commented on Oct 6, 2021:

This step seems a bit undesirable

This special case for 206 seems to mean that a video served as text/html will not work with 206 responses, right? This seems undesirable.

Maybe we can remove this special case?

For non-media, "no-cors media request state" will be "N/A" and ORB will block the response in the following 2 new-ish steps:

  1. If request's no-cors media request state is "subsequent", then return true.
  2. If response's status is 206 and validate a partial response given 0 and response returns invalid, then return false.

If 206 is for the beginning of the body, ORB would still block the response after it doesn't sniff as image/audio/video/javascript (in later steps).

Owner Author

> This step seems a bit undesirable

This step stems from https://fetch.spec.whatwg.org/#corb-check. Did Chrome end up removing it?

> Maybe we can remove this special case?

I'm not sure how the first quoted step is relevant.

As for the second step, earlier you mentioned that this was useful to have and would happen in the implementation before sniffing. Are you suggesting to only do it when no-cors media request state is "initial"? I suppose we could make that change. It would allow the server to serve random 206 responses as JavaScript (if they happen to parse), but I guess that's okay.

Collaborator

> This step stems from https://fetch.spec.whatwg.org/#corb-check.

Doh... I forgot about it. Yes, Chrome's CORB implementation still blocks 206 responses with html/json/xml MIME types. Given this, I think we should just land the PR in its current shape.

> Are you suggesting [...]

I was just trying to argue why I think the earlier step (blocking 206 for html/xml/json) might be redundant, because latter steps (without any changes) would block such html/xml/json responses.

1. If _nosniff_ is true and _mimeType_ is an opaque-blocklisted MIME type or its essence is "`text/plain`", then return false.
1. If the user agent's opaque-safelisted requesters set contains (_request_'s opaque media identifier, _request_'s current URL), then return true.
1. If _request_'s no-cors media URL is a URL and it is equal to _request_'s current URL, then return true.
1. Wait for 1024 bytes of _response_ or end-of-file, whichever comes first and let _bytes_ be those bytes.
1. If the [image type pattern matching algorithm](https://mimesniff.spec.whatwg.org/#image-type-pattern-matching-algorithm) given _bytes_ does not return undefined, then return true.
1. If the [audio or video type pattern matching algorithm](https://mimesniff.spec.whatwg.org/#audio-or-video-type-pattern-matching-algorithm) given _bytes_ does not return undefined, then:
1. Append (_request_'s opaque media identifier, _request_'s current URL) to the user agent's opaque-safelisted requesters set.
1. If _request_'s no-cors media URL is not "initial-request", then return false.
1. If _response_'s status is not 200 or 206, then return false.
1. If _response_'s status is 206 and [validate a partial response](https://wicg.github.io/background-fetch/#validate-a-partial-response) given 0 and _response_ returns invalid, then return false.
1. Return true.
1. If _request_'s no-cors media URL is not "N/A", then return false.
1. If the [image type pattern matching algorithm](https://mimesniff.spec.whatwg.org/#image-type-pattern-matching-algorithm) given _bytes_ does not return undefined, then return true.
1. If _nosniff_ is true, then return false.
1. If _response_'s status is not an [ok status](https://fetch.spec.whatwg.org/#ok-status), then return false.
1. If _mimeType_ is failure, then return true.
@@ -82,10 +94,15 @@ To determine whether to allow response _response_ to a request _request_, run th

Note: responses for which the above algorithm returns true and contain secrets are strongly encouraged to be protected using `Cross-Origin-Resource-Policy`.
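
The media-specific steps added in this change can be sketched as follows. This is a hedged sketch, not the specified algorithm: it covers only the no-cors media URL steps visible in the diff, the nesting of the sub-steps is inferred from context, and the two callables are hypothetical stand-ins for the audio or video type pattern matching algorithm and for "validate a partial response" given 0. The function returns `None` when none of these steps decides, so the remaining (elided) steps would run next.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    current_url: str
    no_cors_media_url: str = "N/A"  # "N/A", "initial-request", or a URL


@dataclass
class Response:
    status: int
    body_prefix: bytes  # first 1024 bytes, or fewer at end-of-file


def media_steps(
    request: Request,
    response: Response,
    sniffs_as_audio_or_video: Callable[[bytes], bool],
    partial_response_is_valid: Callable[[Response], bool],
) -> Optional[bool]:
    marker = request.no_cors_media_url
    # A URL value identifies a subsequent range request of a media element.
    if marker not in ("N/A", "initial-request") and marker == request.current_url:
        return True
    if sniffs_as_audio_or_video(response.body_prefix[:1024]):
        if marker != "initial-request":
            return False
        if response.status not in (200, 206):
            return False
        if response.status == 206 and not partial_response_is_valid(response):
            return False
        return True
    if marker != "N/A":
        return False
    return None  # fall through to the image sniffing and later steps
```

One consequence worth noting: a response that sniffs as audio or video is blocked unless it answers a media element's initial range request.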

## Implementation considerations

Setting the no-cors media URL to a URL ideally happens in a process that is not easily compromised as otherwise it can be used to bypass ORB in such a compromised process.

## Findings

* It's unfortunate `X-Content-Type-Options` mostly kicks in after media sniffing, but it was not web compatible for Firefox to enforce it for images back in the day.
* It's unfortunate `X-Content-Type-Options` mostly kicks in after image/media sniffing, but it was not web compatible for Firefox to enforce it for images back in the day.
* Due to the way [style sheet fetching works](https://github.com/whatwg/fetch/issues/964) we cannot protect responses without an extractable MIME type.
* Media elements always make range requests.

## Acknowledgments
