Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is the interaction between data URLs and workers (and some other data URL discussion) #1243

Closed
bzbarsky opened this issue May 12, 2016 · 40 comments · Fixed by #1782
Closed
Assignees

Comments

@bzbarsky
Copy link
Contributor

At some point in the various refactorings of fetch and stuff, we ended up with https://html.spec.whatwg.org/multipage/semantics.html#concept-link-obtain calling https://html.spec.whatwg.org/multipage/infrastructure.html#create-a-potential-cors-request and none of that ever setting the "same-origin data-URL flag" on the request.

That means that as currently specced if I do this:

<link rel="stylesheet" href="data:text/css, body { color: green }">
<script>
  onload = function() {
    var rules = document.styleSheets[0].cssRules;
  };
</script>

I should get a security exception because the sheet is not same-origin. The problem is, I don't think anyone actually implements that. WebKit and Blink violate the CSSOM spec's requirements for cross-origin stuff: they set rules to null in that situation and allow appending new rules to the cross-origin sheet. Gecko and Edge make the sheet same-origin, which allows the page to touch its CSSOM as desired.

There is absolutely no reason to prevent CSSOM access to the data: sheet in this situation, so I think the fetches for stylesheets should pass the "same-origin data-URL flag", both here and for @import.

/cc @zcorpan since the latter needs changes to https://drafts.csswg.org/cssom/#fetching-css-style-sheets or its caller in @import.

@bzbarsky
Copy link
Contributor Author

bzbarsky commented May 12, 2016

Oh, and the point is there is various stuff that's currently allowed by all browsers for data: sheets without throwing but is supposed to throw per spec. My guess is that the chances of said throwing being web-compatible are very low, so we need to change something about the specs here. And as long as we're doing that, we should do the thing that's maximally useful to web authors, and treat a data: sheet no differently than we would treat an inline stylesheet with the same text.

@domenic
Copy link
Member

domenic commented May 12, 2016

@zcorpan, would be ideal if you could work on this and #968 when you get the chance; they seem kind of problematic.

@annevk
Copy link
Member

annevk commented May 13, 2016

@mikewest ^^

@mikewest
Copy link
Member

I agree with @bzbarsky that there's not much harm in treating such a stylesheet as same-origin. They can't be introduced from a cross-origin frame in the same way that <iframe> can, so the origin seems pretty unambiguous.

I also agree with the assertions in #1779 with regard to images, audio, and video, for the same reasons. Chrome does carve out an exception for data: URLs when calculating canvas tainting, for instance (https://cs.chromium.org/chromium/src/third_party/WebKit/Source/platform/weborigin/SecurityOrigin.cpp?rcl=0&l=311), and reuses that logic for media elements. Aligning browsers' behavior here seems totally reasonable.

@domenic
Copy link
Member

domenic commented Sep 13, 2016

OK. So what is our best path forward here? A spec change that fixes all of these to use the same origin data URL flag? And web platform tests asserting that they work?

@annevk
Copy link
Member

annevk commented Sep 14, 2016

So where do we want to treat them as cross-origin? Navigation is handled by setting the origin on document. Is workers the only other place? Because then instead of a flag we should just special case workers in Fetch or some such.

@annevk annevk assigned annevk and unassigned mikewest and zcorpan Sep 14, 2016
annevk added a commit that referenced this issue Sep 14, 2016
The change to Fetch discussed in
whatwg/fetch#381 made it obsolete.

Closes #1243, closes #1778, and closes #1779 as these are all treated
as same-origin now per the change to Fetch.
@bzbarsky
Copy link
Contributor Author

It seems to me that we want to treat data: as cross-origin any time there will be script running with whatever origin the load ends up with.

In practice I believe that restricts to navigation and maybe workers. <script> load and importScripts already allow loads from all origins and don't run the script with the origin the load produces, so don't matter here, and nothing else I can think of runs scripts offhand.

For navigation, there's long discussion already.

For workers, there are three conceivable behaviors:

  1. Worker from a data: URL runs with the origin of the page that created it. This is like the navigation behavior in Firefox.
  2. Worker from a data: URL runs with a singleton origin. This is like the navigation behavior in Chrome, and is the worker behavior in Firefox last I checked. Note that this does have the complication of no longer having worker and creating page always match origins.
  3. Worker from a data: URL never runs. This is the Chrome behavior.

Behavior 3 is simpler than behavior 2, but I think a bit less developer-friendly: you have to stick your string in a blob and then createObjectURL to get it to execute as a worker, which is a good bit more complicated than just prepending "data:application/javascript," to it and leads to the string leaking for the page lifetime unless you rememeber to revoke. I think behavior 2 doesn't introduce XSS vectors, fwiw.

@domenic
Copy link
Member

domenic commented Sep 14, 2016

Just for the record, in my capacity as trying to help coordinate this many-threaded issue, it looks like whatwg/fetch#387 goes with 3 for now, if I am reading it correctly.

@mikewest
Copy link
Member

I would be happy with unique-origin data: Workers. I agree that there are some complexities to breaking the assumption that the two are always same-origin, but I think we can work those out. I'd like to move Chrome in this direction.

@mikewest
Copy link
Member

(Note that this means we should probably allow sandbox to apply to Workers to push them into a unique origin, and relax the constraint at https://w3c.github.io/webappsec-csp/#sandbox-response accordingly)

@annevk
Copy link
Member

annevk commented Sep 15, 2016

Okay, so for 2 we'd treat data URLs as same-origin in Fetch, but we special case worker environment creation to assign a unique origin for data URLs. We should only do this for dedicated workers I assume, as otherwise multiple origins can share a shared worker, which I don't think is a feature we necessarily want to offer, right?

@annevk
Copy link
Member

annevk commented Sep 15, 2016

In particular, for 2 I need to update whatwg/fetch#387 to remove "worker" (so it's considered same-origin) and then update #1782 to assign this opaque origin to (dedicated) workers for data URLs. Allowing sandbox on workers we can do separately.

@annevk
Copy link
Member

annevk commented Sep 15, 2016

If someone can confirm that I'm happy to make the updates.

@mikewest
Copy link
Member

I don't think that a data: shared worker makes sense, no. A data: dedicated worker makes more sense.

I can imagine cases in which a sandboxed shared worker could make sense, but I'd be fine with limiting that to dedicated workers as well.

I don't think data: or sandboxing makes sense for service workers.

@annevk
Copy link
Member

annevk commented Sep 15, 2016

Sandboxed shared worker seems fine, since setting it up is still limited by same-origin checks. I guess I can take a look how much additional work the sandboxing is while making this change.

@annevk
Copy link
Member

annevk commented Sep 15, 2016

Sandboxed shared workers are actually more complicated than I thought, because with module workers the URL can be cross-origin (we rely on a same-origin check between the worker global scope and the current settings object of the script invoking new SharedWorker). So we'd have to change how that works somehow. Perhaps storing two origins...

@annevk
Copy link
Member

annevk commented Sep 15, 2016

New plan:

  • Make Fetch always treat data URLs as same-origin. Add a note in Fetch that says HTML assigns an opaque origin to any documents and dedicated workers created from data URLs, and that shared workers and service workers cannot use them either due to locally imposed restrictions.
  • Update "run a worker" in HTML to error out on data URLs for shared workers. And use an opaque origin for dedicated workers.
  • Leave dedicated worker sandboxing changes out for now unless they turn out to be easy (shared worker sandboxing does not seem easy per above so definitely out).

The only thing that's still a little unclear in all this is module workers and their interaction with data URLs. Should probably investigate that a little more.

@annevk
Copy link
Member

annevk commented Sep 15, 2016

I updated whatwg/fetch#387 and #1782 per above. Dedicated module workers created from data URLs would work, with any "cors" fetches they make for imports from an opaque origin (Origin: null).

@annevk
Copy link
Member

annevk commented Sep 15, 2016

@mikewest it seems sandboxing changes will need to be separate from this anyway right? That's not something I can directly change in HTML as far as I can tell (it might require some changes to the existing setup though).

@mikewest
Copy link
Member

Yes, I think splitting sandbox changes out into a separate patch is totally reasonable, as they'll be a little more involved. I think the approach you're taking in the patchs above look reasonable; I left tiny comments.

@annevk
Copy link
Member

annevk commented Sep 15, 2016

Behavior 3 is simpler than behavior 2, but I think a bit less developer-friendly: you have to stick your string in a blob and then createObjectURL to get it to execute as a worker, which is a good bit more complicated than just prepending "data:application/javascript," to it and leads to the string leaking for the page lifetime unless you rememeber to revoke. I think behavior 2 doesn't introduce XSS vectors, fwiw.

To be clear, using a unique opaque origin might lead folks to use the workaround anyway if they need to access origin-specific data (blob URLs do result in same-origin workers). If the worker doesn't need to access storage and such though I agree that 2 is nicer (and 2 is what my PR does now).

@bzbarsky
Copy link
Contributor Author

Yeah, I'm not 100% up on what Firefox does with workers, especially shared workers, for the data: case. @wanderview or @bakulf might know it better. I did just test and we seem to at least start a shared worker for data:, but don't seem to hand it any useful ports or something?

@wanderview
Copy link
Member

Yeah, I'm not 100% up on what Firefox does with workers, especially shared workers, for the data: case. @wanderview or @bakulf might know it better. I did just test and we seem to at least start a shared worker for data:, but don't seem to hand it any useful ports or something?

SharedWorker with a data URL script works for me. Run these two lines in your devtools:

var w = new SharedWorker("data:application/javascript,onconnect = evt => { evt.ports[0].onmessage = _ => console.log('msg'); evt.ports[0].start() }");
w.port.postMessage({});

And look for a msg log in the browser console. (Note, it does not come out in the attached document's web console.)

@annevk
Copy link
Member

annevk commented Sep 15, 2016

I get it to run as well, but when I use the same data URL in a different window I don't connect to the same worker as far as I can tell. So it's unclear how useful it is and would probably require even more special casing to support. The same setup throws in other browsers.

@wanderview
Copy link
Member

It works from two windows for me. I modified my script to print the time when the worker started to disambiguate the two:

var w = new SharedWorker("data:application/javascript,d = Date.now(); onconnect = evt => { evt.ports[0].onmessage = _ => console.log(d); evt.ports[0].start() }")

Then this logs the same timestamp when executed on windows with the same origin:

w.port.postMessage({})

If you call that on a different origin window then you will get a different worker and timestamp.

@annevk
Copy link
Member

annevk commented Sep 15, 2016

Right, you get a different worker. That's not shared...

@wanderview
Copy link
Member

Right, you get a different worker. That's not shared...

AFAIK we're not supposed to share across origins.

This works just like if you were using blob URLs, no?

@annevk
Copy link
Member

annevk commented Sep 15, 2016

What do you mean, across origins? I was talking about two windows that share an origin.

@wanderview
Copy link
Member

I get the same worker for same origin windows.

@asutherland
Copy link

@annevk Maybe you are using multiprocess with multiple content processes (which is expected to not work)? I used Ben's second example to create a webpage, visible for your viewing pleasure at https://gist.github.com/asutherland/d0ee823e37d8400ce6a125e5416e6884 and used gdb to verify what's going on in Firefox trunk under the hood. It's also fine to use Firefox devtools to use existing pages/origins, everything still ends up the same.

All instances of either page, served on localhost, result in RuntimeService's WorkerDomainInfo for "localhost" tracking a single SharedWorker with the key 0||data:application/javascript,d=Date.now();onconnect=evt=>{evt.ports[0].onmessage=_=>console.log(d);evt.ports[0].start()}.

@annevk
Copy link
Member

annevk commented Sep 16, 2016

I think if we wanted to allow that, while also giving those shared workers a unique opaque origin (Gecko does not do this at the moment, but as discussed upthread that's somewhat essential for this data URL compromise), we'd have to introduce additional state. We'd basically need to have an origin that we can use for comparison when constructing the SharedWorker object, and one that SharedWorkerGlobalScope can use. And given that shared workers are still not much loved per #315 I'm not sure that complexity is warranted at this time.

Would you be okay with no longer supporting data URLs for shared workers?

@annevk
Copy link
Member

annevk commented Sep 16, 2016

I was wrong, it does seem like Gecko already has that double bookkeeping. At least when testing fetch() from a data URL dedicated/shared worker, it uses Origin: null in the request. So I guess then the question becomes whether @mikewest is happy to implement that in Blink.

@mikewest
Copy link
Member

I don't understand why it would be valuable to add this kind of complexity to Blink. Do y'all actually see usage of SharedWorkers with data: URLs? It's not a request I've ever heard from developers, and it sounds like complexity I'd prefer to avoid entirely.

@annevk
Copy link
Member

annevk commented Sep 16, 2016

Note that upon further reflection I'm also not entirely sure why data URLs result in an opaque origin here. For module workers we use CORS, meaning that any cross-origin script can end up executing with the origin of the worker creator. It's not clear to me why in such a world we'd not also allow data URLs to execute with the origin of the worker creator.

@annevk
Copy link
Member

annevk commented Sep 16, 2016

And note that if we accepted that module worker view of the world, we don't need the complexity to support data URL shared workers, but Gecko would have to change to inherit the origin from the creator rather than using an opaque origin as it does today.

@annevk annevk changed the title Should data: stylesheets really not end up same-origin with whoever loaded them? What is the interaction between data URLs and workers (and some other data URL discussion) Sep 16, 2016
@mikewest
Copy link
Member

Hrm. I guess you're right. There's a pretty clear analog between Worker("data:...") and <script src="data:...">. If we accept one, we should probably accept the other.

I would like to have a way of creating unique-origin Workers, as that seems like a nice way to compartmentalize work in a safer way. If we're planning to support sandbox, that might be enough (though it's somewhat harder to use than I'd like (and wouldn't be possible to use with data: URLs as they have no headers)). Maybe a new "opaque origin" flag on WorkerOptions could be possible?

@bzbarsky
Copy link
Contributor Author

There's a pretty clear analog between Worker("data:...") and <script src="data:...">

No, there is one significant difference. If you do <script src="http://other-site.com"> the script runs with your origin. If you do that with a Worker, you get nothing running at all.

This means people may reasonably assume that putting a random URL in a Worker is safe, in that if it's not controlled by them it won't run (at least not run as them), while absolutely no one has such an expectation for <script> and knows that if you link to a random script you don't control you are allowing whoever does control it to do whatever they want in your name.

@annevk
Copy link
Member

annevk commented Sep 16, 2016

If you do that with a Worker, you get nothing running at all.

As I pointed out, no longer true with module workers.

@annevk
Copy link
Member

annevk commented Sep 16, 2016

Maybe a new "opaque origin" flag on WorkerOptions could be possible?

Yeah, new issue? That and sandboxing can be discussed separately from data URLs.

@annevk
Copy link
Member

annevk commented Sep 29, 2016

I guess module workers do require opt-in, so maybe from that perspective we should still do the unique opaque origin for data URLs (especially if it's a feature we'd like anyway).

annevk added a commit that referenced this issue Sep 30, 2016
The change to Fetch discussed in
whatwg/fetch#381 made it obsolete.

Closes #1243, closes #1778, and closes #1779 as these are all treated
as same-origin now per the change to Fetch.
annevk added a commit that referenced this issue Oct 7, 2016
The change to Fetch discussed in
whatwg/fetch#381 made it obsolete.

Closes #1243, closes #1778, and closes #1779 as these are all treated
as same-origin now per the change to Fetch.
annevk added a commit that referenced this issue Oct 10, 2016
Fixes #1243. Basically, data URLs create (shared) workers that have a
unique opaque origin.
domenic pushed a commit that referenced this issue Oct 10, 2016
Fixes #1243. Basically, data URLs create (shared) workers that have a
unique opaque origin.
alice pushed a commit to alice/html that referenced this issue Jan 8, 2019
Fixes whatwg#1243. Basically, data URLs create (shared) workers that have a
unique opaque origin.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

7 participants