Proposal: blank worker #6911
Comments
FWIW I don't think One thing I didn't spot in the proposal - IIRC cross-origin works fine via CORS for module workers today, so the pain point here is limited to classic workers? |
I don't think the top level script, whether classic or module, can be cross-origin to the owning context. |
This is a good point. I guess it argues for a unified API like Naming considerations:
An alternate direction would be to only support modules. I guess you would say |
I don't really understand how
works given that the origin would still be the origin of whoever created the worker. Or is this mainly about the |
Would it be viable to give a module worker a different API than a classic worker? For example, classic workers having both (Same applies to |
Are ServiceWorker reasonable to consider for an API like this? Their caching behavior is tied to scripts loaded in the first tick (isn’t it?), so my hunch is that the usefulness in a SW context would be quite limited.
I’d love to help with this, although I feel like it’s orthogonal to this. Do you think it’s something that could succeed in standards space, considering that it can be solved in user land? |
I'm not saying that it should be a service worker, but that the way service worker interception/selection works for dedicated and shared workers cannot work for a worker that is not started with a URL. (It would have to reuse the service worker from whoever created the instance probably.) This also goes for CSP, Referrer Policy, etc. It's not clear to me we should continue to support classic scripts. Isn't that idea that everything new is module scripts? As for the ergonomic API. I want to make sure that if we create something new we could add that later if desired. Also, there are a lot of things that can be done in user land, but have been added to standards to make things easier for developers. So yeah, I think it could succeed. |
re:
I don't understand how this is different from the proposed explicit methods, which seem to do just that - load cross-origin scripts into a same-origin worker. Instead of allowing empty worker + coming up with an extra method name, can we expose this as an option on existing new Worker('http://...', {
type: 'module',
crossOrigin: true // or 'anonymous' or something else
}) Functionally it would do the same as proposed methods, but seems more ergonomic than splitting up the Worker creating and script loading steps. |
That would not work with service worker selection. Similar to navigation that relies on it being same origin. |
I'm strongly opposed to creating a new worker type for this. I really don't even understand the motivation around that. The lifetime of the worker is unchanged by this proposal which seems to be the defining factor of the different worker types. (Single owner vs multiple owner vs background ephemeral.) Also, I don't think service worker interception would work differently at all. It would be just like how an about:blank iframe works (in firefox and the spec). The I guess I didn't say it in the explainer, but
In my view there are two capabilities iframes have that workers lack that makes this situation worse.
Adding these two capabilities to workers is what I meant by aligning with iframes. Edit: I think it's because of these capabilities we rarely see people making blob URL iframes. It's just easier to make a blank one and mutate it. Let's make it that easy for workers too.
I don't think we should allow service workers with I also wouldn't add the external import capability to service workers since it would fail outside of the install phase. |
This seems like a nice way to pave the way towards reducing/eliminating URL.createObjectURL and to make it easier for tracking protection to block tracking worker globals without having to actually spawn the worker first! |
I second #6911 (comment), I really don't see the point of having this get split in two new features. If there is no need to load multiple scripts, then why have a method at all? The constructor option seems largely enough, and that would make everything so much easier to have just one new option in the constructor instead of (several?!) new methods with unclear behavior (it might be hard for developers to understand when are these scripts loaded and executed. Should the worker script get a new set of events to let know when the external import of a script succeeded or failed etc.)... And regarding the will to reduce the use of |
To me avoiding confusion about the origin of the created context is important. In all cases I'm aware of on the web platform the primary loaded URL determines the origin of the context. Using the proposed snippet above would confuse this situation IMO. It makes the cross origin URL look like the primary URL of the worker and therefore suggests it should get the other origin. Adding a The blank worker proposal purposely separates the creation of the worker from the loading of the subresource in the worker in order to avoid this confusion. Yes, it's an extra statement to call, but I think that's worth it to make the conceptual model clear. Also, there is value in being able to load script in a worker multiple times. It gets us closer to supporting efficient coroutines. For example, you could imagine |
I think there's a big need to load multiple scripts: we want to encourage people to reuse workers (threadpool-style), instead of creating new ones for each script. |
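To make the threadpool idea concrete, here is a minimal sketch of a pool that reuses a fixed set of workers instead of spawning one per script. It assumes each worker exposes the executeScripts(url, { type }) method floated in the explainer, which is a proposal and not a shipped API. WorkerPool and createWorker are hypothetical names, and the worker factory is injected so the scheduling logic can be exercised without a browser.

```javascript
// Hypothetical pool: reuses a fixed number of workers across many scripts.
// Assumes worker.executeScripts(url, { type }) returns a promise, as floated
// in the explainer. That method is a proposal, not a shipped API.
class WorkerPool {
  constructor(createWorker, size) {
    // Each slot tracks a worker plus a count of its in-flight scripts.
    this.slots = Array.from({ length: size }, () => ({
      worker: createWorker(),
      pending: 0,
    }));
  }

  // Run a script on the least-busy worker and resolve when it finishes.
  async run(url, options = { type: "module" }) {
    const slot = this.slots.reduce((a, b) => (b.pending < a.pending ? b : a));
    slot.pending++;
    try {
      return await slot.worker.executeScripts(url, options);
    } finally {
      slot.pending--;
    }
  }
}
```

With a real blank worker the factory might be as small as () => new Worker(), but that depends on the final API shape.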
In my experience, more often than not Workers expect to be self-contained - e.g. they will register On the other hand, if loaded modules are intended to be used together and are aware of each other, then, as @Kaiido pointed out, this can be solved by creating a separate entry point that simply imports both of the required sources. It adds a request indirection, but IMO 1) that's fine for the less common case of multiple modules per Worker and 2) will be usually bundled in prod anyway, while 3) it would improve API for the more common usecase of loading a single module. |
Well, I do expect people would need to write to the new feature in order to take advantage of it. And I don't think the new feature is adding a lot of new risk since I expect most worker scripts do not protect themselves from being imported into a different top level script. (I think you can prevent this with CSP, but I have not verified.) |
Could these motivations and any others be included in the proposal (be it as a link to previous discussions if any), so we can all see it from a common ground? Currently the only use case exposed by the proposal reads
This is a fairly common case indeed, and it would be great to solve this. I don't really see how the last two goals got derived from this use case. Now if this proposal must really encompass these three goals together, I'd like to understand better how this would work by means of examples. I guess that indeed we will need to develop an entirely new way of writing Worker scripts so that they can handle new part of scripts being inserted at any time. Failing to see the model(s) that could be used to handle this, I'm not sure how the threadpool idea would benefit from this proposal. |
Sure. I'll take an action item to update the explainer with more use cases. I am catching up after being out of office, though, so there might be a delay. Overall, though, I think it's about providing a mechanism that can be used kind of like GCD. In particular, this combines well with the js blocks proposal. I'll also try to better explain some of my concerns with making the constructor script URL cross origin. There are browser architectural and security concerns beyond API shape confusion. Finally, maybe we could compromise here and offer a static convenience method which combines the two steps in addition to the current proposal. Something like |
(I know I’m bikeshedding ahead of time here, but in my opinion, something like |
This is not really bikeshedding though. In one case you only define a new way of starting a new Worker from any script, while in the other case you are creating a whole new paradigm where scripts can be injected from outside without the inner script having control over it. As I understand it, bikeshedding over new Worker("about:blank", { // or null or whatever
type: "module",
initialScript: "https://cdn.foo.com/my-worker-script.com" // or any other property name that makes sense
})
By the way, I'd like to challenge a bit the idea that these Another scenario could imply a SharedWorker, where each client is assumed to be unable to see what the other clients are doing with this SharedWorker. Once again, this Sure, writing sensitive code based on this assumption was a bad call if anyone did so, but I think this needs at least some consideration. |
I want to re-emphasize @domenic’s point about reusing workers. The main motivation that has been discussed so far is that it is annoyingly complicated to create a worker with a source file that’s being served from a CDN. The most common workaround I see is people using data URLs or Blob URLs combined with function crossOriginWorker(crossOriginSrc) {
return new Worker(
URL.createObjectURL(
new Blob([
`importScripts("${crossOriginSrc}");` // who needs sanitization
], {type: "text/javascript"})
)
);
}
However, another big problem is that Workers are hard to re-use. Instead, I see people creating a new worker, which is bad. Even if they are performance aware enough to I think this proposal is quite elegant in that it solves both problems with a minimal addition to the Worker API, while the new API parts already have precedent on the platform (see Worklets) and don’t rely on changing anything potentially related to origins and security. This API addition would be a great boon for a scheduler-like primitive (even in user land code) where the individual worker is abstracted away and the scheduler re-uses workers from a pool to run scripts. To give my 2c on some questions: I don’t think there’s cause for worry about worker code expecting to be in sole control of the Worker. Libraries that use workers seamlessly under the hood are not affected by this proposal (they are in control of creating the worker and no other code will get added). Libraries that expect to be @Kaiido I think any code that stored secrets on a global is already broken. Even without this proposal, you can’t know whether your code is in sole control of the worker global or whether it is just one of many scripts imported via |
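The Blob workaround at the top of this comment interpolates the URL into the script text unescaped. A slightly safer sketch builds the bootstrap line with JSON.stringify, which always yields a valid, escaped JavaScript string literal. bootstrapSource and crossOriginWorkerSafe are hypothetical names; the second function only runs in a browser, since it needs Worker.

```javascript
// Build the one-line bootstrap script for the Blob workaround.
// JSON.stringify produces an escaped string literal, so characters in the
// URL cannot break out of the importScripts() call.
function bootstrapSource(crossOriginSrc) {
  // new URL() throws a TypeError on relative or malformed input.
  const absolute = new URL(crossOriginSrc).href;
  return `importScripts(${JSON.stringify(absolute)});`;
}

// Browser-only wrapper (hypothetical name): needs Worker and Blob.
function crossOriginWorkerSafe(crossOriginSrc) {
  const blob = new Blob([bootstrapSource(crossOriginSrc)], {
    type: "text/javascript",
  });
  const blobUrl = URL.createObjectURL(blob);
  const worker = new Worker(blobUrl);
  // The caller decides when to revoke, so the blob URL does not leak.
  return { worker, revoke: () => URL.revokeObjectURL(blobUrl) };
}
```

This still pays the Blob round trip the explainer complains about; it only removes the injection hazard from the string interpolation.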
Thanks for restating these points, but I don't see the answers to most of the questions in #6911 (comment). I still don't see how this proposal would really help that threadpool idea. But once again, it's also possible that I may just be missing completely how all this should work from the Worker script point of view, and thus I'm still hoping for clear examples of usage. Regarding the point of breaking the assumption that Worker's contexts are isolated, when today you write For what it's worth, I built a simple demo to try to convince myself of the potential benefits (or risks I must admit) of this proposal, available at https://glitch.com/edit/#!/worker-external-importscripts and the result is that I am still not really convinced. |
Yeah so I think this is the main problem with just adding One way could be to do the sort of thing comlink does with a wrapper thing around the imported module (I don't know what an equivalent for
const worker = new Worker();
const workerModule = await worker.addModule(module {
export function bigSum(count) {
let sum = 0;
for (let i = 0; i < count; i++) {
sum += i;
}
return sum;
}
});
// Module exported functions are wrapped with a message channel
const sum = await workerModule.bigSum(1000);
Alternatively we could have a way of doing this more explicitly by passing an explicit message channel somehow. This is a bit lower level but would allow something like the above to be built on top of this:
const worker = new Worker();
const channel = new MessageChannel()
await worker.addModule(
"default", // Module member to call
channel, // Data to pass to module export
module {
export default function start(messageChannel) {
messageChannel.addEventListener("message", ({ data }) => {
// Process on our own dedicated channel
});
}
},
);
An in-between idea could be a remote-handle kind of design (this is heavily inspired by Puppeteer/Playwright's API for working with objects through the debugger channel), i.e.:
const worker = new Worker();
const moduleHandle = await worker.addModule(module {
export function bigSum(count) {
// ...
}
export const x = 3;
});
const bigSumHandle = await moduleHandle.get("bigSum");
const result = await bigSumHandle.apply([10000]); I have no idea how any of these ideas would work with |
Strongly agree with @Jamesernator (I actually was going to quote the exact same two paragraphs from @Kaiido and @RReverser), and was going to mention that to improve web worker ergonomics and developer uptake of best practices, I think we really need to look at the popular libraries/frameworks that people are building around them (like comlink). For me, the current friction around reusing workers for multiple modules comes almost entirely from setting up all the extra const worker = new Worker();
const workerModule = await worker.addModule(module { export function bigSum() { ... } });
const sum = await workerModule.bigSum(1000);
would be a wonderful improvement in the ergonomics of using workers, and would make it trivially easy to get expensive operations off the main thread and reuse the same worker for multiple modules. |
Yes I agree with #6911 (comment) and #6911 (comment), this would be very useful, it lets even dream of a Not so useful for the initial use-case of starting a Worker from a CDN hosted script though. |
I would really love to see this idea implemented in browsers. There are certainly workarounds for a given website, but it's basically impossible to maintain a library that uses web workers without spending countless hours on testing against actual browsers and bundlers. We've had to file a lot of bugs and switch bundlers over this very issue. The proposal here introduces a situation where the semantics are clear, and where I would hope that a new function like I see one challenge here that would be addressed by an alternative like module blocks: passing in classes or modules at runtime. This is particularly valuable for two reasons:
I would love if that could be addressed here as well, although unfortunately I don't see an ergonomic way that isn't basically equivalent to module blocks. (Although |
With the release of Firefox 114 next Tuesday, all major browsers will support module workers. I'm personally really looking forward to this, as it allows me to remove a lot of workarounds. However, I still have to manage a large stack of remaining workarounds so that our code can be used from Due to the need to use a
Footnotes
|
I wonder who would be best to ping for a status update here? Maybe it's kinda blocked on some other, related proposal like module expressions and/or module declarations? Hesitantly pinging @nicolo-ribaudo in case you can provide any perspective/info here 🙏 Just kind of desperately hoping we can get to something like this eventually (discussed earlier in this thread): const worker = new Worker();
const workerModule = await worker.addModule(module { export function bigSum() { ... } });
const sum = await workerModule.bigSum(1000); |
This is actually just blocked on implementation/spec/tests work. If you're able to contribute, please do! |
I do worry about the race we would be introducing for |
As suggested at: whatwg/html#6911 (comment)
Sounds like a deal. In case it gets things out the door: https://github.com/lgarron/worker-execution-origin Or if it's preferable to implement the full blank |
Hmm. The alternate proposal is interesting. I'm unsure whether it meets the various constraints people had in mind here. It certainly is less powerful; it doesn't solve the use cases I'm mildly passionate about, around allowing multiple scripts to be run in the same worker. (As such, it wouldn't help address my concerns about module blocks + worker integration.) But I'm unsure if we have anyone who would object, if you were to implement that alternative proposal in browsers yourself. BTW, in case you weren't aware, we're looking for tests in the web platform tests format. |
A major advantage of the blank worker approach over the https://github.com/lgarron/worker-execution-origin proposal is that we sidestep many of the issues raised in #9571. I feel that it's significantly easier to reason about what's going on with the explicit semantics the explainer in the first comment currently has of any loads explicitly not being top-level loads:
In general, I would also echo @domenic's desire to support multiple scripts in the same worker. We are addressing some technical debt in the Firefox Workers implementation and something that is majorly clear is that it is only possible to GC workers in the most excessively trivial cases. So it is desirable to encourage the threadpool idiom @domenic describes rather than favoring an idiom where many one-off workers are created because it sidesteps a variety of pathological resource leak scenarios. Also, from an implementation perspective, I think for Firefox we are much more likely to be able to implement the blank worker proposal in a timely fashion. The worker execution origin proposal would be significantly more scary to implement because it would challenge several existing assumed invariants in particularly hairy code. |
The way I read https://github.com/lgarron/worker-execution-origin#proposal-details the passed in URL is fetched as a subresource of the newly created worker, so I think it's fairly equivalent to the blank worker proposal. But you're right that it doesn't encourage reuse of the worker global. Side note: In addition to no |
Indeed, my goal was to describe something that should be no more controversial or difficult to ship than blank
For what it's worth, this works for me. Modern code bases use relative URLs to refer to related files, and it's pretty much impossible to write portable |
Thanks! I've never written one before. Do you think https://github.com/web-platform-tests/wpt/blob/cd2e11b07bc04f02366ab93e5df41bf3cfc5cf95/resource-timing/cross-origin-iframe.html and https://github.com/web-platform-tests/wpt/blob/cd2e11b07bc04f02366ab93e5df41bf3cfc5cf95/xhr/open-url-worker-origin.htm would be good tests to start from, or should I be starting somewhere more basic? |
That one has a bit of extra stuff to handle things specific to that API, so here's a simpler sample: https://github.com/web-platform-tests/wpt/blob/26d5ce16a6f5e51429de12f25d3c011c4554ee32/html/browsers/history/the-history-interface/pushstate-replacestate-empty-string/pushstate-base.html . In general, https://web-platform-tests.org/writing-tests/testharness.html might be a good starting point. |
ShadowRealm is now stage 3, and I think it has ideas we could borrow. const worker = new Worker('about:blankjs', { type: 'module' });
// Import into worker:
await worker.importValue(specifier);
// Import and get export:
const value = await worker.importValue(specifier, exportName);
The export can be anything structured cloneable, but can also be, or include, functions. When functions are called, the args are cloned and the function in the worker is called with those args. The return value is cloned, and used to resolve the promise returned on the caller's side. For example:
worker-utils.js
export function createNumbersArray(length) {
return Array.from({ length }, (_, i) => i);
}
index.js
const worker = new Worker('about:blankjs', { type: 'module' });
const createNumbersArray = await worker.importValue('./worker-utils.js', 'createNumbersArray');
const numbersArray = await createNumbersArray(3);
// [0, 1, 2] |
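Functions themselves cannot be structured-cloned, so the "function" the caller receives has to be a local stub that forwards each call. A minimal sketch of that forwarding over a MessageChannel follows. serveExports and createCaller are hypothetical helper names, not part of any proposal; only MessageChannel and postMessage are existing APIs. Both ends run in one thread here for illustration, whereas in a real setup one port would live in the worker.

```javascript
// Worker side (hypothetical helper): answer call requests arriving on a port.
function serveExports(port, moduleExports) {
  port.onmessage = async ({ data: { id, name, args } }) => {
    try {
      const value = await moduleExports[name](...args);
      port.postMessage({ id, ok: true, value }); // return value is cloned
    } catch (error) {
      port.postMessage({ id, ok: false, message: String(error) });
    }
  };
}

// Owner side (hypothetical helper): create local stubs that forward calls.
function createCaller(port) {
  const pending = new Map();
  let nextId = 0;
  port.onmessage = ({ data: { id, ok, value, message } }) => {
    const { resolve, reject } = pending.get(id);
    pending.delete(id);
    if (ok) resolve(value);
    else reject(new Error(message));
  };
  // importStub("name") returns a function whose calls resolve asynchronously.
  return (name) => (...args) =>
    new Promise((resolve, reject) => {
      const id = nextId++;
      pending.set(id, { resolve, reject });
      port.postMessage({ id, name, args }); // args are cloned
    });
}
```

Every call becomes asynchronous under this scheme, which matches the await on createNumbersArray(3) in the example above.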
One caveat that should probably be considered is how to pass transferable objects (e.g. message ports, offscreen canvases) through such functions. Two possible approaches:
Auto transfer
Certainly the most convenient, especially for arbitrary modules that aren't aware they're in a worker context. This would look like:
export async function createRenderer(offscreenCanvas) {
const ctx = offscreenCanvas.getContext("webgpu");
// ...setup context etc
const { port1, port2 } = new MessageChannel();
port1.onmessage = ({ data }) => {
// render the frame somehow
}
// Auto transferred
return port2;
}
const createRenderer = await worker.importValue("./renderer.js", "createRenderer");
// Auto transfers the offscreen canvas
const framePort = await createRenderer(someCanvasElement.transferControlToOffscreen());
function renderFrame() {
const data = getInfoFromDOMSomehow();
framePort.postMessage(data);
}
Explicit
More like existing APIs, but means we need to provide some additional methods to actually provide the transfer list. However within the worker this is a bit of a footgun as we need to expose the
Usage might look something like:
export async function createRenderer(offscreenCanvas) {
// ...same as previous example...
// Explicit transfer with value and transferList, this might be a footgun though
// for people expecting just plain returns of values to work
return { value: port2, transferList: [port2] };
// ALTERNATIVE:
// Provide the transfer list on the `this` value
this.transferList.push(port2);
return port2;
}
const createRenderer = await worker.importValue("./renderer.js", "createRenderer");
const offscreenCanvas = someCanvasElement.transferControlToOffscreen();
const framePort = await createRenderer
.callWithTransfer(offscreenCanvas, [offscreenCanvas]); |
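To illustrate what the "auto transfer" option would have to do under the hood, here is a small sketch that walks a value and collects the transferable objects it contains, then hands that list to structuredClone. collectTransferables is a hypothetical helper; it only recognises ArrayBuffer and MessagePort, whereas a real list would be longer (OffscreenCanvas, streams and so on). The explicit option corresponds to the caller writing the transfer list by hand.

```javascript
// Hypothetical helper for "auto transfer": find transferable objects nested
// inside a value. Only ArrayBuffer and MessagePort are handled in this sketch.
function collectTransferables(value, found = new Set(), seen = new Set()) {
  if (value === null || typeof value !== "object" || seen.has(value)) {
    return [...found];
  }
  seen.add(value); // guards against cycles
  if (value instanceof ArrayBuffer || value instanceof MessagePort) {
    found.add(value);
  } else {
    for (const item of Object.values(value)) {
      collectTransferables(item, found, seen);
    }
  }
  return [...found];
}

const pixels = new ArrayBuffer(8);
const payload = { frame: { pixels }, label: "frame-1" };

// Auto: derive the transfer list from the payload itself.
const transferList = collectTransferables(payload);

// Transferring moves the buffer: the copy owns the bytes and the
// original ArrayBuffer is detached afterwards.
const received = structuredClone(payload, { transfer: transferList });
```

The footgun discussed above shows up here too: after the call, code that still holds pixels finds it detached, which is surprising if the author never asked for a transfer.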
I think this idea of providing more ergonomic transfer/clone operations for workers is separate from the blank worker proposal. So let's move it to another thread, if people want to continue discussing. |
Moved to #10078 |
Blank Worker Explainer
Introduction
The web platform currently requires DedicatedWorker and SharedWorker scripts to be same-origin to the parent context creating them. This is largely motivated by the desire to avoid some of the issues associated with the creation of cross-origin iframes.
This restriction, however, creates a common headache for web developers. They often have scripts hosted on cross-origin CDNs. They cannot directly use these scripts to create a DedicatedWorker or SharedWorker. Instead they must use a workaround like:
This works, but it is a persistent paper cut for web developers. It makes something that should be easy, complicated and non-obvious. It also risks leaking the blob URL if the code does not later call revokeObjectURL(). It also invokes a lot of complicated machinery in the browser to persist and load the blob. This overhead should not be necessary.
This effort proposes to improve the situation by providing two features that are available in iframes, but missing in DedicatedWorker and SharedWorker today:
With this proposal to provide these features, the example above could instead be written:
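As a rough sketch of that two-step flow: construct a worker with no script URL, then ask it to run the CDN script. This assumes the single-method executeScripts(url, { type }) shape mentioned under Web APIs; neither the argument-less constructor nor that method is shipped in any browser. startCdnWorker is a hypothetical helper, and the constructor is passed in so the flow can be tried against a stand-in class.

```javascript
// Hypothetical helper: start a same-origin blank worker, then run a
// cross-origin script inside it. WorkerCtor is injected; in a browser that
// implemented the proposal it would simply be Worker.
async function startCdnWorker(WorkerCtor, scriptUrl, type = "classic") {
  // No script URL: the worker starts blank and inherits the owner's origin.
  const worker = new WorkerCtor();
  // Owner-initiated load (proposed API, assumed shape).
  await worker.executeScripts(scriptUrl, { type });
  return worker;
}
```

Unlike the Blob workaround, nothing here needs to be revoked afterwards, and the script URL never appears in the constructor call.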
Goals
DedicatedWorker and SharedWorker threads using cross-origin scripts.
Non-Goals
DedicatedWorker or SharedWorker contexts where their self.origin differs from their owner's self.origin.
Web APIs
This proposal includes two distinct API changes. In theory these are somewhat orthogonal, but we need both to address the motivating use case.
Blank Worker Construction
This API change simply provides a default constructor that has no script URL argument. So:
Workers constructed in this way have a script URL of about:blankjs. The origin, policy container, service worker controller, etc. of the owner are inherited by the worker context just as a child about:blank iframe inherits them from its parent. The about:blankjs resource will be considered to have a text/javascript MIME type while about:blank has a text/html MIME type.
Owner Initiated Script Execution
This API change proposes to allow the owning context to initiate script execution in the worker context.
This API could also support running modules:
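A toy model of the owner-side behavior may help here. It assumes the single executeScripts(url, { type }) method this section considers as an alternative, and it models the in-order queuing this section describes by chaining promises. ExecutionQueue and runInWorker are hypothetical names; runInWorker stands in for "post a message to the worker and wait for its reply". The model fully serializes scripts, so it does not capture the possible interleaving of modules that use top-level await.

```javascript
// Toy model of owner-initiated execution; not the proposed implementation.
class ExecutionQueue {
  constructor(runInWorker) {
    // Stand-in for posting to the worker and awaiting its completion reply.
    this.runInWorker = runInWorker;
    this.tail = Promise.resolve();
  }

  // Queue a script; it starts only after all earlier scripts have settled.
  executeScripts(url, options = { type: "classic" }) {
    const result = this.tail.then(() => this.runInWorker(url, options));
    // A failed script must not block the ones queued behind it.
    this.tail = result.catch(() => {});
    return result;
  }
}
```

Each call returns its own promise, so the owner can await one script while later ones are already queued.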
Alternatively we could instead expose a single w.executeScripts(url, { type }) method.
These methods would act as if they sent a postMessage() to the worker which then invoked importScripts() or addModule() in the worker context. It would then postMessage() back to the owning context, indicating that the script execution was completed. This would then resolve the promise returned from w.executeScripts().
Notably, this postMessage()-like behavior means that multiple calls to executeScripts() would be queued. Modules that use top-level await could interleave, but otherwise all scripts would run in the order they were sent.
Considered Alternatives
The main alternative that is typically suggested is to simply allow new Worker() and new SharedWorker() to take cross-origin scripts. We don't want to do this for a couple of reasons.
First, we don't want to support cross-origin workers at the moment. We are still dealing with the long tail of consequences of allowing cross-origin iframes. If necessary, code can construct a cross-origin iframe which can then create its own worker.
Second, we don't want to support cross-origin scripts while keeping the worker same-origin to its owner because it would create a very exceptional loading situation. Today all contexts and JavaScript globals have an origin that matches the origin of their loading resource. Breaking this constraint would create an exceptional case in the browser which could lead to unexpected security issues.
Privacy & Security Considerations
This proposal does not store any user data or expose any information about the client to the server. It's mainly an ergonomic API change for something that is already achievable through the blob API. There should not be any privacy impact from this proposal.
In terms of security, however, there may be a few items to discuss.
First, it may be controversial to create a new special URL type like about:blankjs. One could argue we should instead use about:blank itself. That would be problematic, however, since about:blank has a text/html MIME type. In addition, about:blank has numerous weird behaviors (initial about:blank, replacement, fragments, etc.) that will not be supported in about:blankjs. We do not want to propagate these unusual features to workers and it would be another weirdness for about:blank to work inconsistently.
Second, it is possibly concerning that the owner can inject script into the worker at any time. This would be a new capability that existing scripts may not be expecting. We argue, however, that the owner/worker division is not a security boundary. The owner and worker already share storage, network cache, service workers, etc. There are many ways for the owner to attack the worker context if it wanted to.
In addition, it seems likely an owner could use blob URLs to construct the same behavior we are proposing here to inject script whenever it wants into a target worker thread, by executing a blob URL containing a script execution framework plus importScripts(originalURL), instead of by using new Worker(originalURL) directly. Same-origin scripts can potentially defend against this with CSP, but again there are many other ways for the owner to attack the worker script via poisoned storage, cache, etc.
Acknowledgements
Thank you to @domenic and @surma for reviewing and contributing to this explainer.