feat: migrate to ESM #12

Draft: wants to merge 7 commits into staging

@CMCDragonkai (Member) commented Aug 13, 2023

Description

Migrating to ESM.

We're switching to Node's built-in implementation of worker threads (node:worker_threads). This will also address #16.

Issues Fixed

Tasks

  • 1. Cut out threads and replace it with node:worker_threads.
  • 2. Implement a simple WorkerPool for managing workers and scheduling tasks.
  • 3. Implement utilities for enforcing types when calling worker functions and setting up a worker.
  • 4. Try to work out a way to fully inline a worker script when starting a worker, without having to use the script's file path.
  • 5. Complete the conversion to ESM.

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squashed and rebased
  • Sanity check the final build

@CMCDragonkai (Member Author)

Not sure if we need to fork this: andywer/threads.js#470

@CMCDragonkai self-assigned this Aug 13, 2023

@CMCDragonkai (Member Author)

Might be able to bypass it using this:

Use a Package Alias: Some package managers allow aliasing a package to a local directory or version. You could then modify the local copy to your needs.

Now, I think this would require a special "imports" alias combined with tsconfig paths to work around the incorrect "exports" key. That's better than forking and maintaining it, if it can be done.

@CMCDragonkai (Member Author)

Actually, I think I need to use a git submodule. That would be the easiest.

@CMCDragonkai (Member Author)

OK, so using a git submodule can be complicated, because its dependencies wouldn't be installed under src/threads.js, and it might require a different set of compilation tools.

Going back to trying the import path approach.

@CMCDragonkai (Member Author)

Actually, even subpath imports do not work, because node_modules wouldn't exist at the relevant location.

The only solution now is to either fork the project entirely or just override the types, meaning we write out what ModuleMethods is likely to be.

The problem is that none of these types work anymore:

  • QueuedTask
  • ModuleThread
  • ModuleMethods

This is because of errors in how threads.js exposes its types, so we have to define all of these types ourselves.

@CMCDragonkai (Member Author) commented Aug 14, 2023

OK, after overriding the types with any, we still have a problem. Creating a worker from a file written in TS requires ts-node, so threads.js basically needs ts-node to execute the file.

```typescript
// Loose stand-ins for the threads.js types that no longer resolve under ESM
type ModuleMethods = {
  [methodName: string]: (...args: any) => any;
};
type ModuleThread<Methods = any> = any;
type QueuedTask<ThreadType, Return> = any;

export type {
  ModuleMethods,
  ModuleThread,
  QueuedTask
};
```

Of course, if I switch to plain JS, it can load the .js file without any transpilation, avoiding the ts-node requirement.

But then it's not possible to do import threads from 'threads';, probably because it's now being run like CJS code? I'm not entirely sure.

```typescript
import * as threads from 'threads';

const { Transfer, isWorkerRuntime } = threads;
```

This is necessary to actually get at these constructs, but the workers no longer have any corresponding types.

Though I think one could use type annotations.

But generally speaking, it's just not a good idea to use TypeScript-based workers at the moment, since we shouldn't be tied to ts-node anyway (even if only during development); after compilation, it would all be JS files anyway.

For some reason isWorkerRuntime no longer exists either.


All in all, I don't think js-workers can be converted to ESM right now, simply because threads.js doesn't export its things properly. Having to convert to .ts workers isn't great either, since that creates a dependency on ts-node.

I might be forced to keep js-workers as CJS, and have ESM code import it using the trick of importing the default export and then destructuring what's needed out of it.

@CMCDragonkai (Member Author) commented Aug 14, 2023

Going to try keeping js-workers as CJS and, in the long term, look into removing threads.js in favour of something of our own.

See https://github.com/piscinajs/piscina for inspiration.

@CMCDragonkai mentioned this pull request Aug 14, 2023

@CMCDragonkai (Member Author)

I think we should just remove browser support for the moment and focus on Node.js worker threads, similar to our js-ws project, then slowly add back Web Worker (browser) support afterwards. This can radically simplify this project and give us ESM support too. This could be assigned to @addievo.

@CMCDragonkai (Member Author)

Going over https://nodejs.org/api/worker_threads.html shows that a worker threads implementation will be quite complex. Here's a brief overview of things that need to be considered:

  1. How MessagePort works: this is the communication mechanism between the parent thread and the worker threads. You use it to communicate which functions you want to execute, as well as all the results of execution. Remember that worker threads are like mini-servers, receiving messages asynchronously and handling them. Because execution is potentially asynchronous, you also have to manage the results asynchronously and send them back. In effect, there is a message-passing API between the main thread and the worker threads.
  2. A worker is created with the Worker constructor provided by node:worker_threads, which starts a new thread running its own Node.js instance within the same process. Worker threads are real threads, but each has its own isolated JS heap, so data is passed either by copying or by transferring ownership. There's also SharedArrayBuffer, which is a genuinely mutable multithreaded buffer, but browsers only allow it under cross-origin isolation, so transferable ArrayBuffers are easier to work with. (Note that in the case of js-quic, if we were using Node threads, shared array buffers would work, or at the very least we would need to be able to transfer data to a worker and back out.)
  3. Worker thread code can be ESM when running Node.js with ESM. You pass a file path or a URL: Node can understand the file path to be native ESM, or the URL can embed the worker code itself. There's no native support for TS, so any TS has to be precompiled to JS, which affects the path passed to new Worker(): it may need to point at the compiled .js file. It's possible to use interface types to expose type-safe functionality.
  4. We should be able to take advantage of the latest Node.js capabilities, while also keeping a common denominator with Web Workers.
  5. There's also a broadcast system (BroadcastChannel) that enables one-to-many communication.
  6. Worker threads may need asynchronous initialisation. Generally they can start receiving messages on the message port immediately, but we may need to do some async setup in the worker first. One could imagine "worker" script hooks like the ones threads.js has, with the ability to pass in async setup code that must run first.
  7. Since worker threads are just Node.js runtimes, they can run arbitrary code, but it is easier to reason about if each worker instead exposes a flat record of functions to call. The problem with allowing arbitrary function calls is serialising closures, which is not a solved problem at the moment (I know this was complicated in Haskell). So instead of trying to do this, we say that workers must expose a fixed set of operations, and only data is transferred over. Certain things would have to be marked as transferable; otherwise, by default, things are copied over (when serialised).
  8. There are a lot of edge cases that threads.js currently covers, such as webpack bundling and even Electron usage, where things are bundled into an .asar file.
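The message-passing model from points 1, 2 and 7 can be sketched without spawning a thread, because a MessageChannel gives the same MessagePort semantics within one thread. This is a minimal sketch under our own assumptions: the `operations` record, the `{ id, op, arg }` request shape and the `serve`/`connect` helpers are hypothetical, not threads.js or PR code. It shows a fixed record of operations on the "worker" side, id-based matching of asynchronous replies on the caller side, and an ArrayBuffer being transferred rather than copied.

```typescript
import { MessageChannel, MessagePort } from 'node:worker_threads';

// Hypothetical fixed record of operations; only data crosses the port, never closures.
const operations = {
  add: ([a, b]: [number, number]): number => a + b,
  sum: (xs: Float64Array): number => xs.reduce((acc, x) => acc + x, 0),
};
type OpName = keyof typeof operations;
type RpcRequest = { id: number; op: OpName; arg: unknown };
type RpcResponse = { id: number; result?: unknown; error?: string };

// "Worker" side: a mini-server that answers each request with the same id.
function serve(port: MessagePort): void {
  port.on('message', async ({ id, op, arg }: RpcRequest) => {
    try {
      const fn: (arg: any) => unknown = operations[op];
      const result = await fn(arg);
      port.postMessage({ id, result } satisfies RpcResponse);
    } catch (e) {
      port.postMessage({ id, error: String(e) } satisfies RpcResponse);
    }
  });
}

// Caller side: replies may arrive in any order, so pending calls are keyed by id.
function connect(port: MessagePort) {
  let nextId = 0;
  const pending = new Map<number, { resolve: (v: unknown) => void; reject: (e: Error) => void }>();
  port.on('message', ({ id, result, error }: RpcResponse) => {
    const call = pending.get(id);
    if (call == null) return;
    pending.delete(id);
    if (error != null) call.reject(new Error(error));
    else call.resolve(result);
  });
  return (op: OpName, arg: unknown, transfer: ArrayBuffer[] = []): Promise<unknown> =>
    new Promise((resolve, reject) => {
      const id = nextId++;
      pending.set(id, { resolve, reject });
      // Buffers listed in `transfer` are moved to the other side instead of copied.
      port.postMessage({ id, op, arg } satisfies RpcRequest, transfer);
    });
}

const { port1, port2 } = new MessageChannel();
serve(port2);
const call = connect(port1);

const buffer = new ArrayBuffer(3 * Float64Array.BYTES_PER_ELEMENT);
new Float64Array(buffer).set([1, 2, 3]);
const done = Promise.all([
  call('add', [2, 3]),
  call('sum', new Float64Array(buffer), [buffer]),
]).finally(() => port1.close());
// The transferred buffer is now detached on this side.
console.log(buffer.byteLength); // 0
```

In a real worker, `serve` would run on `parentPort` inside the worker and `connect` would wrap the `Worker` instance; the shapes stay the same because both sides speak through a MessagePort.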

Point is, fixing up this worker ecosystem is extremely complicated. The threads.js code is complex and difficult to untangle. The fastest solution right now is for upstream to fix their type exports so we can just continue using it. Without that, ESM migration won't really work for us, unless we switch to using piscina.

This would be a significant undertaking: building a robust worker system that abides by the rest of PK's principles is estimated at 2 to 4 months of work (I'm comparing it to how complicated js-quic became, but it should be simpler). We'll need to schedule this for after testnet 7.

@CMCDragonkai (Member Author)

As per #16, we're not going to immediately migrate to ESM. Instead, we need to work on #16 to build out our own thread pool implementation.

This is performance-sensitive, so I vote for two entry points: a Rust/C++ level and a JS level.

A Rust-based entry point would be more flexible, as we are moving towards writing all native libraries in Rust, and it would be easier to integrate into js-quic and js-db.

A JS-level entry point means the thread pool is also usable for any parallel processing required in JS.

This would also mean that our thread pool isn't built on Web Workers. However, we can follow the Web Workers spec (in terms of API) and satisfy the interface type-wise, even if the implementation doesn't use Node's own worker threads.
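Satisfying a Web Worker-style interface type-wise while using a different backend could look roughly like this. It is a sketch under assumptions: `WorkerLike` is a hypothetical, heavily reduced subset of the Web Worker API (not the full spec), and `InProcessWorker` is a stand-in that runs its handler on the microtask queue rather than on a native thread. The point is only that code typed against `WorkerLike` doesn't care which backend satisfies it.

```typescript
// Hypothetical reduced subset of the Web Worker API surface.
interface WorkerLike {
  onmessage: ((event: { data: unknown }) => void) | null;
  postMessage(message: unknown): void;
  terminate(): void;
}

// Stand-in backend: no thread at all, but the same asynchronous,
// copy-on-send behaviour that callers of a real Web Worker observe.
class InProcessWorker implements WorkerLike {
  public onmessage: ((event: { data: unknown }) => void) | null = null;
  protected handler: (data: unknown) => unknown;
  protected terminated = false;

  constructor(handler: (data: unknown) => unknown) {
    this.handler = handler;
  }

  public postMessage(message: unknown): void {
    if (this.terminated) return;
    // Copy the message, as structured cloning does across a real worker boundary.
    const data = structuredClone(message);
    queueMicrotask(() => {
      if (this.terminated) return;
      this.onmessage?.({ data: this.handler(data) });
    });
  }

  public terminate(): void {
    this.terminated = true;
  }
}

// Code written against WorkerLike works with any conforming backend.
function request(worker: WorkerLike, message: unknown): Promise<unknown> {
  return new Promise((resolve) => {
    worker.onmessage = (event) => resolve(event.data);
    worker.postMessage(message);
  });
}

const doubler = new InProcessWorker((n) => (n as number) * 2);
const doubled = request(doubler, 21).finally(() => doubler.terminate());
```

A native (Rust) thread pool binding could then be wrapped in a class implementing the same interface, keeping callers unchanged.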

This would also mean our worker threads sit outside libuv's threading (which Node.js traditionally uses for its IO system), but that's also OK. The libuv thread pool has some limitations anyway and was designed specifically for IO, whereas ours should work for compute parallelism too.

Some testing will be important to understand whether js-quic should use libuv threading or integrate with this Rust thread pool.

Make use of benchmarks here early on, so that we get continuous benchmarking.

@CMCDragonkai (Member Author) commented Feb 24, 2025

@tegefaulkes Take over this PR and update the spec to target #16 and MatrixAI/TypeScript-Demo-Lib/issues/32.

@tegefaulkes (Contributor)

I've rebased on staging.

@tegefaulkes (Contributor) commented Feb 26, 2025

After doing some prototyping with node:worker_threads, I can see a clear path forward. The following needs to be done to complete this PR:

  1. Cut out threads and replace it with node:worker_threads.
  2. Implement a simple WorkerPool for managing workers and scheduling tasks.
  3. Implement utilities for enforcing types when calling worker functions and setting up a worker.
  4. Try to work out a way to fully inline a worker script when starting a worker, without having to use the script's file path.
  5. Complete the conversion to ESM.

The Node implementation of workers is pretty simple. You start a worker using new Worker(scriptPath), and the worker communicates with the main thread by listening for 'message' events on its message port (parentPort). It sends data back to the main thread using the same message port. So the workers can be as basic or as complex as we like, since we're given a fair amount of freedom there. worker_threads doesn't provide an equivalent worker pool, but I've already created a simple implementation for it.
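A pool along these lines can be sketched as follows. This is not the implementation mentioned above; it's a minimal illustration under our own assumptions: a fixed number of workers, a FIFO queue, one task per worker at a time, and a hypothetical `{ result } | { error }` reply shape. The worker body is inlined with `eval: true` only to keep the example in one file; a factory for real use would pass the compiled script path to `new Worker()`.

```typescript
import { Worker } from 'node:worker_threads';

type WorkerFactory = () => Worker;
type Reply = { result?: unknown; error?: string };
type Task = { data: unknown; resolve: (v: unknown) => void; reject: (e: Error) => void };

// Hypothetical minimal pool: fixed size, FIFO queue, one task per worker at a time.
class WorkerPool {
  protected workers: Worker[] = [];
  protected idle: Worker[] = [];
  protected queue: Task[] = [];

  constructor(factory: WorkerFactory, size: number) {
    for (let i = 0; i < size; i++) {
      const worker = factory();
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  public run(data: unknown): Promise<unknown> {
    return new Promise((resolve, reject) => {
      this.queue.push({ data, resolve, reject });
      this.schedule();
    });
  }

  public async terminate(): Promise<void> {
    await Promise.all(this.workers.map((worker) => worker.terminate()));
  }

  // Hand queued tasks to idle workers until either runs out.
  protected schedule(): void {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop()!;
      const task = this.queue.shift()!;
      const onMessage = (reply: Reply) => {
        worker.off('error', onError);
        if (reply.error != null) task.reject(new Error(reply.error));
        else task.resolve(reply.result);
        this.idle.push(worker);
        this.schedule();
      };
      const onError = (e: Error) => {
        // The worker crashed; a real pool would replace it, this sketch just drops it.
        worker.off('message', onMessage);
        task.reject(e);
      };
      worker.once('message', onMessage);
      worker.once('error', onError);
      worker.postMessage(task.data);
    }
  }
}

// Plain JS evaluated inside each worker thread: computes n! per message.
const workerSource = `
const { parentPort } = require('node:worker_threads');
parentPort.on('message', (n) => {
  try {
    let acc = 1;
    for (let i = 2; i <= n; i++) acc *= i;
    parentPort.postMessage({ result: acc });
  } catch (e) {
    parentPort.postMessage({ error: String(e) });
  }
});
`;

const pool = new WorkerPool(() => new Worker(workerSource, { eval: true }), 2);
const factorials = Promise.all([1, 5, 10, 3].map((n) => pool.run(n))).finally(() =>
  pool.terminate(),
);
```

Keeping one in-flight task per worker keeps reply matching trivial (the next 'message' belongs to the current task); allowing several tasks per worker would need per-task ids, as in the MessagePort RPC pattern.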

As for enforcing types when making calls to the workers: the problem here is pretty similar to how the RPC handles things. We have an interface that serialises data, and across this boundary we lose type enforcement, so we need to re-apply types to the returned values. I think we can apply a similar solution here by providing a worker manifest, which is an object of all the functions that can be called through a worker. We can then use this manifest as the worker code by calling an expose(manifest) utility within the worker, and also apply the types to the WorkerManager by deriving them from the manifest.
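Deriving the caller-side types from the manifest could look roughly like this. The `WorkerManifest` shape, the `{ id, method, data }` message format and the `wrap` helper are assumptions for illustration, and `expose` here only borrows the name proposed above; it is not the PR's implementation. A MessageChannel stands in for the worker boundary so the sketch runs in one file; in practice `expose` would run inside the worker on `parentPort`, and the main thread would only import the manifest's type.

```typescript
import { MessageChannel, MessagePort } from 'node:worker_threads';

// Hypothetical manifest shape: every method takes one serialisable argument.
type WorkerManifest = Record<string, (data: any) => Promise<unknown>>;

// Main-thread view of a manifest: same names, same argument and return types.
type WorkerCalls<M extends WorkerManifest> = {
  [K in keyof M]: M[K] extends (data: infer D) => infer R ? (data: D) => R : never;
};

type Call = { id: number; method: string; data: unknown };
type Reply = { id: number; result?: unknown; error?: string };

// Worker side: look up each incoming call in the manifest and reply with its id.
function expose(manifest: WorkerManifest, port: MessagePort): void {
  port.on('message', async ({ id, method, data }: Call) => {
    try {
      const fn = manifest[method];
      if (fn == null) throw new Error(`Unknown method: ${method}`);
      port.postMessage({ id, result: await fn(data) } satisfies Reply);
    } catch (e) {
      port.postMessage({ id, error: String(e) } satisfies Reply);
    }
  });
}

// Main side: only the manifest's type is needed; a Proxy turns property
// accesses into messages, and the mapped type restores the lost types.
function wrap<M extends WorkerManifest>(port: MessagePort): WorkerCalls<M> {
  let nextId = 0;
  const pending = new Map<number, (reply: Reply) => void>();
  port.on('message', (reply: Reply) => {
    pending.get(reply.id)?.(reply);
    pending.delete(reply.id);
  });
  return new Proxy({} as WorkerCalls<M>, {
    get: (_target, method) => {
      // Avoid looking like a thenable if the proxy itself is ever awaited.
      if (typeof method !== 'string' || method === 'then') return undefined;
      return (data: unknown) =>
        new Promise((resolve, reject) => {
          const id = nextId++;
          pending.set(id, (reply) =>
            reply.error != null ? reject(new Error(reply.error)) : resolve(reply.result),
          );
          port.postMessage({ id, method, data } satisfies Call);
        });
    },
  });
}

const manifest = {
  add: async (data: { a: number; b: number }): Promise<number> => data.a + data.b,
  greet: async (name: string): Promise<string> => `hello ${name}`,
} satisfies WorkerManifest;

const { port1, port2 } = new MessageChannel();
expose(manifest, port2); // would run inside the worker
const worker = wrap<typeof manifest>(port1);
// Typed as Promise<number> and Promise<string>; worker.add('x') would not compile.
const done = Promise.all([worker.add({ a: 2, b: 3 }), worker.greet('pk')]).finally(() =>
  port1.close(),
);
```

The types are only as trustworthy as the runtime data: nothing here validates replies, so the mapped type is a promise about what the manifest returns, much like the RPC's re-applied types.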

@tegefaulkes marked this pull request as draft February 26, 2025 02:05

@tegefaulkes (Contributor) commented Feb 26, 2025

It's possible to inline the script as a string using new Worker('script code', { eval: true });. However, without a bundler this isn't very useful.
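The `eval: true` option on its own fits in a few lines. In this sketch the `runInline` helper is hypothetical, and inputs are passed through `workerData`. The string must already be plain JavaScript (no TypeScript syntax), it reaches Node built-ins via `require()`, and none of the creating module's imports are in scope inside the worker, which is why anything non-trivial needs bundling or a raw loader.

```typescript
import { Worker } from 'node:worker_threads';

// Plain JavaScript source; TypeScript syntax cannot appear in an eval'd worker.
const source = `
const { parentPort, workerData } = require('node:worker_threads');
parentPort.postMessage(workerData.a + workerData.b);
`;

// Hypothetical helper: start an inline worker and resolve with its first message.
function runInline(a: number, b: number): Promise<number> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(source, { eval: true, workerData: { a, b } });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

const inlineSum = runInline(2, 3);
```

The worker exits on its own once its script finishes, since nothing keeps its event loop alive.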

To properly enforce types, I need to write the manifest object as TypeScript code and import it as a type when creating the WorkerManager. This still needs to be solved, but it should be possible to properly enforce the types this way. It will only work if we can import it as a type but also load it as raw code.

This is where the bundler comes in. We can import the types directly for type enforcement, and then use a raw loader to import the same code as raw text, which is provided to the worker as an evaluated string. This should fix all our problems with bundling and import paths, since everything will be imported the normal way.

For example, the worker script would look something like this:

```typescript
// This is an example worker script

import type { WorkerManifest } from '#types.js';
import { expose } from './expose.js';

const worker = {
  test: async (data: void) => {
    return 'hello world!';
  },
  add: async (data: { a: number; b: number }): Promise<number> => {
    console.log(data);
    return data.a + data.b;
  },
  sub: async (data: { a: number; b: number }): Promise<number> => {
    return data.a - data.b;
  },
  // Computes data! (the loop must include data itself)
  fac: async (data: number): Promise<number> => {
    let acc = 1;
    for (let i = 2; i <= data; i++) {
      acc = acc * i;
    }
    return acc;
  },
} satisfies WorkerManifest;

// Make the manifest's functions callable from the main thread
expose(worker);

export default worker;
```

We can then create a worker factory as follows, assuming this file will still be compiled the normal way. There may still be some things to solve here.

```typescript
import { Worker } from 'node:worker_threads';
// Import with the raw loader; eval only runs plain JS, so the TS source must be compiled first
import script from 'raw-loader!./script.ts';

const workerFactory: WorkerFactory = () => new Worker(script, { eval: true });
```

Successfully merging this pull request may close these issues.

Replace threads with internal implementation
2 participants