
Implementation help #2

Open
jeremy-coleman opened this issue Mar 12, 2021 · 2 comments

@jeremy-coleman

Hi, I have been working on a fork of browserify and want to implement a worker pool for transforms using something like this, so that it can keep up with esbuild. Here is a repo with the bundler in tools/omnify/bundler: https://github.com/jeremy-coleman/esbuild-vs-omnify-r3f. The speed increase (compared to browserify) mostly comes from replacing acorn/detective with a regex. I think this would be a really good dogfood project for this lib and for Node streams in general. If you think it'd be interesting and would like to brainstorm, let me know.
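For context, the regex-based dep detection is roughly along these lines (a minimal sketch, not the actual omnify code; real code also has to skip comments and string literals):

```javascript
'use strict';
// Sketch of regex-based dependency detection: scan source text for
// require() calls and import statements without building an AST.
const REQUIRE_RE = /require\(\s*['"]([^'"]+)['"]\s*\)/g;
const IMPORT_RE = /(?:^|\n)\s*import\s+(?:[\w*{},\s]+\s+from\s+)?['"]([^'"]+)['"]/g;

function findDeps(src) {
  const deps = new Set();
  for (const re of [REQUIRE_RE, IMPORT_RE]) {
    re.lastIndex = 0; // the /g regexes keep state between calls, so reset
    let m;
    while ((m = re.exec(src)) !== null) deps.add(m[1]);
  }
  return [...deps];
}
```

Skipping the acorn parse entirely is where most of the win over detective comes from.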

@mcollina
Member

This library is still in its infancy. It's probably already usable, but there are likely a lot of bugs. Using it to parallelize the transforms would be pretty amazing.

Note that this code is really one-way, it does not implement a "passthrough" context. Doing this would be amazing.

I probably do not have a lot of time to brainstorm right now... I'm focusing on shipping the next pino release with this in it!

@jeremy-coleman
Author

jeremy-coleman commented Mar 12, 2021

I'll just leave my thoughts so far; maybe it will help plant a thought seed 🌱
I sketched out my original thoughts in this slide, comparing how a single-threaded process vs a multi-threaded one would work. I was originally thinking of using cluster, but later concluded otherwise (explanation below the slide).

[slide: single-threaded vs multi-threaded bundling layouts]

After some more thought, a few changes to the first layout. I think forks should send back their work every round (or send back an ack and forward it to a write stream), instead of holding onto it to aggregate at the end. I came to this conclusion for a few reasons:

  1. Most intuitively, the total amount of work to send finished results is the same either way, so it's better to chunk it if possible. The original reason for forks to hold onto finished work was that spamming relatively large messages would have overhead, which it may, but you'll have to pay that cost somewhere no matter what.
  2. If forks were to hold their finished work, they could theoretically grow beyond memory limits.
  3. I originally thought forks should do their own file IO, but it is probably better for the master to do it and transfer the bytes to worker threads, which should mean no need for cluster at all. I was originally thinking of a system limited to message passing, but moving file IO to the master and transferring the memory is possible, so it should probably do that instead.
The third point is uncertain to me. Should the master do all the file IO, or should workers read their own files? Remember that browserify walks deps incrementally, so you won't glob a directory and start with all known files the way gulp or a TypeScript project would. This is why workers have to send back the deps they found to the master, so it knows what path to process next. The worker can also send back the deps it found BEFORE it starts any transform, because the deps are found via a fast regexp search without any AST creation.
Something like:

  1. Worker gets the file somehow.
  2. Worker parses deps.
  3. Concurrently:
    Worker sends deps back to master (just an array of strings).
    Master starts doing things (reading files, writing results, resolving full paths of deps).
  4. Worker runs transforms.
  5. Worker does what with the transformed file? It could send it back to the master, hold onto it to aggregate later, or send it to a write stream somewhere.

I think how this gets handled, together with the resolve algorithm, will be the two things that most impact overall speed.
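The steps above, with the threading stripped out, can be sketched as a master loop over an in-memory module graph standing in for file IO (all names made up):

```javascript
'use strict';
// Toy sketch of steps 1-5: FILES stands in for the filesystem, and
// workerJob() is what a worker thread would run.
const FILES = {
  '/entry.js': "require('/a.js'); require('/b.js');",
  '/a.js': "require('/b.js');",
  '/b.js': 'module.exports = 1;',
};

const REQUIRE_RE = /require\(\s*['"]([^'"]+)['"]\s*\)/g;

// Steps 2-4: the worker finds deps with a regex (no AST) and reports
// them alongside its (stand-in) transform result.
function workerJob(file, src) {
  const deps = [...src.matchAll(REQUIRE_RE)].map((m) => m[1]);
  return { deps, transformed: src.toUpperCase() }; // toy "transform"
}

// The master walks deps incrementally, scheduling each path exactly once.
function bundle(entry) {
  const seen = new Set([entry]);
  const queue = [entry];
  const output = {};
  while (queue.length > 0) {
    const file = queue.shift();
    const { deps, transformed } = workerJob(file, FILES[file]); // step 1: master does the IO
    output[file] = transformed; // step 5: master collects the result
    for (const d of deps) {
      if (!seen.has(d)) { // step 3: newly discovered deps get queued
        seen.add(d);
        queue.push(d);
      }
    }
  }
  return output;
}
```

The `seen` set is what makes the incremental walk terminate: `/b.js` is required twice but only transformed once.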
