
Implementation help #2

Open
jeremy-coleman opened this issue Mar 12, 2021 · 2 comments

@jeremy-coleman

Hi, I have been working on a fork of browserify and want to implement a worker pool for transforms using something like this, so that it can keep up with esbuild. Here is a repo with the bundler in tools/omnify/bundler: https://github.com/jeremy-coleman/esbuild-vs-omnify-r3f. The speed increase (compared to browserify) mostly comes from replacing acorn/detective with a regex. I think this would be a really good dogfood project for this lib and for Node streams in general. If you think it'd be interesting and would like to brainstorm, let me know.
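For context, the regex-based dep detection is roughly along these lines (a minimal sketch, not the actual omnify code; real code also has to skip comments and string literals):

```javascript
'use strict';
// Sketch of regex-based dependency detection: scan source text for
// require() calls and import statements without building an AST.
const REQUIRE_RE = /require\(\s*['"]([^'"]+)['"]\s*\)/g;
const IMPORT_RE = /(?:^|\n)\s*import\s+(?:[\w*{},\s]+\s+from\s+)?['"]([^'"]+)['"]/g;

function findDeps(src) {
  const deps = new Set();
  for (const re of [REQUIRE_RE, IMPORT_RE]) {
    re.lastIndex = 0; // the /g regexes keep state between calls, so reset
    let m;
    while ((m = re.exec(src)) !== null) deps.add(m[1]);
  }
  return [...deps];
}
```

Skipping the acorn parse entirely is where most of the win over detective comes from.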

@mcollina
Member

This library is still in its infancy. It's probably already usable, but there are likely a lot of bugs. Using it to parallelize the transforms would be pretty amazing.

Note that this code is really one-way, it does not implement a "passthrough" context. Doing this would be amazing.

I probably do not have a lot of time to brainstorm right now... I'm focusing on shipping the next pino release with this in it!

@jeremy-coleman
Author

jeremy-coleman commented Mar 12, 2021

I'll just leave my thoughts so far; maybe it will help plant a thought seed 🌱
I sketched out my original thoughts in this slide, comparing how a single-threaded process vs a multi-threaded one would work. I was originally thinking of using cluster, but later concluded otherwise (explanation below the slide).

[slide: single-threaded vs multi-threaded bundling layouts]

After some more thought, a few changes to the first layout. I think forks should send back their work every round (or send back an ack and forward it to a write stream), instead of holding onto it to aggregate at the end. I came to this conclusion for a few reasons:

  1. Most intuitively, the total amount of work to send finished results is the same either way, so it's better to chunk it if possible. The original reason for forks to hold onto finished work was that spamming relatively large messages would have overhead, which it may, but you'll have to pay that cost somewhere no matter what.
  2. If forks were to hold their finished work, they could theoretically grow beyond memory limits.
  3. I originally thought forks should do their own file IO, but it is probably better for the master to do it and transfer the bytes to worker threads, which should mean no need for cluster at all. I was originally thinking of a system limited to message passing, but moving file IO to the master and transferring the memory is possible, so it should probably do that instead.
The third point is uncertain to me. Should the master do all the file IO, or should workers read their own files? Remember that browserify walks deps incrementally, so you won't glob a directory and start with all known files the way gulp or a TypeScript project would. This is why workers have to send back the deps they found to the master, so it knows what path to process next. The worker can also send back the deps it found BEFORE it starts any transform, because the deps are found via a fast regexp search without any AST creation.
Something like:

  1. Worker gets the file somehow.
  2. Worker parses deps.
  3. Concurrently:
    Worker sends deps back to master (just an array of strings).
    Master starts doing things (reading files, writing results, resolving full paths of deps).
  4. Worker runs transforms.
  5. Worker does what with the transformed file? It could send it back to the master, hold onto it to aggregate later, or send it to a write stream somewhere.

I think how this gets handled, together with the resolve algorithm, will be the two things that most impact overall speed.
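The steps above, with the threading stripped out, can be sketched as a master loop over an in-memory module graph standing in for file IO (all names made up):

```javascript
'use strict';
// Toy sketch of steps 1-5: FILES stands in for the filesystem, and
// workerJob() is what a worker thread would run.
const FILES = {
  '/entry.js': "require('/a.js'); require('/b.js');",
  '/a.js': "require('/b.js');",
  '/b.js': 'module.exports = 1;',
};

const REQUIRE_RE = /require\(\s*['"]([^'"]+)['"]\s*\)/g;

// Steps 2-4: the worker finds deps with a regex (no AST) and reports
// them alongside its (stand-in) transform result.
function workerJob(file, src) {
  const deps = [...src.matchAll(REQUIRE_RE)].map((m) => m[1]);
  return { deps, transformed: src.toUpperCase() }; // toy "transform"
}

// The master walks deps incrementally, scheduling each path exactly once.
function bundle(entry) {
  const seen = new Set([entry]);
  const queue = [entry];
  const output = {};
  while (queue.length > 0) {
    const file = queue.shift();
    const { deps, transformed } = workerJob(file, FILES[file]); // step 1: master does the IO
    output[file] = transformed; // step 5: master collects the result
    for (const d of deps) {
      if (!seen.has(d)) { // step 3: newly discovered deps get queued
        seen.add(d);
        queue.push(d);
      }
    }
  }
  return output;
}
```

The `seen` set is what makes the incremental walk terminate: `/b.js` is required twice but only transformed once.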
