Support for multiple cores/processors #317

Closed
alexwhitman opened this issue Mar 1, 2014 · 43 comments

Comments

@alexwhitman

As tasks are run concurrently it would be useful if gulp could use multiple cores/processors to speed up large builds. This could potentially be done using the cluster api.
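
(For illustration only, a minimal sketch of what the cluster-api suggestion could look like: fork one worker per CPU and hand each a task name over IPC. The task names are hypothetical.)

var cluster = require('cluster');
var os = require('os');

var tasks = ['scripts', 'styles', 'images', 'html']; // hypothetical task names

if (cluster.isMaster) {
  // Fork at most one worker per CPU and assign each a task name over IPC.
  tasks.slice(0, os.cpus().length).forEach(function (task) {
    var worker = cluster.fork();
    worker.send(task);
  });
  cluster.on('exit', function (worker, code) {
    console.log('worker', worker.id, 'exited with code', code);
  });
} else {
  process.on('message', function (task) {
    console.log('worker', cluster.worker.id, 'running task', task);
    // ... do the actual build work for `task` here ...
    process.exit(0);
  });
}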

@yocontra
Member

yocontra commented Mar 3, 2014

Streams don't work this way but I'm leaving this open because I think it would be cool to make a version of through2 that spun up child_processes or used something like threads-a-go-go
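
(A minimal sketch of that idea: a through2 object stream that ships each file's contents to a forked child process and waits for the transformed result. worker.js is a hypothetical script that listens for IPC messages, transforms the contents and sends them back; a real version would use a pool of children rather than a single one.)

var through2 = require('through2');
var fork = require('child_process').fork;

module.exports = function parallelTransform(workerScript) {
  var child = fork(workerScript); // e.g. fork('./worker.js')

  return through2.obj(function (file, enc, cb) {
    // Wait for the child to send back the transformed contents for this file.
    child.once('message', function (result) {
      file.contents = Buffer.from(result.contents, 'base64');
      cb(null, file);
    });
    child.send({ path: file.path, contents: file.contents.toString('base64') });
  }, function (cb) {
    child.kill();
    cb();
  });
};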

@sindresorhus
Contributor

That would be rad. There's also: https://github.com/audreyt/node-webworker-threads

The problem with spinning up child processes is that it comes with some overhead, usually 100-500ms startup cost.

@yocontra
Member

yocontra commented Mar 3, 2014

@sindresorhus 👍

@terinjokes
Contributor

In my previous work here, I've found that sending and receiving data from child processes was noticeably slower. I used a node C++ extension that gave the ability to spin up libuv threads that worked fine, but you lose access to a lot of "node" in that case.

@yocontra
Member

yocontra commented Mar 3, 2014

We should benchmark webworker-threads vs. libuv threads vs. threads-a-go-go vs. child_processes and see which would be best for our case.

@sindresorhus
Contributor

👍

This is where it would be useful if Node had full support for Isolates. Maybe in the future.

@sindresorhus
Contributor

@mako-taco

@contra I am struggling to see a way to do this with TAGG or node-webworkers, because afaik you cannot require modules inside of the threads. What are your thoughts on how to accomplish this?

@heikki
Contributor

heikki commented Jan 11, 2015

I did some experiments with child processes. https://github.com/heikki/spawn-task-experiment

Running self contained tasks in child processes is surprisingly fast. Test task takes all the js files from node_modules, concats them and creates sourcemaps.

Two normal tasks run parallel:

∴ spawn-task-experiment git:(master) ./node_modules/.bin/gulp normal
[22:09:16] Using gulpfile ~/Desktop/spawn-task-experiment/gulpfile.js
[22:09:16] Starting 'normal'...
[22:09:16] Starting 'clean'...
[22:09:16] Finished 'clean' after 9.08 ms
[22:09:16] Starting 'parallel'...
[22:09:16] Starting 'normal-task'...
[22:09:16] Starting 'normal-task'...
[22:09:45] Finished 'normal-task' after 29 s
[22:09:45] Finished 'normal-task' after 29 s
[22:09:45] Finished 'parallel' after 29 s
[22:09:45] Finished 'normal' after 29 s

Two spawned tasks run parallel:

∴ spawn-task-experiment git:(master) ./node_modules/.bin/gulp spawn
[22:09:51] Using gulpfile ~/Desktop/spawn-task-experiment/gulpfile.js
[22:09:51] Starting 'spawn'...
[22:09:51] Starting 'clean'...
[22:09:51] Finished 'clean' after 9.29 ms
[22:09:51] Starting 'parallel'...
[22:09:51] Starting 'spawn-task'...
[22:09:51] Starting 'spawn-task'...
[22:10:07] Finished 'spawn-task' after 16 s
[22:10:07] Finished 'spawn-task' after 16 s
[22:10:07] Finished 'parallel' after 16 s
[22:10:07] Finished 'spawn' after 16 s

Has anyone else explored this stuff?

--edit

Code looks too simple and results too good. Please point out any flaws 😸
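
(Roughly what the spawn approach looks like as a minimal sketch, not the actual spawn-task-experiment code; the heavy-task names are hypothetical. Each wrapped task runs `gulp <name>` in its own child process, so two heavy tasks can occupy two cores.)

var gulp = require('gulp');
var spawn = require('child_process').spawn;

function spawnTask(name) {
  return function (done) {
    // Run `gulp <name>` in a separate process and report its exit status.
    var child = spawn('./node_modules/.bin/gulp', [name], { stdio: 'inherit' });
    child.on('exit', function (code) {
      done(code === 0 ? null : new Error(name + ' exited with code ' + code));
    });
  };
}

gulp.task('spawn-1', spawnTask('heavy-task-1'));
gulp.task('spawn-2', spawnTask('heavy-task-2'));
gulp.task('parallel', ['spawn-1', 'spawn-2']); // gulp 3.x runs dependencies in parallel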

@heikki
Contributor

heikki commented Jan 12, 2015

Ping @noahgrant ^

@yocontra
Member

@heikki Try using worker pools instead of spinning a new one up each time you run the task. Startup times on new child_processes can be like 30ms; using a pool eliminates that.

@heikki
Contributor

heikki commented Jan 12, 2015

I tried that already but decided to show the simpler way first. Leaving child processes alive has the side effect that the parent doesn't exit. So maybe use a worker pool only when running watch.

--edit

The exit problem happened in a different setup where messaging was done via an IPC channel. If the child's IPC channel was left open then the parent didn't exit. I didn't find a way to reopen it after closing, so it was a dead end.

@heikki
Contributor

heikki commented Jan 12, 2015

Added worker pool example using worker-farm.
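
(For anyone curious what the worker-farm pattern looks like, here is a minimal sketch, not heikki's actual example: the pool keeps child processes alive between calls, so the fork cost is paid once. ./build-worker.js is a hypothetical module exporting a function of the form module.exports = function (input, callback) { ... }.)

var workerFarm = require('worker-farm');
var workers = workerFarm(require.resolve('./build-worker'));

var targets = ['src/app', 'src/vendor']; // hypothetical inputs
var pending = targets.length;

targets.forEach(function (target) {
  // Each call is dispatched to an idle child process in the pool.
  workers(target, function (err, result) {
    if (err) throw err;
    console.log('built', target);
    // End the farm when all jobs are done so the parent process can exit.
    if (--pending === 0) workerFarm.end(workers);
  });
});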

@yocontra
Member

@heikki Is there a module yet for just wrapping a function in a child_process pool? If not you should publish one, and we can play around with just

var makeItFaster = require('your-module');

gulp.task('whatever', makeItFaster(function(){

}));

before thinking about adding anything to core

@heikki
Contributor

heikki commented Jan 13, 2015

Published with the same name -> spawn-task-experiment

@yocontra
Member

@heikki Cool stuff - I'll let people play around with it for a bit and you can build off their feedback 👍

@insidewhy

Implemented support for multiple processes in the sigh asset pipeline and wrote a cool library that makes dealing with process pools easy using a promise-based API.

@AlekseyMartynov

Synthetic example of parallelizing dependent tasks with worker-farm:
https://github.com/AlekseyMartynov/misc/tree/master/gulp-with-worker-farm

@csvan

csvan commented Jul 17, 2015

+1

@yocontra
Member

Is anyone using any of these solutions in their gulpfiles yet? Curious what people have working so far

@bcherny

bcherny commented Oct 19, 2015

any progress on making this the default behavior? one of gulp's biggest benefits is making parallel task execution the default, and it would be nice to run with this idea to make gulp even faster!

@phated
Member

phated commented Oct 19, 2015

@bcherny no one is working on this for gulp core. Please notice the "probably user land" label.

@dbkaplun

I agree this should be implemented in userland, but I also think this should be implemented!

@bcherny

bcherny commented Oct 19, 2015

what's the rationale for keeping this out of gulp-core? gulp is responsible for executing tasks in parallel, and it makes assumptions about how exactly to do that (in a single process, on a single core). why would it do it that way by default, rather than on multiple threads by default?

@phated
Member

phated commented Oct 19, 2015

@bcherny in addition to my comment in #1308 (comment):

  • forking is quite a bit of overhead, so doing it by default is actually worse performance in the common case
  • it adds a ton of maintenance and debugging overhead
  • that's how node works

Again, do it by hand or in userland

@bcherny

bcherny commented Oct 19, 2015

thanks for the explanation! it's a good argument. i've seen builds of at least 20-30s for any large app i've worked on, so the time to spool up a new thread is negligible for my use case. i bump into this problem often, and am surprised that there are so few existing solutions.

@yocontra
Member

@bcherny Just to be clear: we did not make a choice to run things in a single thread. This is how node and javascript work. It is a constraint of the language and environment, not a choice that we made.

@bcherny

bcherny commented Oct 20, 2015

@contra not sure i understand. node offers a few apis to help orchestrate async tasks. among them are promises and setTimeout, but also child_process. i'm not sure that any one of those is the way "javascript works".

@yocontra
Member

@bcherny promises and setTimeout are still running on the same thread. I'm saying this: in JS all work happens on the main thread, gulp is a javascript library, therefore gulp has one thread. Your earlier posts made it seem like I made a choice that gulp should only be one thread. Just trying to clear it up for any future people who look at this post so nobody gets confused about where the limitation is from.

@bcherny

bcherny commented Oct 20, 2015

this is a technicality, but to rephrase: node offers a bunch of apis for dealing with async tasks. among them are apis that run on a single thread, and apis that run on multiple threads.

javascript itself runs on a single thread, but both node and the browser offer async apis that work in multi-thread contexts.

gulp uses async apis to orchestrate async tasks. specifically, it uses orchestrator, which uses callbacks, promises and streams. the latter 2 are async apis that happen to run in a single thread. orchestrator could just as well have used child_process.

node is not biased toward promises, streams, or child_process.exec. they are all just node apis, and orchestrator made the choice to use only those apis that run in a single thread.

@yocontra
Member

@bcherny The only API that node provides to solve this problem is child_process, which is not a lightweight thread - it is a full node process running completely separately.

No, orchestrator (which we don't use anymore btw) could not have "just as well have used child_process":

  • You would have to write your JS in a really fucked up way since every function is completely isolated in its own node process (no shared requires, no shared memory whatsoever)
  • Spinning up a child_process takes 30ms
  • Spinning up a child_process uses a ton of memory since it is loading a new instance of node + v8 each time

Running every function in a child_process would be a horrible default - the idea that gulp is idiomatic javascript would go out the window since it requires people to write code fundamentally different than they normally do.

tl;dr Running every task in a child_process is never going to happen in core since child_process is not the right abstraction (reasons listed above) - if you understand the tradeoffs and want to run your task in a child_process, use this tiny module https://github.com/heikki/spawn-task-experiment

This would have all been solved if node had finished the Isolate API (essentially lightweight threads w/ shared V8 memory) but they abandoned it in 0.9.x

@yocontra
Member

I'm going to close this since I think we've reached a solution (#317 (comment))

If node ever revisits the Isolate API or somebody makes a lightweight threading module that supports shared requires I will reopen this as a possibility, until then it will have to stay in userland.

@sindresorhus
Contributor

Spinning up a child_process takes 30ms

Usually more if you have a lot of heavy requires, and even slower on Windows.

Support for workers (lightweight processes backed by OS threads) might land in Node.js 5. That would be a better way to deal with it. nodejs/node#2133
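
(For later readers: the worker API Node eventually shipped is worker_threads. A minimal sketch of what it looks like, not something that was available when this comment was written.)

const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  // Spawn a worker thread running this same file and exchange messages with it.
  const worker = new Worker(__filename);
  worker.on('message', function (msg) {
    console.log('from worker:', msg);
    worker.terminate();
  });
  worker.postMessage('compile something');
} else {
  // Inside the worker thread: do the work and post the result back.
  parentPort.on('message', function (msg) {
    parentPort.postMessage(msg + ' ... done');
  });
}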

@zhoujianlin8

I have tried to use cluster in gulp-ctool-browserify but it doesn't make it faster. It seems limited by memory and CPU.
Who can help me?

@inikulin

Hi guys.
I've made a gulp plugin for multiprocessing. In general it's faster than regular builds, but it requires some manual tuning and may not give any performance gain at all in some projects, for the reasons described by @contra

@kevin-smets

@inikulin, thanks for that plugin, works like a charm! Using it now in a massive project, and it brought the total build time down from 3 minutes to 2 minutes by running jade, sass and coffee compilation as parallel tasks (with gulp-ll obviously).

Currently my Jade is taking the longest at 2 minutes. Splitting this into jade / jade1 / jade2 tasks, each with their own glob, brought the total build down to 50 seconds... (we're coming from 5m30s total with a previous setup, so that's just awesome). But that's a "living" config of globs which is not very maintainable.

Plus I use UUID generators in my Jade mixins, which now, obviously, are not guaranteed to be unique anymore, because these parallel tasks do not know of one another. Splitting tasks into workers is easy peasy using gulp-ll, but I haven't found a solution for splitting a single task into multiple processes yet... Unless someone has any pointers for that? I'm guessing spawning child processes inside a single task will pretty much be the only way?

Currently the build uses 3 of 8 cores (jade takes one, the others take one physical core + one virtual); there is still so much untapped potential.

@cghamburg

@inikulin
This works great. Just by running stylus and es6 compile tasks in parallel using gulp-ll we went down from 1m42s to 47s!

import gulp from 'gulp';
import runSequence from 'run-sequence';
import ll from 'gulp-ll';

ll.tasks(['css', 'js']); // run 'css' and 'js' in their own worker processes

gulp.task('default', done => {
    runSequence(['css', 'js'], 'bundle', done); // pass `done` so gulp knows when we finish
});

@kevincaradant

Is it possible to multithread a single task like uglifying JS (so the same pipe, but run in parallel) instead of maxing out only one core at 100%?
With gulp 4.0 I can multithread two different tasks, with one task on sass files and another on JS files (uses 2 cores).
Then I saw your plugin @inikulin, which seems great, but I didn't find anything about parallelizing within the same task. Maybe I'm asking too much :/
I'm a little bit confused about this...

@vangorra

vangorra commented Oct 5, 2016

Nobody seems to have mentioned: https://www.npmjs.com/package/gulp-multi-process

We're using it to parallelize the 4 separate build tasks using webpack, typescript, babel and uglify. Works great.

@yocontra
Member

yocontra commented Oct 5, 2016

@vangorra Seeing how webpack is 100% synchronous and blocks the event loop for ~30 seconds sometimes, that sounds super useful. Thanks for the link.

@strarsis

It would also be great if a stream (of vinyl files) could be distributed across multiple cores within a single node task.
gulp-multi-process is great, but it can only be used to distribute work at the task level.

@xkr47

xkr47 commented Feb 27, 2017

@strarsis I'm not exactly sure if this is what you asked for, but.. I worked on a vinyl-parallel transform that transports vinyls (including content) to a different process, does whatever processing is needed there and then transports the result back. See the URLs below for an example; the example gulpfile contains tasks demonstrating traditional use (tasks named sync-*) and use together with vinyl-parallel (tasks named parallel-*). In the latter case the parallel work to be done is implemented in the gulpslave.js file.

The project is mostly finished, but I personally ended up using gulp-ll instead in our project because we happened to have enough tasks that we could get decent build times with it. Feel free to report any bugs/improvement ideas you find.

https://github.com/NitorCreations/vinyl-parallel/blob/master/example/gulpfile.js
https://github.com/NitorCreations/vinyl-parallel/blob/master/example/gulpslave.js
