Support for multiple cores/processors #317
Streams don't work this way but I'm leaving this open because I think it would be cool to make a version of through2 that spun up child_processes or used something like threads-a-go-go |
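As a rough sketch of the idea in the comment above (not an existing module): a through2-style transform could fork a worker and hand each file's contents to it over IPC. The `./worker.js` module and its message contract are made up here, and a real implementation would want a pool of workers rather than a single child.

```js
// Sketch only: offload per-file work to a forked child process.
// './worker.js' is hypothetical; it is assumed to reply with
// { contents: <string> } for every { contents: <string> } it receives.
var through2 = require('through2');
var fork = require('child_process').fork;

function parallelTransform(workerPath) {
  var child = fork(workerPath);

  return through2.obj(function (file, enc, cb) {
    child.once('message', function (msg) {
      file.contents = Buffer.from(msg.contents);
      cb(null, file);
    });
    child.send({ contents: file.contents.toString() });
  }, function (done) {
    child.kill(); // otherwise the child keeps the parent process alive
    done();
  });
}

module.exports = parallelTransform;
```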
That would be rad. There's also: https://github.com/audreyt/node-webworker-threads The problem with spinning up child processes is that it comes with some overhead, usually 100-500ms startup cost. |
In my previous work here, I've found that sending and receiving data from child processes was noticeably slower. I used a node C++ extension that gave the ability to spin up libuv threads that worked fine, but you lose access to a lot of "node" in that case. |
We should benchmark webworker-threads vs. libuv threads vs. threads-a-go-go vs. child_processes and see which would be best for our case. |
👍 This is where it would be useful if Node had full support for Isolates. Maybe in the future. |
@contra I know. See http://strongloop.com/developers/videos/#whats-new-in-nodejs-v012 (03:30) |
@contra I am struggling to see a way to do this with TAGG or node-webworkers, because afaik you cannot require modules inside of the threads. What are your thoughts on how to accomplish this? |
I did some experiments with child processes: https://github.com/heikki/spawn-task-experiment Running self-contained tasks in child processes is surprisingly fast. The test task takes all the js files from node_modules, concats them and creates sourcemaps. Two normal tasks run in parallel:
Two spawned tasks run in parallel:
Has anyone else explored this stuff? --edit The code looks too simple and the results too good. Please point out any flaws 😸 |
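For readers who don't want to dig through the repo, this is the general shape of running a self-contained task in a child process. It is an illustration only, not the actual spawn-task-experiment code; `./heavy-task.js` is a made-up standalone script.

```js
// gulpfile.js -- illustration only. './heavy-task.js' is a hypothetical
// standalone script that concats files, writes sourcemaps and then exits.
var gulp = require('gulp');
var fork = require('child_process').fork;

gulp.task('concat-in-child', function (done) {
  var child = fork('./heavy-task.js');
  child.on('exit', function (code) {
    done(code === 0 ? null : new Error('heavy-task exited with code ' + code));
  });
});
```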
Ping @noahgrant ^ |
@heikki Try using worker pools instead of spinning a new child up each time you run the task. Startup times on new child_processes can be around 30ms; using a pool eliminates that. |
I tried that already but decided to show the simpler way first. Leaving child processes alive has the side effect that the parent doesn't exit, so maybe use a worker pool only when running watch. --edit The exit problem happened in a different setup where messaging was done via an IPC channel. If the child's IPC channel was left open, the parent didn't exit. I didn't find a way to reopen it after closing, so it was a dead end. |
Added worker pool example using worker-farm. |
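For context, the basic shape of a worker-farm pool is roughly the following (this is not the linked example; `./build-worker.js` is a made-up module that exports `function (input, callback)`). Keeping the pool alive avoids the per-spawn startup cost, and calling `workerFarm.end()` is what lets the parent process exit afterwards.

```js
// pool.js -- minimal worker-farm sketch; './build-worker.js' is hypothetical.
var workerFarm = require('worker-farm');
var workers = workerFarm(require.resolve('./build-worker.js'));

var pending = 2;
['app.js', 'vendor.js'].forEach(function (entry) {
  workers(entry, function (err, result) {
    if (err) throw err;
    console.log('built', entry, result);
    if (--pending === 0) workerFarm.end(workers); // release children so the parent can exit
  });
});
```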
@heikki Is there a module yet for just wrapping a function in a child_process pool? If not you should publish one, and we can play around with just

```js
var makeItFaster = require('your-module');

gulp.task('whatever', makeItFaster(function(){
}));
```

before thinking about adding anything to core |
Published with the same name -> spawn-task-experiment |
@heikki Cool stuff - I'll let people play around with it for a bit and you can build off their feedback 👍 |
Implemented support for multiple processes in the sigh asset pipeline and wrote a cool library that makes dealing with process pools easy using a promise-based API. |
A synthetic example of parallelizing dependent tasks with worker-farm: |
+1 |
Is anyone using any of these solutions in their gulpfiles yet? Curious what people have working so far |
any progress on making this the default behavior? one of gulp's biggest benefits is making parallel task execution the default, and it would be nice to run with this idea to make gulp even faster! |
@bcherny no one is working on this for gulp core. Please notice the "probably user land" label. |
I agree this should be implemented in userland, but I also think this should be implemented! |
what's the rationale for keeping this out of gulp-core? gulp is responsible for executing tasks in parallel, and it makes assumptions about how exactly to do that (in a single process, on a single core). why would it do it that way by default, rather than on multiple threads by default? |
@bcherny in addition to my comment in #1308 (comment):
Again, do it by hand or in userland |
thanks for the explanation! it's a good argument. i've seen builds of at minimum 20-30s for any large app i've worked on, so the time to spool up a new thread is negligible for my use case. i bump into this problem often, and am surprised that there are so few existing solutions. |
@bcherny Just to be clear: we did not make a choice to run things in a single thread. This is how node and javascript work. It is a constraint of the language and environment, not a choice that we made. |
@contra not sure i understand. node offers a few apis to help orchestrate async tasks. among them are promises and setTimeout, but also child_process. i'm not sure that any one of those is the way "javascript works". |
@bcherny promises and setTimeout are still running on the same thread. I'm saying this: in JS all work happens on the main thread; gulp is a javascript library; therefore gulp has one thread. Your earlier posts made it seem like I made a choice that gulp should only be one thread. Just trying to clear it up for any future people who look at this post so nobody gets confused about where the limitation is from. |
this is a technicality, but to rephrase: node offers a bunch of apis for dealing with async tasks. among them are apis that run on a single thread, and apis that run on multiple threads. javascript itself runs on a single thread, but both node and the browser offer async apis that work in multi-thread contexts. gulp uses async apis to orchestrate async tasks. specifically, it uses orchestrator, which uses callbacks, promises and streams. the latter 2 are async apis that happen to run in a single thread. orchestrator could just as well have used child_process. node is not biased toward promises, streams, or child_process.exec. they are all just node apis, and orchestrator made the choice to use only those apis that run in a single thread. |
@bcherny The only API that node provides to solve this problem is child_process, which is not a lightweight thread - it is a full node process running completely separately. No, orchestrator (which we don't use anymore btw) could not have "just as well have used child_process": |
Running every function in a child_process would be a horrible default - the idea that gulp is idiomatic javascript would go out the window, since it would require people to write code fundamentally different from what they normally write.

tl;dr Running every task in a child_process is never going to happen in core since child_process is not the right abstraction (reasons listed above) - if you understand the tradeoffs and want to run your task in a child_process, use this tiny module: https://github.com/heikki/spawn-task-experiment

This would all have been solved if node had finished the Isolate API (essentially lightweight threads w/ shared V8 memory), but they abandoned it in 0.9.x |
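To make the constraint above concrete: the only thing that crosses the parent/child boundary is a serialized message, so closures, open streams and shared state cannot be handed to a child. A bare fork/IPC round trip looks roughly like this (file names are made up):

```js
// parent.js -- only serializable data crosses the boundary;
// functions and closures cannot be sent to the child.
var fork = require('child_process').fork;
var child = fork('./task-worker.js'); // hypothetical worker module

child.send({ files: ['a.js', 'b.js'] });
child.on('message', function (result) {
  console.log('worker finished:', result);
  child.kill();
});

// task-worker.js (contents of the hypothetical worker)
// process.on('message', function (msg) {
//   // do the heavy work here, then report back to the parent
//   process.send({ processed: msg.files.length });
// });
```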
I'm going to close this since I think we've reached a solution (#317 (comment)). If node ever revisits the Isolate API or somebody makes a lightweight threading module that supports shared requires, I will reopen this as a possibility; until then it will have to stay in userland. |
Usually more if you have a lot of heavy …
Support for workers (lightweight processes backed by OS threads) might land in Node.js 5. That would be a better way to deal with it: nodejs/node#2133 |
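For anyone landing here later: the worker support referenced above eventually shipped as Node's worker_threads module, well after this thread. A minimal sketch, assuming a Node version where that module is available:

```js
// worker_threads sketch -- requires a much newer Node than the versions
// discussed in this thread.
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
  const worker = new Worker(__filename, { workerData: ['a.js', 'b.js'] });
  worker.on('message', (result) => console.log('done:', result));
  worker.on('error', (err) => console.error(err));
} else {
  // Heavy work runs here, off the main thread, with its own event loop.
  parentPort.postMessage({ processed: workerData.length });
}
```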
I have tried to use cluster in gulp-ctool-browserify but it doesn't make it faster. It seems limited by memory and CPU. |
@inikulin, thanks for that plugin, works like a charm! Using it now in a massive project, which brought the total build time down from 3 minutes to 2 minutes by running jade, sass and coffee compilation as parallel tasks (with gulp-ll obviously).

Currently my Jade is taking the longest at 2 minutes. Splitting this into jade / jade1 / jade2 tasks, each with their own glob, brought the total build down to 50 seconds... (we're coming from 5m30s total with a previous setup, so that's just awesome). But that's a "living" config of globs which is not very maintainable. Plus I use UUID generators in my Jade mixins, which now, obviously, are not guaranteed to be unique anymore, because these parallel tasks do not know of one another.

Splitting tasks into workers is easy peasy using gulp-ll, but I haven't found a solution yet for splitting a single task into multiple processes... Unless someone has any pointers for that? I'm guessing spawning child processes inside a single task will pretty much be the only way? Currently the build uses 3 of 8 cores (jade takes one, the others take one physical core + one virtual); there is still so much untapped potential. |
@inikulin |
Is it possible to multithread a single task like uglifying JS (so the same pipe, but run in parallel) without being limited to only one core at 100%? |
Nobody seems to have mentioned: https://www.npmjs.com/package/gulp-multi-process We're using it to parallelize the 4 separate build tasks using webpack, typescript, babel and uglify. Works great. |
@vangorra Seeing how |
It would also be great if a stream (of vinyl files) could be distributed over a process with … |
@strarsis I'm not exactly sure if this is what you asked for, but... I worked on a project for this, vinyl-parallel. The project is mostly finished but I ended up personally using … Example gulpfile: https://github.com/NitorCreations/vinyl-parallel/blob/master/example/gulpfile.js |
As tasks are run concurrently, it would be useful if gulp could use multiple cores/processors to speed up large builds. This could potentially be done using the cluster api.
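For reference, the cluster api mentioned in the issue forks full Node processes, typically one per CPU. A minimal, gulp-agnostic sketch:

```js
// cluster sketch -- forks one worker per CPU; not code from gulp itself.
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork({ WORKER_INDEX: i }); // env vars are the simplest way to pass config
  }
  cluster.on('exit', function (worker, code) {
    console.log('worker ' + worker.process.pid + ' exited with code ' + code);
  });
} else {
  // each worker would run its slice of the build here
  console.log('worker ' + process.env.WORKER_INDEX + ' starting');
  process.exit(0);
}
```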