Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async.queue performance suffers greatly with large queues #756

Closed
bbrowning opened this issue May 18, 2015 · 14 comments
Closed

async.queue performance suffers greatly with large queues #756

bbrowning opened this issue May 18, 2015 · 14 comments

Comments

@bbrowning
Copy link

While performance testing a library that uses async.queue, I noticed that the performance of async.queue degrades massively as the number of items in the queue increases. After a certain point, V8 spends so much time shuffling memory around that the performance of everything else slows to a crawl.

The culprit appears to be how the current implementation calls shift on the tasks array every time an item is processed. I prototyped an alternate implementation that maintains an index into the array and performance is vastly improved with larger queue sizes. For the purposes of this testing, the entire async.queue implementation was copied and modified to form a new async.indexedQueue function so side-by-side comparison could be easily done.

https://gist.github.com/bbrowning/b7355b731eb025608228

@aearly
Copy link
Collaborator

aearly commented May 19, 2015

Thanks for the sleuthing! Looks like async.queue is accidentally quadratic, at least in V8. There are a few PR's related to the queue that we're waiting for @caolan to merge, then we can take another look at this.

@aearly
Copy link
Collaborator

aearly commented May 19, 2015

Also, out of curiosity, what version of node are you running on?

@bbrowning
Copy link
Author

I'm running node 0.12.0. I did a bit more digging after filing this issue, and it looks like it's a common problem that happens in V8 when using an array as a queue and the array size grows large enough to be bumped in V8's large object space memory pool. The last two answers on http://stackoverflow.com/questions/27341352/why-does-a-a-nodejs-array-shift-push-loop-run-1000x-slower-above-array-length-87 do a good job of explaining what's happening.

There are libraries built specifically to address the performance problems with using an array as a queue, such as https://github.com/petkaantonov/deque. They would give higher performance than my simple indexedQueue implementation, but not sure of the effort required to get a library like this working in all JS environments async runs under.

@aearly
Copy link
Collaborator

aearly commented May 20, 2015

I recently added some benchmark scripts (some inspired by your gist, thanks!). This is a tricky problem, since we're working around v8's internal representation.

petkaantonov/deque looks like a nice solution, but for now it's easier to support browsers and non-npm package managers if we don't have any dependencies. deque also doesn't support splicing or inserting, which means it can't be used for priorityQueue.

Your indexedQueue implentation also seems like a good, simple solution. The problem with it I see is that by keeping tasks in the array after they have been processed, you run the risk of memory leaks. Periodically splicing the array is a workaround, but when exactly to splice is an open question (and depends on how large the objects you're pushing to the queue are).

@smartmouse
Copy link

May I ask if this problem is specific to the v0.12 version? and not 10.x?
Thanks

@aearly
Copy link
Collaborator

aearly commented May 20, 2015

It happens with all node versions. Node 0.10.x is actually about 6 times slower than 0.12.x in the worst case.

Note that this only happens when your queue's length is above around 120,000.

@aearly aearly added the queue label May 21, 2015
@josearaujodev
Copy link

I'm running on some performance issues when using async.queue too. I'll try to adjust the queue concurrency value to see if I can get some improvements (at the time I'm running 100 workers concurrently); The node event loop doesn't actually stop, but slows down.

@mikestopcontinues
Copy link

Issues here too. Any updates?

@aearly
Copy link
Collaborator

aearly commented Jun 11, 2016

No updates on this -- just the standard limitations of an Arrays in v8. Are you really pushing 120,000 items on to the queue?

@mikestopcontinues
Copy link

mikestopcontinues commented Jun 12, 2016

No, actually. For me, the breaking point was at about 2000 objects. I ended up optimizing my traversal algorithm to compensate. I saw another implementation that uses a linked list instead of an array. Maybe its a technique worth considering?

https://github.com/arturoarevalo/async-work-queue

@aearly
Copy link
Collaborator

aearly commented Jun 12, 2016

We have to use a data structure for the queue that supports inserting items at arbitrary positions. This is so we can share logic with priorityQueue. A deque or a linked list would help for basic queues, but would really complicate the priorityQueue.

@mikestopcontinues
Copy link

You'd need a custom splice function, but I don't think it would be that bad. Array.splice has O(n) performance, the same as linked list insertion. And you'd get O(1) for shift/unshift as a side effect.

@megawac
Copy link
Collaborator

megawac commented Jun 29, 2016

@aearly upon investigation a linked list is probably a better solution than our current approach. The only real downside to this approach is we won't be able to do binary searches, but seeing splice & unshift are O(n) operations anyway, that shouldn't be a big deal

@megawac megawac added this to the 2.0 milestone Jun 29, 2016
@megawac
Copy link
Collaborator

megawac commented Jun 29, 2016

Adding this to the 2.0 milestone, we either fix or close this by then or move it to v3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants