-
Notifications
You must be signed in to change notification settings - Fork 11
Promise Performance Meeting #31
Comments
I do not think we should care for the bluebird benchmark at all, mostly because of this: petkaantonov/bluebird#887 (comment). More importantly, I think the best approach is to focus on async/await benchmarking and performance. This is what people will use, and one of the main drivers for adopting native promises in modules and applications. I'm +1 in joining, typically CEST business hours works well for me, but I can also join later in the evening. |
Actually, the benchmarks predate bluebird (so it's more accurate to say that Bluebird is built to make the benchmarks shine). Bluebird is not at the best maintenance state after 4 years but it is the benchmark that V8 is using internally for promise performance. I didn't suggest them because they are terrific. We can also extract the benchmark from https://github.com/petkaantonov/bluebird/tree/master/benchmark/doxbee-sequential or approach @spion directly about the suite - I assume V8 did the former. I'm also +1 on writing better applicative promise benchmarks that simulate real world loads. I think we'll discover quickly that our bottleneck is still I think V8 can definitely win vs. bluebird with TurboFan inlining promise calls and async/await based optimizations (I'm sure TurboFan, unlike Crankshaft is fully capable of that level of optimization after reading the optimization code). Moreover, the V8 team are willing to listen to Node.js performance concerns now (which is awesome). I think async/await can win against callbacks (as a long term goal) but that's a farther point. |
I'm ok if we move them/extract them to another repo, the fact that they currently live within bluebird makes them not fair.
Definitely 👍 .
But that's it not spec-compliant, right?
I can confirm this. It's very impressive and it took me some time to understand why some async/await code was faster than the equivalent callback-based. |
Well that's a good point. Today our entire Node.js platform is using callbacks, for people using a promisified Node.js core (with util.promisify) the bottleneck of Once we deliver a promisified core API a faster
Sure it is, I'll give an example:
In this case, bluebird sees that By comparison Node or V8 makes no such assumptions and always defers Scheduling everything on nextTick can have measurable and perhaps even considerable overhead. Although I think promisification is what dominates the performance charactaristics of benchmarks that make hundreds of thousands of calls. I'd like promises to be a 0 overhead abstraction in Node, Bluebird is close - but I think native promises can win here and I'm rooting for them. With the help of Benedikt I think that's a very reasonable goal. Other than creation of promises and a few optimization tricks - I think V8 promises can be very competitive. I'm looking for additional pain points to optimize for before the meeting - any input (from you in particular) would be very useful here. |
Has anyone tried to benchmark the current implementation of |
I agree.
Can V8 implement the same behavior? |
@targos I'm unfamiliar with that approach - if @addaleax chimes in she might have done it. I benchmarked With
And without
One nice side effect is that promises beat caolan/async with Node at the moment. Those benchmarks are on older Node. I'll run them again and compare (If you create a fork with a promisify based on another approach I can check that out). Will run the tests with Node 8.4 and post the results here.
I don't see why not, they control the promise constructor and the By the way, how do you feel about forking Bluebird under the Node org (possibly in nodejs/promises), removing the excess code and making a benchmark here where we can maintain it? I hope you realize Petka didn't close that PR because he was doing anything fishy - he is just a very busy individual without a lot of time to maintain the none-critical parts. We're all Node collaborators (me you and Petka) with the same goal (faster core promises). |
I created two versions of native promises (one with
I'm trying to figure out what's the best place to upload said benchmark. It simulates doing several IO operations and committing to a database. Going to check out Same hardware - running the latest canary:
Looks like everything got faster, but native promisify is actually faster than As a comparison, here is 8.1 on the same hardware (still running TF on the async/await one):
|
Numbers on Windows:
Source: https://github.com/kyrylkov/promise-async-performance |
@kyrylkov thanks! Given I did not share the code the fact we got similar results is reassuring. For comparison and until we decide where to put it - here is the patch I applied https://gist.github.com/benjamingr/a7c7cd7ff2ee69d3a71f0058d286ff9a#file-1-diff-L617-L675 |
@benjamingr you can try with this branch: https://github.com/targos/node/commits/expose-v8-extras I checked with a very simplistic benchmark and the V8 extras seem to bring a non-negligible improvement: 'use strict';
function asyncF(cb) {
cb();
}
const {promisify} = require('util');
const promiseF = promisify(asyncF);
(async function() {
for (var i = 0; i < 1e6; i++) await promiseF();
console.time('test');
for (var i = 0; i < 1e6; i++) await promiseF();
console.timeEnd('test');
})(); Results:
|
Thanks @targos, that's very interesting. Running those on the real (doxbee) benchmark after building Node from your source at that branch I get:
Looks like on V8 6.0 native promisify from v8-extras wins (PR welcome). Here are the results from your branch merged into node-v8 canary (so on top of v8 6.2):
@kyrylkov thanks! That's very useful, it's also interesting to benchmark bluebird promises with native async/await (interesting result, faster than native async/await) |
@benjamingr not sure I'm following. My branch is on V8 6.0, not 6.2. |
@targos updated the above with benchmarks both with V8 6.0 and 6.2 (6.2 by doing a git pull from your branch on top of node-v8's canary branch). |
regarding async iterators, I would like to get the best performant implementation for streams. Maybe we should talk about that as well. |
I'm happy to attend the meeting as well. I'm in PST though. Leaving some inline comments if I can't make it --
@matthewloring did some preliminary micro benchmarking and found looping over
util.promisify is implemented using the C++ API which is probably causing most of this overhead. The best way to address this would be for V8 to implement util.promisify using our assembler and expose it using v8-extras.
The biggest perf cliff is modifying the promise prototype or a promise instance. This would cause all our fast paths to fail since it's guarded by a map check. (This is true for most builtins in V8).
The general policy is to implement stage 3 proposals, but we don't have a specific timeline for each proposal. I'm currently working on updating V8's implementation of promise#finally to match the recent spec changes. @caitp has been working hard on async iterators; will defer to them about it's status. There are some more optimizations possible for native promises both for perf and memory (such as https://bugs.chromium.org/p/v8/issues/detail?id=6468). But I agree with @mcollina that it might be better to focus on async await and optimize promises as part of that process. As an aside, native promises have to spec compatible which requires doing additional work (like looking up the species constructor and maintaining strict ordering in the microtaskq) that polyfills can get away it. Here's another microbenchmark by wikipedia that doesn't use util.promisify (but suffers from other problems of microbenchmarking) - https://github.com/wikimedia/web-stream-util/pull/3 |
I'm not going to be attending a meeting, but here's a quick update on how performance looks right now:
So, async generators allocate a lot of objects during their runtime. In many cases, these don't migrate to old space, but it's a stress on the GC to track them. Closures may be possible to eliminate with special handlers in the Other than those issues, async generators share a lot of the same issues that native Promises share. these are not based on measurements/benchmarks of any kind, as I haven't found any suitable benchmarks or applications which make use of async generators just yet. A real application which uses them would help a lot to identify problems --- All of the performance work so far has really been based on reducing generated code and snapshot size, and not on actual application performance |
cc @psmarshall |
Thanks for chiming in! I think a nice design goal would be to optimize the 99% use case of using native promises only listened to by one observer in an
That would be amazing, a faster
In my experience this is not very common, and I think we should (at least initially) aspire to fast promises with
Thanks, I'll take a look and compare it with doxbee and see if I can run it on the scenarios above.
I agree. How do you feel about the optimization discussed in this comment (after "But that's it not spec-compliant, right?") - I think it could be a nice performance win with measurable impact on async/await.
I'm not sure what's hard about maintaining strict ordering in the microtask queue - in Node I think we queue our own microtasks and I don't recall it being a burden. At bluebird we schedule on By the way, looking at these results do you @caitp or @littledan happen to know why Note that as an interesting (perhaps) observation - we can either inline a promise's value because it is available when it is awaited - or we don't have to defer the async function when resolving it. I'm wondering if we can consider a fast path for native promises in async functions that might be able to improve the results. @caitp thanks for that, some observations I would love feedback on:
I'd assume in 99.9% of real world cases there is no queue here, you are only waiting for one value at a time, and I'd assume in 99% of cases that would be inside a In the case we're Note, I am specifically referring to the case where an async iterator is consumed by a for...await loop and the yielded values are not exposed anywhere else. I'm only talking about the promises |
In the case where there's only one request, this is essentially Label storeFirstRequest;
Node firstRequest = Load(generator, offsetof(GeneratorObject, requestQueue));
JumpIfUndefined(firstRequest, &storeFirstRequest);
// ... Loop to find the last request and store ....
storeFirstRequest.Bind();
Store(generator, offsetof(GeneratorObject, requestQueue), request); I don't think this is too bad. You could shrink(? might be optimistic) code a bit by moving the slow case into a separate stub or something, but this is likely worse than the inlined loop. Using an array to represent the queue would be similar: you'd have to load the field, test if it's undefined, test if it's a FixedArray, code to handle those cases and the third case --- or always store in a FixedArray even for single requests, in which case you have additional loads/stores and length tracking + realloc code. Once we have an opportunity to actually measure the best option, we can make more informed choices. But the singly linked list seems like the best choice for the common case, in terms of instructions used.
So as I said, Await allocates 2x closures and 2x Promises. In the current design, I think we could easily eliminate the 2x closures by adding new private symbols indicating a common behaviour, which PromiseHandle would know about. Eliminating the Promise allocations for Await would take more work, and I'm not 100% sure how it would fit together right now, since it's all very tied to the Promise implementation right now. @gsathya might have some ideas on how to do that. |
Sorry if I bother with a not so relevant nit. RE async iterators and streams. Please, consider if we can do something for |
@vsemozhetbyt I think that’s just an API question, not a perf thing. Things we could do:
|
Yes, sorry, I was also thinking about it as a use case that can be benchmarked easily for a big file, if I get this right. |
@addaleax @vsemozhetbyt see prior work at nodejs/readable-stream#254 (ccing @mcollina ) I think that avoiding the two closures in readable streams is a good start - but allocating two promises would still make this a show stopper for real node event emitters and streams. Optimally I'd like to get: for await(const item of stream) {
// work on item
} As fast as stream.on("data", item => { /* work on item */ }) In order for it to be useful for servers. Although I really think we should focus on making async/await a zero overhead abstraction and there is definitely still a lot that can be done for that. |
Very nice, it shows there are misconceptions and that async iterators are poorly communicated. For example this part:
Is incorrect. The Promises/A+ spec (which is the wrong spec for this case by the way) only requires platform code to remain before executing the next chunk. Since this is I/O dominated this happens anyway - and in particular |
Pinging @danvk to weigh in on their article :) I was thinking about holding the meeting the week after next week (I think we have enough to discuss at this point). Is that time problematic for anyone who would like to participate? |
Hi @benjamingr! I chimed in on the TC39 issue. It's been interesting to read all the responses to that article. One of my main takeaways is that, at least for my use of reading a big CSV file while starting a server, sync is just fine. I'd love to hear more about this:
I was using node 7.6.0 and TypeScript with
So looks like progress is being made. If for-await-of can be fast, then that's the best of all worlds! |
@danvk If you can run your test in a recent Node-lkgr, Chrome Canary version, using the native implementation rather than a transpiled version, I think this would give some better information. Let me know if you want help setting that up. |
FWIW, nodejs/node#15081. |
I would like to set up a hangout with the V8 team regarding promise performance. I've already talked to @bmeurer the person in charge on V8's side regarding a meeting and he has been gracious enough to agree to set up one.
by all means, if discussion here gets noisy feel free to unsubscribe - we will discuss this in the hangout
Outstanding issues I'd like to discuss
util.promisify
faster to a point where native promises win against Bluebird 3.5 in the doxbee benchmark.Outstanding issues that are interesting, but not necessarily for the meeting
Participation
In addition to nodejs/promises members - pinging @petkaantonov @nodejs/performance, @addaleax, @caitp and @misterdjules.
I'd like to brainstorm here for a few days and then set up the hangout. I'd like to limit participation to a few people - I'd love to attend to share the cliffs and bluebird experience, I'd love @addaleax and @mcollina to attend from Node's side and @bmeurer obviously :) If anyone would like to attend because they don't feel their view point will be represented please comment here with why you would like to attend.
Optimally most discussion before the meeting should happen here.
The text was updated successfully, but these errors were encountered: