Synchronous is evil? #1
For disk IO, usually synchronous is faster. There are some rare edge cases (cold cache due to low RAM with spinning magnet disk) where async is faster, but I don't care about these. So it's intentionally synchronous-only. |
@joliss I wonder if node actually handles the async story better. Although the latency might be higher, I would suspect the native concurrency would enable higher throughput, and as N grows it may become more appealing. E.g. a sync walk might be faster when N=1, but when N=10 or N=100 I would suspect that letting node handle the concurrency natively would be faster. These are all my own assumptions, based on my understanding of node and the heroics it goes through to simulate sync. |
I would have thought so too, but it seems that unless you're actually blocking on a slow disk or network bus, sync is faster. For instance, globbing 100k files with node-glob, with warm cache, takes 10 seconds sync vs. ~20 seconds async, and the async timing has quite a bit of variance. I'm guessing that all the bookkeeping for the parallel requests slows it down. |
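A comparison like the one described above can be sketched with a small timing harness. This is only an illustration, not the benchmark script used in this thread: `walkSyncSketch` and `timeWalk` are hypothetical helper names, and the walker is a simplified stand-in rather than the actual walk-sync or node-glob code.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: synchronous recursive walk that collects relative
// paths, with a trailing '/' on directories. Not the real walk-sync code.
function walkSyncSketch(baseDir, relativePath = '', results = []) {
  const entries = fs.readdirSync(path.join(baseDir, relativePath)).sort();
  for (const entry of entries) {
    const rel = relativePath + entry;
    if (fs.lstatSync(path.join(baseDir, rel)).isDirectory()) {
      results.push(rel + '/');
      walkSyncSketch(baseDir, rel + '/', results);
    } else {
      results.push(rel);
    }
  }
  return results;
}

// Hypothetical helper: times any walker, whether it returns an array
// directly or a promise for one.
async function timeWalk(label, walkFn, baseDir) {
  const start = process.hrtime.bigint();
  const entries = await walkFn(baseDir);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  return { label, ms, count: entries.length };
}
```

Running `timeWalk` several times per walker and comparing the spread, not just a single number, would also show the variance mentioned above. Results depend heavily on cache state, disk type, and platform.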
crazy |
AAAAND: The results are out. Sync: 60-63ms. Both had to examine my desktop, which currently has clones of the emberjs and app kit repos lying on it. I guess async clearly wins :)
var fs = require('fs');
var RSVP = require('rsvp');
var stat = RSVP.denodeify(fs.stat);
var readdir = RSVP.denodeify(fs.readdir);
function walk(baseDir, relativePath) {
return stat(baseDir + '/' + relativePath)
.then(function(stats) {
if (stats.isDirectory()) {
return readdir(baseDir + '/' + relativePath)
.then(function (entries) {
return RSVP.all(entries.map(function(entry) {
return walk(baseDir, relativePath + entry + '/')
}))
.then(function(entries) {
return Array.prototype.concat.apply([relativePath], entries)
});
});
} else {
return [relativePath];
}
});
}; |
That implementation errors. I have a benchmark script; if you give me a functioning implementation I'll run it. |
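One likely cause of the error, at least on POSIX systems, is that the snippet appends '/' to every entry before calling `stat`, so regular files are stat'ed as `file.txt/`, which fails with ENOTDIR. Below is a minimal sketch of a variant that avoids this. It is an assumption-laden rewrite, not the code that ended up in walk-as-promised: it swaps RSVP for native promises and `fs.promises`, the name `walkAsync` is hypothetical, and unlike the original it does not emit an entry for the root itself.

```javascript
const fs = require('fs');
const path = require('path');

// Async recursive walk. Only directories get a trailing '/', and the slash
// is added after stat() has confirmed the entry is a directory.
async function walkAsync(baseDir, relativePath = '') {
  const fullPath = path.join(baseDir, relativePath);
  const stats = await fs.promises.stat(fullPath);
  if (!stats.isDirectory()) {
    return [relativePath];
  }
  // The root is represented by '' and produces no entry of its own.
  const dirPath = relativePath === '' ? '' : relativePath + '/';
  const entries = (await fs.promises.readdir(fullPath)).sort();
  // Walk all children concurrently, then flatten their results in order.
  const nested = await Promise.all(
    entries.map((entry) => walkAsync(baseDir, dirPath + entry))
  );
  return (dirPath === '' ? [] : [dirPath]).concat(nested.flat());
}
```

Calling `walkAsync('/some/dir')` resolves to a flat array of relative paths. Entries are sorted here only to make the output deterministic; a benchmark might drop the sort.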
https://github.com/MajorBreakfast/walk-as-promised @stefanpenner now also takes a look at it to see where the bottlenecks are.
|
On Linux, I'm getting
If I create 100k files, I get:
So walk-sync seems to be faster by an order of magnitude. Are you getting something different? |
On Windows, async is faster. For 45909 files:
|
On Mac OS Mountain Lion (not the same machine). For 58701 files:
|
In theory, hiding that behind an abstraction would enable "async" for Windows and "sync" for Linux. I'm going to try to investigate some this evening. |
Well, just don't send me a PR for that. ;-) I'm not going to be able to maintain custom code for Windows; certainly not for performance optimizations.
Let me know what you find out if you get around to it. It seems interesting that sync is slower on Windows. |
Okay new results: Mac
Windows
The flatten afterwards versions depend on lodash. "Callbacks" means that it doesn't use promises internally, it returns a promise, though. Btw. the additional entry is '/' in case you're wondering why the async versions' file count is +1. |
I updated the benchmark again. From what I see the results are:
Depends on how it performs on Linux, but you could always abstract the actual implementation away if you decide to use both:
// walk.js
var Promise = require('rsvp').Promise;
if (/^win/.test(process.platform)) {
module.exports = require('walk-as-promised')
} else {
module.exports = function() { return Promise.resolve(require('walk-sync').apply(null, arguments)) }
} |
I wondered all the time why you first read in all the folders and then stat every file again to extract the things you then hash. Well, now it's clear: You don't use node-walk-sync itself in broccoli, just something similar. |
Very interesting. I'll have to investigate. Out of bandwidth at the moment, but will get back to this. Thanks @MajorBreakfast for your benchmark repos! |
Perhaps it has to do with a slow hard drive? I noticed walk-sync uses lstatSync to stat each file sequentially. If the disk is slow, that can potentially make a difference. If the stat is async, you leave it to the disk to optimize the batched reads. |
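The difference between the two approaches can be sketched as follows. This is an illustration only: `statSequential` and `statBatched` are hypothetical names, and whether the batched form is actually faster depends on the disk, the cache state, and the platform, as the benchmarks in this thread show.

```javascript
const fs = require('fs');

// Sequential: each lstatSync call blocks until it returns, so the disk
// only ever sees one outstanding request at a time.
function statSequential(paths) {
  return paths.map((p) => fs.lstatSync(p));
}

// Batched: all lstat calls are issued up front and resolved together.
// Node hands them to libuv's thread pool (4 threads by default), so
// several requests can be in flight at once.
function statBatched(paths) {
  return Promise.all(paths.map((p) => fs.promises.lstat(p)));
}
```

Both return stats in the same order as the input paths; `statBatched` returns a promise for the array instead of the array itself.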
Are you using HDD or SSD for the benchmarks? The way that SSDs access data is fundamentally different from how HDDs access data.
Following that reasoning, it would make sense that read performance would be better on an SSD when handled in a synchronous manner. |
@evanplaice I did some spelunking when @MajorBreakfast pointed out the Unix/Windows difference. If memory serves, it seemed to boil down to how libuv handles async and sync on each platform. On Windows the async path is simply better optimized. When comparing, I recall the Unix async path looked like it could be further optimized. This was quite some time ago and the state of libuv may have changed. |
@joliss I think we can close this. |
What about going asynchronous using promises and stating multiple files at once?