Option to improve perf by avoiding fs.stat calls #24

es128 · 2015-04-20T21:01:33Z

readdirp can take a long time to process a very large file tree. I'm interested in exploring ways to improve performance for this use case, and one way to do that would be to avoid the stat call on every entry.

Instead, it should be faster to call readdir directly on each entry and use the resulting error (ENOTDIR) to tell files from directories, perhaps even making the assumption that entries with names that appear to include a file extension are not directories when this mode is employed.
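A minimal sketch of the readdir-first idea, assuming Node's synchronous `fs` API for brevity (readdirp itself is stream-based and asynchronous). The helper names `tryReaddir` and `listFilesWithoutStat` are hypothetical, not part of readdirp. Every entry is probed with `readdirSync`: success means a directory (and already yields its listing), an `ENOTDIR` error marks a file, and any other error is rethrown rather than mistaken for "file". Symlink cycles are not detected.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Probe `p` with readdir. A directory yields its child names; a file makes
// readdir fail with ENOTDIR, which we translate to null. Any other error
// (EACCES, ENOENT, ...) is rethrown instead of being mistaken for "file".
function tryReaddir(p: string): string[] | null {
  try {
    return fs.readdirSync(p);
  } catch (err) {
    if ((err as { code?: string }).code === "ENOTDIR") return null;
    throw err;
  }
}

// Collect every file path under `root` without a single fs.stat call.
// Symlink cycles are not detected in this sketch.
function listFilesWithoutStat(root: string): string[] {
  const files: string[] = [];
  const pending: string[] = [root];
  while (pending.length > 0) {
    const p = pending.pop() as string;
    const children = tryReaddir(p);
    if (children === null) {
      files.push(p); // readdir said ENOTDIR, so treat p as a file
      continue;
    }
    for (const name of children) pending.push(path.join(p, name));
  }
  return files.sort(); // readdir order is not guaranteed; sort for stable output
}
```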

Opening this issue just to declare my intent to do some work along these lines.

@thlorenz, if you have objections to adding this sort of option to readdirp, please let me know and I'll consider other possibilities.


thlorenz · 2015-04-21T16:13:08Z

No objections: provide an option, add tests, etc., and I'll happily merge this.
However, the convenience of resolving stats is probably the main thing readdirp adds on top of the glob package, aside from streaming.
Any reason you wouldn't just use glob for these cases?

es128 · 2015-04-21T17:40:58Z

I hadn't considered glob before, but the reason I'd still want this here is to keep the same interface I'm using in chokidar (the streaming API), with a simple option to switch stats on or off that would correspond to chokidar's alwaysStat option. I use readdirp's filter and depth options, which I wouldn't want to reinvent by switching to glob (I'd rather reinvent some of readdirp's internals, I guess, heh).

Thank you for the suggestion, though!
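To illustrate how a stats toggle could sit alongside depth and filter options, here is a sketch of a synchronous walker. The option names `depth`, `fileFilter`, and `withStats` and the `Entry` shape are hypothetical, chosen for this example rather than taken from readdirp's actual API; `withStats: true` plays the role of chokidar's `alwaysStat`. In no-stat mode a successful `readdir` is both the type check and the listing, so a directory costs one call instead of a `stat` plus a `readdir`, while a file costs one failed `readdir` instead of one `stat`. Here `depth: 0` lists only the root's own files.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical options for this sketch; not readdirp's real option names.
interface WalkOptions {
  depth?: number;                         // 0 = only files directly in root; default unlimited
  fileFilter?: (name: string) => boolean; // keep a file only if its basename passes
  withStats?: boolean;                    // true: stat every entry (like chokidar's alwaysStat)
}

interface Entry {
  fullPath: string;
  stats?: fs.Stats; // populated only when withStats is true
}

function walk(root: string, opts: WalkOptions = {}): Entry[] {
  const maxDepth = opts.depth ?? Infinity;
  const keep: (name: string) => boolean = opts.fileFilter ?? (() => true);
  const out: Entry[] = [];

  // Classify `p`: `children` is the listing for a directory, or null for a file.
  const probe = (p: string): { children: string[] | null; stats?: fs.Stats } => {
    if (opts.withStats) {
      const stats = fs.statSync(p);
      return { children: stats.isDirectory() ? fs.readdirSync(p) : null, stats };
    }
    try {
      return { children: fs.readdirSync(p) }; // success: a directory, already listed
    } catch (err) {
      if ((err as { code?: string }).code === "ENOTDIR") return { children: null };
      throw err; // EACCES, ENOENT, etc. are real errors, not "this is a file"
    }
  };

  // `level` counts directories below root (root is 0, its children are 1).
  const visit = (p: string, level: number): void => {
    const { children, stats } = probe(p);
    if (children === null) {
      if (keep(path.basename(p))) out.push({ fullPath: p, stats });
      return;
    }
    if (level > maxDepth) return; // directory lies beyond the depth limit
    for (const name of children) visit(path.join(p, name), level + 1);
  };

  visit(root, 0);
  return out;
}
```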

thlorenz · 2015-04-21T18:22:10Z

OK, cool, waiting for the PR... :)

es128 · 2015-04-21T20:20:32Z

Just FYI, node-glob does make stat calls on the paths it collects; it just doesn't expose that data. So I doubt I'd see much of a performance improvement from using it for the cases I'm targeting.

thlorenz · 2015-04-21T21:37:53Z

Interesting, I had no clue about that. I guess it does so to figure out links and stuff.

So readdirp could become faster than glob then -- that'd be awesome!
We need benchmarks!

thlorenz · 2015-04-21T21:39:29Z

Oh, but btw, if we don't make stat calls, how are we gonna know whether to recurse into an entry (a directory) or not (a file)?

thlorenz · 2015-04-21T21:41:07Z

Ah just reread:

"making the assumption that entries with names that appear to include a file extension are not directories when this mode is employed"

That's really hairy, since lots of files have no extension, especially bash scripts. So you'd end up catching a bunch of errors, but you may still be faster; we'll see.
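The extension heuristic and the cost described here can be made concrete with a counter. In this sketch (hypothetical function name, synchronous `fs` for brevity), any non-root name that `path.extname` reports an extension for is assumed to be a file and never touched; extensionless names are probed with `readdir`, and each `ENOTDIR` is counted as a wasted call. It also shows the opposite failure mode: a directory with a dot in its name, such as `conf.d`, is misclassified as a file and its contents are silently skipped.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

interface HeuristicResult {
  files: string[];        // paths classified as files
  failedReaddirs: number; // probes that ended in ENOTDIR (wasted calls)
}

// Hypothetical "extension means file" heuristic: names with an extension are
// assumed to be files and get no syscall at all; extensionless names (bash
// scripts, Makefile, dotfiles such as .bashrc) are probed with readdir.
// Caveat: a directory like "conf.d" has extension ".d", so it is wrongly
// reported as a file and nothing inside it is visited.
function walkWithExtensionHeuristic(root: string): HeuristicResult {
  const result: HeuristicResult = { files: [], failedReaddirs: 0 };
  const pending: string[] = [root];
  while (pending.length > 0) {
    const p = pending.pop() as string;
    if (p !== root && path.extname(p) !== "") {
      result.files.push(p); // assumed to be a file, never touched
      continue;
    }
    try {
      for (const name of fs.readdirSync(p)) pending.push(path.join(p, name));
    } catch (err) {
      if ((err as { code?: string }).code !== "ENOTDIR") throw err;
      result.failedReaddirs++; // an extensionless file cost a failed readdir
      result.files.push(p);
    }
  }
  result.files.sort();
  return result;
}
```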

es128 · 2015-04-21T23:09:07Z

Yes, I'm expecting that. The idea is that catching the error from a failed readdir call is faster than calling stat.
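Whether a failed `readdir` really is cheaper than `stat` is an empirical question that a micro-benchmark can help settle. The sketch below (hypothetical function name) times repeated `fs.statSync` calls against repeated `fs.readdirSync` calls that fail with `ENOTDIR` on the same regular file. No results are claimed here: the numbers depend on the OS, the filesystem, and caching, and hitting one file in a tight loop measures a hot-cache best case. Part of the failed call's cost is Node constructing a JavaScript `Error` object, which a successful `stat` does not need.

```typescript
import * as fs from "node:fs";
import { performance } from "node:perf_hooks";

interface Timing {
  statMs: number;    // total wall time for `iterations` fs.statSync calls
  readdirMs: number; // total wall time for `iterations` failing fs.readdirSync calls
}

// Time `iterations` stat calls against the same number of readdir calls on
// one regular file. Each readdir is expected to fail with ENOTDIR; that error
// is caught and discarded, which is the cost the readdir-probe idea pays.
function compareStatVsFailedReaddir(file: string, iterations = 10000): Timing {
  const t0 = performance.now();
  for (let i = 0; i < iterations; i++) fs.statSync(file);
  const t1 = performance.now();
  for (let i = 0; i < iterations; i++) {
    let listed = false;
    try {
      fs.readdirSync(file);
      listed = true;
    } catch (err) {
      // Anything other than ENOTDIR (e.g. ENOENT) means the input is unsuitable.
      if ((err as { code?: string }).code !== "ENOTDIR") throw err;
    }
    if (listed) throw new Error(`${file} is a directory; pass a regular file`);
  }
  const t2 = performance.now();
  return { statMs: t1 - t0, readdirMs: t2 - t1 };
}
```

To try it, call `compareStatVsFailedReaddir` with the path of any regular file and compare the two fields, ideally across several runs and on the filesystems that matter for the large trees in question.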

thlorenz closed this as completed Apr 21, 2015

es128 mentioned this issue Jul 1, 2015: Memory Issues paulmillr/chokidar#307 (Closed)

es128 mentioned this issue Dec 10, 2015: minimize fs calls kmalakoff/walk-filtered#2 (Closed)
