Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: include more fs.stat*() optimizations #11665

Closed
wants to merge 2 commits into from

Conversation

mscdex
Copy link
Contributor

@mscdex mscdex commented Mar 3, 2017

stat*() optimizations including:

  • Move async *stat() functions to FillStatsArray() now used by the sync *stat() functions

  • Avoid creating fs.Stats instances for implicit async/sync *stat() calls used in various fs functions

  • Store reference to Float64Array data on C++ side for easier/faster access, instead of passing from JS to C++ on every async/sync *stat() call

Additionally, there are some further realpath*() optimizations, including:

  • Skip URL instance check for common (string) cases

  • Avoid regexp on non-Windows platforms when parsing the root of a path

  • Skip call to getOptions() in common case where no options is passed

  • Avoid hasOwnProperty()

Some benchmark results with all of the changes in this PR:

                                                                        improvement confidence      p.value
 fs/readfile.js concurrent=1 len=1024 dur=5                                  5.50 %            0.177524964
 fs/readfile.js concurrent=1 len=16777216 dur=5                             -0.26 %            0.860522023
 fs/readfile.js concurrent=10 len=1024 dur=5                                 7.57 %         ** 0.004268331
 fs/readfile.js concurrent=10 len=16777216 dur=5                            -0.35 %            0.374914593
 fs/bench-stat.js kind="fstat" n=200000                                      3.71 %         ** 0.002509243
 fs/bench-stat.js kind="lstat" n=200000                                      3.16 %         ** 0.001381068
 fs/bench-stat.js kind="stat" n=200000                                       4.13 %          * 0.019328676
 fs/readFileSync.js n=600000                                                20.27 %        *** 2.313943e-38
 fs/bench-statSync.js kind="fstatSync" n=1000000                             2.01 %            1.066663e-01
 fs/bench-statSync.js kind="lstatSync" n=1000000                             4.42 %        *** 4.163876e-06
 fs/bench-statSync.js kind="statSync" n=1000000                              3.46 %        *** 1.333101e-06
 fs/bench-realpathSync.js type="resolved" n=300000                          86.06 %        *** 3.681945e-42
 fs/bench-realpathSync.js type="relative" n=400000                          61.71 %        *** 1.349012e-54
 fs/bench-realpath.js type="resolved" n=30000                                6.00 %        *** 3.601045e-11
 fs/bench-realpath.js type="relative" n=100000                               7.61 %        *** 3.163175e-14
 module/module-loader.js useCache="false" fullPath="false" thousands=50      5.05 %        *** 1.426385e-19
 module/module-loader.js useCache="false" fullPath="true" thousands=50       5.15 %        *** 6.171543e-16

I'm initially marking this as semver-major because of the changes in fs.readFile(), fs.readFileSync(), etc. in how they retrieve stats for files/directories. If someone monkey patches fs.*sync*(), those monkey matched methods would no longer be called from those changed functions (fs.readFile(), etc.). If we feel we shouldn't be worried about this, then I will gladly remove the label.

CI: https://ci.nodejs.org/job/node-test-pull-request/6674/

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)
  • fs

@mscdex mscdex added c++ Issues and PRs that require attention from people who are familiar with C++. fs Issues and PRs related to the fs subsystem / file system. performance Issues and PRs related to the performance of Node.js. semver-major PRs that contain breaking changes and should be released in the next major version. labels Mar 3, 2017
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. fs Issues and PRs related to the fs subsystem / file system. labels Mar 3, 2017
@not-an-aardvark
Copy link
Contributor

For what it's worth, this will break some modules like mock-fs which rely on mutating process.binding('fs') and shimming everything to avoid touching the actual filesystem.

I acknowledge that process.binding('fs') is not a public API, so changes like this might be worthwhile anyway. I just want to point out that changes to the API of process.binding('fs') methods do currently have ecosystem impact.

@mscdex
Copy link
Contributor Author

mscdex commented Mar 3, 2017

@not-an-aardvark Well, in a recent PR I was told not to worry about process.binding() changes *shrug*. See this comment and this comment.

@not-an-aardvark
Copy link
Contributor

not-an-aardvark commented Mar 3, 2017

Fair enough. The reason I bring this up is that #11522 ended up causing tschaub/mock-fs#197 and breaking some builds. (I'm not trying to argue that this particular decision was justified/unjustified -- I just want to make sure that these decisions are made with full information on what they might break downstream.)

@addaleax addaleax self-requested a review March 3, 2017 09:27
Copy link
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with comments.

lib/fs.js Outdated
@@ -114,6 +115,24 @@ function makeCallback(cb) {
};
}

// Special case of `makeCallback()` that is specific to async `*stat()` calls as
// an optimization, since the data passed back to the callback needs to be
// transformed anyway
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing dot. Happens in quite a few other places. I won't point them out but please fix them up.

lib/fs.js Outdated
// Use stats array directly to avoid creating an fs.Stats instance just for
// our internal use
var size;
if (tryStatSync(fd, isUserFd) && ((statValues[1] & S_IFMT) === S_IFREG))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the extra parens?

lib/fs.js Outdated
this._handle.onchange = function(current, previous, newStatus) {
this._handle.onchange = function(newStatus) {
var current = statsFromValues();
var previous = statsFromPrevValues();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could move this to after the if statement if you make it compare on the statValues entry directly.

lib/fs.js Outdated
@@ -1665,7 +1714,8 @@ fs.realpath = function realpath(p, options, callback) {
// dev/ino always return 0 on windows, so skip the check.
let id;
if (!isWindows) {
id = `${stat.dev.toString(32)}:${stat.ino.toString(32)}`;
// id === stats.dev:stats.ino
id = `${statValues[0].toString(32)}:${statValues[7].toString(32)}`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some field index constants would be good to have. Using number literals obscures it too much.

Local<Value> argv[] = {
BuildStatsObject(env, curr),
BuildStatsObject(env, prev),
Integer::New(env->isolate(), status)
};
wrap->MakeCallback(env->onchange_string(), arraysize(argv), argv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify this now to:

Local<Value> arg = Integer::New(env->isolate(), status);
wrap->MakeCallback(env->onchange_string(), 1, &arg);

lib/fs.js Outdated
binding.FSInitialize(fs.Stats);

fs.Stats.prototype._checkModeProperty = function(property) {
Stats.prototype._checkModeProperty = function(property) {
return ((this.mode & constants.S_IFMT) === property);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can drop the constants. now here.

lib/fs.js Outdated
return this._checkModeProperty(constants.S_IFDIR);
};

fs.Stats.prototype.isFile = function() {
Stats.prototype.isFile = function() {
return this._checkModeProperty(constants.S_IFREG);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

lib/fs.js Outdated
return this._checkModeProperty(constants.S_IFCHR);
};

fs.Stats.prototype.isSymbolicLink = function() {
Stats.prototype.isSymbolicLink = function() {
return this._checkModeProperty(constants.S_IFLNK);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And ditto.

src/node_file.cc Outdated
ASYNC_CALL(fstat, args[1], UTF8, fd)
} else {
SYNC_CALL(fstat, 0, fd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nullptr

lib/fs.js Outdated
@@ -130,7 +149,7 @@ function isFd(path) {
return (path >>> 0) === path;
}

// Static method to set the stats properties on a Stats object.
// Constructor for file stats
function Stats(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, it's worth noting that I had tried to convert this to a ES6 Class just to see what would happen and it yielded a 50% perf reduction.

Copy link
Member

@jasnell jasnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once @bnoordhuis' comments are addressed

@mscdex mscdex force-pushed the fs-stat-perf-round2 branch 2 times, most recently from 69c14fc to 1fa23da Compare March 3, 2017 20:56
@mscdex
Copy link
Contributor Author

mscdex commented Mar 3, 2017

@mscdex mscdex added this to the 8.0.0 milestone Mar 3, 2017
Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a suggestion

lib/fs.js Outdated
if (oldStatus === -1 &&
newStatus === -1 &&
current.nlink === previous.nlink) return;
statValues[2] === statValues[16]) return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to leave a comment to indicate that these values refer to .nlink

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another comment I suggested adding constants for the field indices, that would help with legibility.

Copy link
Contributor Author

@mscdex mscdex Mar 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed here as well now. I misunderstood the original suggestion for this line.

Copy link
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a suggestion.

lib/fs.js Outdated
if (oldStatus === -1 &&
newStatus === -1 &&
current.nlink === previous.nlink) return;
statValues[2] === statValues[16]) return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another comment I suggested adding constants for the field indices, that would help with legibility.

@mscdex mscdex force-pushed the fs-stat-perf-round2 branch 2 times, most recently from e072aa3 to add40be Compare March 6, 2017 01:44
@mscdex
Copy link
Contributor Author

mscdex commented Mar 6, 2017

CI again after making suggested changes: https://ci.nodejs.org/job/node-test-pull-request/6719/

@bnoordhuis
Copy link
Member

What I had in mind was more along the lines of const kInoFieldIndex = ... declarations at the top of the file and use those consistently everywhere, with a const kStatFieldCount = 14 so you can use statValues[kStatFieldCount + kInoFieldIndex] to index into the second stat values list.

@mscdex
Copy link
Contributor Author

mscdex commented Mar 6, 2017

@bnoordhuis I was trying to avoid lookups where possible as I'm not sure V8 will always inline the value, even if const.

@bnoordhuis
Copy link
Member

Valid concern. One more argument for reintroducing macros, I suppose. Alternatively, you could write them as statValues[/* ino */ 7].

@Fishrock123
Copy link
Contributor

For what it's worth, this will break some modules like mock-fs which rely on mutating process.binding('fs') and shimming everything to avoid touching the actual filesystem.

I acknowledge that process.binding('fs') is not a public API, so changes like this might be worthwhile anyway. I just want to point out that changes to the API of process.binding('fs') methods do currently have ecosystem impact.

We should warn them ahead of time still I think. @mscdex do you figure the new way is still shimmable?

@mscdex
Copy link
Contributor Author

mscdex commented Mar 6, 2017

@Fishrock123 Support for the Object::New()/BuildStatsObject() code paths for *statSync() was already removed in #11522 because it was decided we aren't supporting process.binding() changes.

@mscdex mscdex force-pushed the fs-stat-perf-round2 branch from add40be to 721a107 Compare March 11, 2017 05:11
@mscdex
Copy link
Contributor Author

mscdex commented Mar 11, 2017

Rebased and added inline comments next to the Float64Array's indices.

Any further comments/questions/suggestions @nodejs/collaborators?

@jasnell
Copy link
Member

jasnell commented Mar 15, 2017

jasnell pushed a commit that referenced this pull request Mar 15, 2017
Including:

* Move async *stat() functions to FillStatsArray() now used by the
sync *stat() functions

* Avoid creating fs.Stats instances for implicit async/sync *stat()
calls used in various fs functions

* Store reference to Float64Array data on C++ side for easier/faster
access, instead of passing from JS to C++ on every async/sync *stat()
call

PR-URL: #11665
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
jasnell pushed a commit that referenced this pull request Mar 15, 2017
Including:

* Skip URL instance check for common (string) cases

* Avoid regexp on non-Windows platforms when parsing the root of a path

* Skip call to `getOptions()` in common case where no `options` is passed

* Avoid `hasOwnProperty()`

PR-URL: #11665
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
@jasnell
Copy link
Member

jasnell commented Mar 15, 2017

Landed in 6a5ab5d...7109774

@jasnell jasnell closed this Mar 15, 2017
jungx098 pushed a commit to jungx098/node that referenced this pull request Mar 21, 2017
Including:

* Move async *stat() functions to FillStatsArray() now used by the
sync *stat() functions

* Avoid creating fs.Stats instances for implicit async/sync *stat()
calls used in various fs functions

* Store reference to Float64Array data on C++ side for easier/faster
access, instead of passing from JS to C++ on every async/sync *stat()
call

PR-URL: nodejs#11665
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
jungx098 pushed a commit to jungx098/node that referenced this pull request Mar 21, 2017
Including:

* Skip URL instance check for common (string) cases

* Avoid regexp on non-Windows platforms when parsing the root of a path

* Skip call to `getOptions()` in common case where no `options` is passed

* Avoid `hasOwnProperty()`

PR-URL: nodejs#11665
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
fields = new double[2 * 14];
env->set_fs_stats_field_array(fields);
}
Local<ArrayBuffer> ab = ArrayBuffer::New(env->isolate(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mscdex , Correct me if i am wrong, but ArrayBuffer::New() creates a buffer by default in externalizeMode. Who frees fields array in that case? May be you want to pass ArrayBufferCreationMode::kExternalized to this API?

Copy link
Contributor Author

@mscdex mscdex Mar 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the default mode already? fields should never be freed by V8 according to the description, which is what we want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but then how/when do we delete fields?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be you want to pass ArrayBufferCreationMode::kExternalized to this API?

What i meant was to pass ArrayBufferCreationMode::kInternalized instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't. It's created once and stays for the life of the process, for whenever someone calls the fs.stat*() functions implicitly or explicitly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

@jasnell jasnell mentioned this pull request Apr 4, 2017
@mscdex mscdex deleted the fs-stat-perf-round2 branch April 5, 2017 04:20
@mscdex
Copy link
Contributor Author

mscdex commented Apr 15, 2017

@nodejs/ctc How do you feel about the semver-major-ness of these changes? Do you think it is warranted? The reason I ask is that apparently in some situations certain fs.Stats fields are negative in JS land because the previous stat() functions were converting unsigned integers to signed integers (see #12419).

So either we could decide this PR really isn't semver-major and backport it as-is to v7.x which still has async stat() calls returning fs.Stats created using the old method, or I extract the async stat() changes from this PR separately and only backport that to v7.x. The downside to the latter solution is that it might add confusion due to the partial backport-ness.

Thoughts?

@addaleax
Copy link
Member

You mean, using this as a bugfix for #12419? I would be okay with that, but it’s probably also worth mentioning that #11522 made it into v6.10.1, so for being consistent we should also apply it there (or, alternatively, revert #11522 on v6.x).

@mscdex
Copy link
Contributor Author

mscdex commented Apr 15, 2017

@addaleax Ah ok I did not know it made it into v6.x.

@jasnell
Copy link
Member

jasnell commented Apr 15, 2017

If there have been no regressions I'm good with downgrading thing to semver-minor and backporting.

@mscdex
Copy link
Contributor Author

mscdex commented Apr 15, 2017

@jasnell Have enough people tested with master though since this was landed?

@jasnell
Copy link
Member

jasnell commented Apr 15, 2017

Unsure. Preferably it would sit through at least one release on current before backporting.

@mscdex
Copy link
Contributor Author

mscdex commented Apr 15, 2017

I think we'd need to make a decision sooner than that because of the issue I linked to.

@jasnell
Copy link
Member

jasnell commented Apr 15, 2017

Doh, actually nevermind lol. For some reason I was reading that as backporting to v6. I'm good with backporting to v7.

@mscdex
Copy link
Contributor Author

mscdex commented Apr 15, 2017

Well it's the same situation for v6 really, unless we want to decide differently for that branch.

@jasnell
Copy link
Member

jasnell commented Apr 15, 2017

Yep, understood. Let's get it into a v7 release. Give it a week or two, then decide for v6.

@mscdex mscdex mentioned this pull request Apr 24, 2017
3 tasks
JLHwung added a commit to JLHwung/node that referenced this pull request Nov 1, 2017
Including:

* Move async *stat() functions to FillStatsArray() now used by the
sync *stat() functions

* Avoid creating fs.Stats instances for implicit async/sync *stat()
calls used in various fs functions

* Store reference to Float64Array data on C++ side for easier/faster
access, instead of passing from JS to C++ on every async/sync *stat()
call

Backport-PR-URL: nodejs#11665
Fixes: nodejs#16496
JLHwung added a commit to JLHwung/node that referenced this pull request Nov 1, 2017
Including:

* Avoid regexp on non-Windows platforms when parsing the root of a path

Backport-PR-URL: nodejs#11665
@JLHwung JLHwung mentioned this pull request Nov 1, 2017
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. fs Issues and PRs related to the fs subsystem / file system. performance Issues and PRs related to the performance of Node.js. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.