Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

path: refactor for performance and consistency #9289

Closed
wants to merge 1 commit into from

Conversation

nwoltman
Copy link

Improve performance by:

  • Not leaking the arguments object! (in win32.join)
  • Getting the last character of a string by index, instead of with .substr() (here) or .slice() (here)

Improve code consistency by:

  • Using [] instead of .charAt() where possible (such as here)
  • Using a function declaration instead of a var declaration (here)
  • Using .slice() with clearer arguments (such as here)

Improve both by:

  • Making the reusable trimArray() function
  • Standardizing getting certain path statistics with the new win32StatPath() function

Benchmarks: (higher is better)

function (path.) current (ops/sec) this PR (ops/sec)
win32.resolve 315,019 399,543
win32.normalize 873,559 1,051,170
win32.isAbsolute 1,915,332 1,939,849
win32.join 452,879 815,033
win32.relative 153,821 219,928
win32.format 13,118,588 16,364,384
posix.relative 244,638 304,456
Benchmark code (gist)

@jasnell
Copy link
Member

jasnell commented Mar 4, 2015

@woollybogger ... do you have benchmark numbers showing that this provides a net performance improvement? The nature of the changes would indicate an improvement but numbers always help :-).

@nwoltman nwoltman force-pushed the refactor-path branch 2 times, most recently from bbe4a0c to 1b25d91 Compare March 4, 2015 08:09
@nwoltman
Copy link
Author

nwoltman commented Mar 4, 2015

@jasnell I updated the PR description with benchmark data. I only included the functions that had a >1% change in performance.

@nwoltman
Copy link
Author

It's been over a week so...bump?

@jasnell
Copy link
Member

jasnell commented Mar 12, 2015

@woollybogger ... sorry, was catching up on a few other items. Thanks for the profiling numbers. I'll queue this up to review on Monday.

// cc @joyent/node-coreteam ... can one of you take a look as well?

@jasnell jasnell self-assigned this Mar 12, 2015
@trevnorris
Copy link

Generally LGTM. But let's run this on the servers and make sure all tests pass on all platforms.

var lastIndex = arr.length - 1;
var start = 0;
for (; start <= lastIndex; start++) {
if (arr[start]) break;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move break to the next line.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I forgot that's the coding style.

isUnc = device && device.charAt(1) !== ':',
isAbsolute = win32.isAbsolute(path),
isUnc = device && device[1] !== ':',
isAbsolute = isUnc || result[2],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan on inlining the call to win32.isAbsolute here and in win32.normalize. I'm concerned that we could get to a point where win32.isAbsolute and these inline versions diverge.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have another function similar to win32SplitPath that returns the following properties:

  • device
  • isUnc
  • isAbsolute
  • tail

Also, is there any reason why win32SplitPath returns an array? I feel like the code would be more readable if it returned an object instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I'm guessing the answer to my question ^ is so that win32SplitPath returns the same data structure as posixSplitPath.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@misterdjules Are things good enough inline or should there be a helper function such as the one I outlined above? I tested Node with the changes in the PR against a version with the helper function and it does slow things down a little (about 2% to 5% slower for the functions the helper touches).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless win32.isAbsolute is on the critical path of a very common use case in terms of performance, I have a preference for using a helper function such as the one you described in your comment above.

I want to avoid subtle regressions due to potential divergence between several versions of inlined win32.isAbsolute.

@misterdjules
Copy link

@woollybogger @trevnorris @jasnell Added a comment inline. Thoughts?

@piscisaureus
Copy link

I know the damage has already been done, but one of the objections I have against exposing the unix code on windows and vice versa is that path.resolve() and maybe other functions produce garbage when relative paths are used.

Does anyone have an idea for fixing this. Maybe throw if the path is relative?

@misterdjules
Copy link

@piscisaureus I'm not sure I understand what you're saying, could you please give us an example of what you're describing?

@piscisaureus
Copy link

@misterdjules

> path.posix.resolve('foo/bar')
'C:\\Users\\Bert Belder/foo/bar'

I don't know what would happen if you did the opposite on unix, e.g.

> path.win32.resolve('\\\\myserver\\myshare\\foo.txt');
???
> path.win32.resolve('c:appdata\\node.js');
???

But I am sure the results would not make sense.

@misterdjules
Copy link

@piscisaureus OK thank you for the clarification!

> path.win32.resolve('\\\\myserver\\myshare\\foo.txt');
???
> path.win32.resolve('c:appdata\\node.js');
???

work as expected because they are absolute paths, but like you said path.win32.resolve with relative paths would fail on any UNIX system.

It seems that only path.resolve would be affected by this, as it seems to be the only one calling a platform-specific function (process.cwd). path.relative uses path.resolve, but it doesn't seem to be affected by it since it seems to deal only with relative paths. We would need to investigate more though.

Anyway, I think this should be handled outside of this PR and in my opinion it's worth creating a new issue. Do you want to do that @piscisaureus or should I do it?

Thank you again!

@piscisaureus
Copy link

@misterdjules

Ah, oops, you're right.
(The second path c:appdata\\node.js isn't absolute btw). It does a drive-specific cwd lookup here.

Agreed btw that I should be handled in a separate issue.

@misterdjules
Copy link

@piscisaureus created new issue here: #9430.

@nwoltman nwoltman force-pushed the refactor-path branch 2 times, most recently from 7abada1 to b770526 Compare March 19, 2015 08:26
@nwoltman
Copy link
Author

@misterdjules @jasnell Updated with the helper function as discussed here and updated the benchmark data.

if (dir) {
if (dir[dir.length - 1] === win32.sep) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change (moving the check inside if (dir)) fixes a bug where sometime dir would be undefined and thus the code would throw? If so, then it would be great to add at least a test for that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can think of at least two test cases that this fixes. I'll add some tests tonight.

@misterdjules
Copy link

Could you please take a look at my comment? Otherwise, LGTM. Thanks @woollybogger! 👍

Adding to the 0.12.2 milestone as node v0.12.1 will be a security release following the recent OpenSSL releases.

@misterdjules misterdjules added this to the 0.12.2 milestone Mar 19, 2015
@nwoltman
Copy link
Author

@roryrjb You wrote the path.format and path.parse functions so I have a question for you. In path.format, root isn't used anywhere except to throw an error. Is there a good reason for this or can that check just be removed?

@roryrjb
Copy link

roryrjb commented Mar 20, 2015

@woollybogger no, there's no reason I can think of, it can be removed.

@nwoltman
Copy link
Author

@roryrjb Cool, thanks! 👍

@nwoltman
Copy link
Author

@misterdjules Ran into a little snag when writing tests:

path.posix.format({root:'', dir: 'dir/other dir/', base: 'index.html'});
// -> 'dir/other dir//index.html'
path.win32.format({root:'', dir: 'dir\\other dir\\', base: 'index.html'});
// -> 'dir\\other dir\\index.html'

The posix version doubles the directory separator while the win32 version does not.
I know 0.12 is a stable branch, so even though I'd like to fix the posix version to work the same as it's win32 counterpart, I'm guessing that won't be accepted in this brach? Or is it small enough that this would be okay?
Update: Or maybe the win32 version should also duplicate the directory separator since the input dir property shouldn't end with a directory separator? Plus that way:

path.format(path.parse(input)) === input; // true

for all win32 input (because that is already true for posix).

@misterdjules
Copy link

@woollybogger If we want to make changes to path.format, let's do that on master and in another issue/PR. The documentation in v0.12 doesn't mention a specific behavior for any platform, and it's not clear how many users currently rely on that behavior.

Thank you very much again!

@nwoltman
Copy link
Author

@misterdjules Added a new commit with the tests and a couple other changes related to path.parse and path.format. If it's good I'll squash the commits.

@misterdjules
Copy link

Thank you again @woollybogger. Added some comments, please let me know what you think.

For some of the more controversial changes, we can always tackle them later in separate PRs. That way we can land the improvements you already made to the path's module performance and take more time to discuss the rest.

Improve performance by:
+ Not leaking the `arguments` object!
+ Getting the last character of a string by index, instead of
  with `.substr()` or `.slice()`

Improve code consistency by:
+ Using `[]` instead of `.charAt()` where possible
+ Using a function declaration instead of a var declaration
+ Using `.slice()` with clearer arguments
+ Checking if `dir` is truthy in `win32.format`
  (added tests for this)

Improve both by:
+ Making the reusable `trimArray()` function
+ Standardizing getting certain path statistics with
  the new `win32StatPath()` function
@nwoltman
Copy link
Author

@misterdjules I removed the controversial code and squashed into one commit.

@misterdjules
Copy link

@woollybogger LGTM, thank you very much! 👍

misterdjules pushed a commit that referenced this pull request Mar 25, 2015
Improve performance by:
+ Not leaking the `arguments` object!
+ Getting the last character of a string by index, instead of
  with `.substr()` or `.slice()`

Improve code consistency by:
+ Using `[]` instead of `.charAt()` where possible
+ Using a function declaration instead of a var declaration
+ Using `.slice()` with clearer arguments
+ Checking if `dir` is truthy in `win32.format`
  (added tests for this)

Improve both by:
+ Making the reusable `trimArray()` function
+ Standardizing getting certain path statistics with
  the new `win32StatPath()` function

Reviewed-By: Julien Gilli <julien.gilli@joyent.com>
PR-URL: #9289
@misterdjules
Copy link

Landed in c66f8c2.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants